Using R for statistics

People thinking about working on data analysis might consider using R, a free software environment for statistical computing and graphics that runs on a wide variety of platforms, including Linux, Windows and macOS. You can get the software here: An online HTML version of the manual is available here:

The documentation is written by and for geeks, and can take a little time to understand, but if you are willing to invest that time, there are several advantages. R is free, powerful, and runs on multiple operating systems, so it is something you can continue to use, and it makes it easier to collaborate with others.

Amazon has a number of good printed manuals. I liked Peter Dalgaard's Introductory Statistics with R. The free PDF manual from the R Project is also good.

Some basics to get started

Using Excel, OpenOffice, a text editor or another program, create a dataset whose top row holds the names of the variables (no spaces or illegal characters) and whose data are laid out in rows and columns. For example:

#----filename somedata.csv

To read this data into R, you might use a command like this:

somedata <- read.table("somedata.csv", header=TRUE, sep=",", row.names=NULL, na.strings="zz")

This reads the data from the text file and creates a dataset called somedata. The command gives the name of the file and tells R that the first row holds the headers, the columns are separated by commas, there are no row names, and missing data are represented by "zz".
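To try the command without a data file of your own, here is a minimal sketch that writes a small CSV to a temporary folder and reads it back. The column names follow the variables used later in this article, but the country column and every value are made up for illustration, and csv_path is a hypothetical name.

```r
# Hypothetical data: the column names match the variables used in this
# article, but the country column and all values are invented.
lines <- c(
  "country,delta_r,all301,pop04,gni04",
  "A,0.5,1,10.2,1200",
  "B,zz,0,55.0,830",
  "C,1.2,1,3.4,zz"
)

# Write the file to a temporary folder so nothing is overwritten
csv_path <- file.path(tempdir(), "somedata.csv")
writeLines(lines, csv_path)

# Same call as above; "zz" entries become NA (missing)
somedata <- read.table(csv_path, header = TRUE, sep = ",",
                       row.names = NULL, na.strings = "zz")

str(somedata)             # column names and types
colSums(is.na(somedata))  # number of missing values in each column
```

Checking str() right after reading is a quick way to confirm that the numeric columns were not read as text, which usually means a missing-value code was not recognized.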

You could create variables like this:

delta_r <- somedata[,2]
all301 <- somedata[,3]
pop04 <- somedata[,4]
gni04 <- somedata[,5]
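Picking columns by position stops working if the column order in the file changes. As a sketch, assuming the header row uses the same names as the variables above (an assumption, since the header is not shown here), you could pick columns by name instead. The small data frame below is made up so the example runs on its own.

```r
# Made-up stand-in for somedata; the names and values are assumptions
somedata <- data.frame(
  country = c("A", "B", "C"),
  delta_r = c(0.5, NA, 1.2),
  all301  = c(1, 0, 1),
  pop04   = c(10.2, 55.0, 3.4),
  gni04   = c(1200, 830, NA)
)

pop04_by_position <- somedata[, 4]    # fourth column, as in the article
pop04_by_name     <- somedata$pop04   # same column, selected by name

# Both approaches return the same vector
identical(pop04_by_position, pop04_by_name)

names(somedata)  # lists the column names available for $ access
```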

and plot the GNI data like this:

hist(gni04)

or, for a scatterplot of two of the variables:

plot(all301, pop04)

To run an OLS regression, you could do this:

test1 <- lm(all301 ~ pop04 + gni04)
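The lm() call only stores the fitted model in test1; to see the results you have to ask for them. Here is a minimal sketch using simulated numbers in place of the real variables (the values, the sample size n and the coefficients used to generate them are all made up):

```r
# Simulated stand-ins for the article's variables; all values are invented
set.seed(1)
n      <- 30
pop04  <- runif(n, 1, 100)
gni04  <- runif(n, 500, 30000)
all301 <- 2 + 0.05 * pop04 + 0.0001 * gni04 + rnorm(n)

# Same model formula as above
test1 <- lm(all301 ~ pop04 + gni04)

summary(test1)          # coefficients, standard errors, R-squared
coef(test1)             # just the estimated coefficients
head(residuals(test1))  # first few residuals
```

With your own data, the variables created from somedata take the place of the simulated ones, and summary(test1) is usually the first thing to look at.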

You could edit a dataset in R's spreadsheet-style editor, and put the result into a new dataset, like this:

somedata2 <- edit(somedata)


