Saturday, July 08, 2006

Using R for statistics

People thinking about working on data analysis might consider using R, a free software environment for statistical computing and graphics that runs on a wide variety of platforms, including Linux, Windows, and MacOS. You can get the software here: http://cran.r-project.org/. An online HTML version of the manual is available here: http://cran.r-project.org/doc/manuals/R-intro.html

The documentation is written by and for geeks, and can take a little time to understand, but if you are willing to put in the effort, there are several advantages. R is free, powerful, and runs on multiple OS platforms, so it is something you can continue to use wherever you work, and it makes it easier to collaborate with others.

Amazon has a number of good printed manuals. I liked Peter Dalgaard's Introductory Statistics with R. The free PDF Manual from the R-Project is also good.


Some basics to get started

Using Excel, OpenOffice, a text editor, or another program, create a dataset that has as its top row the names of the variables (no spaces or illegal characters) and presents the data in rows and columns, saved as comma-separated text. For example:

#----filename somedata.csv
country,delta_r,all301,pop04,gni04
BOL,8,9,9009045,8640026000
BRA,15,11,183912544,5.52E+11
CAN,-4,12,31974364,9.05E+11
CHE,-1,0,7389581,3.66E+11
CHL,14,12,16123815,84159790000
CHN,20,12,1296157440,1.94E+12
CIV,-9,0,17871896,13581220000
UKR,-18,9,47451292,60200640000
URY,0,7,3439473,13423750000
USA,-8,0,293655392,1.22E+13
VEN,-7,12,26127000,1.05E+11

To read this data into R, you might use a command like this:

somedata <- read.table("somedata.csv", header=TRUE, sep=",", row.names=NULL, na.strings="zz")

What this does is read the data from the text file and create a data frame called somedata. The command gives the name of the file and tells R that the first row holds the headers, the columns are separated by commas, there are no row names, and missing data are represented by "zz".
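If you want to check that the file was read the way you expected, something like the following might help. It is only a sketch: it writes three rows of the example data to a temporary file so that it runs on its own, and the "zz" in the CAN row is made up here just to show how a missing value comes through (the real file has a number there).

```r
# Write a few rows of the example data to a temporary file.
# The "zz" entry is invented for illustration; it stands in for a missing value.
csv_lines <- c(
  "country,delta_r,all301,pop04,gni04",
  "BOL,8,9,9009045,8640026000",
  "BRA,15,11,183912544,5.52E+11",
  "CAN,-4,12,31974364,zz"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Same read.table() call as above, pointed at the temporary file.
somedata <- read.table(tmp, header = TRUE, sep = ",",
                       row.names = NULL, na.strings = "zz")

str(somedata)              # one line per column, with its type
colSums(is.na(somedata))   # number of missing values in each column
```

str() is a quick way to see whether a column you expected to be numeric was read as text, which usually means a stray character or a missing-value code that na.strings did not cover.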

You could create variables like this:

delta_r <- somedata[,2]
all301 <- somedata[,3]
pop04 <- somedata[,4]
gni04 <- somedata[,5]
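Picking columns by number works, but it breaks quietly if the column order in the file ever changes. You might prefer to pick them by name instead. A small sketch, using a three-row data frame built by hand from the example data so that it runs on its own:

```r
# Three rows of the example dataset, typed in directly.
somedata <- data.frame(
  country = c("BOL", "BRA", "CAN"),
  delta_r = c(8, 15, -4),
  all301  = c(9, 11, 12),
  pop04   = c(9009045, 183912544, 31974364),
  gni04   = c(8640026000, 5.52e11, 9.05e11)
)

# Two equivalent ways to pull out a column by name.
delta_r <- somedata$delta_r
gni04   <- somedata[["gni04"]]

# Both give the same vectors as the positional form used above.
identical(delta_r, somedata[, 2])
identical(gni04, somedata[, 5])
```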

and plot the data like this, either as a histogram of GNI or as a scatterplot of two variables:

hist(gni04)
plot(all301, pop04)
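The default plots are bare, so you may want to add labels, and perhaps a log scale, since population in this dataset runs from a few million to over a billion. The sketch below is one way to do that. It uses the first six rows of the example data and sends the plots to a PDF in a temporary location (the file path is just a placeholder) rather than to the screen.

```r
# Values copied from the first six rows of the example dataset.
all301 <- c(9, 11, 12, 0, 12, 12)
pop04  <- c(9009045, 183912544, 31974364, 7389581, 16123815, 1296157440)
gni04  <- c(8640026000, 5.52e11, 9.05e11, 3.66e11, 84159790000, 1.94e12)

# Open a PDF device; plot_file is a placeholder temporary path.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Histogram with a title and an axis label.
hist(gni04, main = "Histogram of gni04", xlab = "gni04")

# Scatterplot with population on a log scale.
plot(all301, pop04, log = "y",
     xlab = "all301", ylab = "pop04 (log scale)")

# Close the device so the file is written out.
dev.off()
```

If you leave out the pdf() and dev.off() lines in an interactive session, the plots appear in a window instead.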

To run an OLS regression, you could do this:

test1 <- lm(all301 ~ pop04 + gni04)
summary(test1)
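
If you would rather not copy every column into its own variable first, lm() can take the data frame directly through its data argument. Here is a sketch that types the eleven example rows into a data frame so it runs on its own, and then pulls a few pieces out of the fitted model:

```r
# The example dataset, entered directly as a data frame.
somedata <- data.frame(
  country = c("BOL", "BRA", "CAN", "CHE", "CHL", "CHN",
              "CIV", "UKR", "URY", "USA", "VEN"),
  delta_r = c(8, 15, -4, -1, 14, 20, -9, -18, 0, -8, -7),
  all301  = c(9, 11, 12, 0, 12, 12, 0, 9, 7, 0, 12),
  pop04   = c(9009045, 183912544, 31974364, 7389581, 16123815, 1296157440,
              17871896, 47451292, 3439473, 293655392, 26127000),
  gni04   = c(8640026000, 5.52e11, 9.05e11, 3.66e11, 84159790000, 1.94e12,
              13581220000, 60200640000, 13423750000, 1.22e13, 1.05e11)
)

# Same model as above, but the variables are looked up inside somedata.
test1 <- lm(all301 ~ pop04 + gni04, data = somedata)

coef(test1)               # intercept and the two slope estimates
summary(test1)$r.squared  # share of variance explained
residuals(test1)          # one residual per country
```

With only eleven observations the estimates will be rough, so this is better read as a demonstration of the commands than as an analysis.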

You could also edit a dataset interactively, in a spreadsheet-like window, and put the result into a new object, like this:

somedata2 <- edit(somedata)
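
edit() needs an interactive session, and it leaves no record of what you changed. If you want the change to be repeatable, you could make it in code instead. A minimal sketch, with a made-up correction to one value purely for illustration:

```r
# Two columns from three rows of the example data.
somedata <- data.frame(
  country = c("BOL", "BRA", "CAN"),
  all301  = c(9, 11, 12)
)

# Work on a copy so the original stays untouched.
somedata2 <- somedata

# Hypothetical correction: set all301 for BOL to 10.
somedata2$all301[somedata2$country == "BOL"] <- 10

somedata2
```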
