Thursday, October 29, 2009

Extract variables from data frame by name

I had to combine a bunch of .csv files into one data frame. Because the files had no column names, the import left me with duplicate column names. A similar result can arise when using "cbind". I was only interested in the eighth variable (called "V8" by default), so I needed a way to extract all (and only) the V8s into a new data frame. Unfortunately, the V8s were not at regular intervals.

names(april2008) yields:

 [1] "station" "V7"      "V8"      "V4"      "V5"      "V6"      "V7"    
 [8] "V8"      "V5"      "V6"      "V7"      "V8"      "V6"      "V7"    
[15] "V8"

I wanted to extract columns 3, 8, 12, 15.

This was achieved with the "which" function, which returns the positions at which a condition is TRUE. The initial data frame was called "april2008".

I used:

newapril08 <- april2008[, which(names(april2008) == "V8")]

names(newapril08) yields:

[1] "V8"    "V8.1"  "V8.2"  "V8.3"
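
The same idea can be tried on a small made-up data frame. This is only a sketch: the data frame "april", its values and the names "v8_cols" and "v8_only" are invented for illustration and are not the original data.

```r
# Toy data frame with duplicate column names (values are made up)
april <- data.frame(1:3, 4:6, 7:9, 10:12, 13:15)
names(april) <- c("station", "V7", "V8", "V7", "V8")

# Positions of every column called "V8" (here: 3 and 5)
v8_cols <- which(names(april) == "V8")

# Select those columns by position; drop = FALSE keeps the result
# a data frame even if only one column happens to match
v8_only <- april[, v8_cols, drop = FALSE]

names(v8_only)
```

A logical index such as april[, names(april) == "V8", drop = FALSE] should select the same columns; "which" simply makes the column positions explicit.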

Monday, October 12, 2009

Sorting a data frame

To order a data frame by one of its variables you need the "order" function, not "sort": "sort" returns the sorted values themselves, whereas "order" returns the row positions that put them in order. The easiest way to use it is with square-bracket indexing.

Specify the data frame (let's call it "station") and, in the place where you would usually specify which rows to use, place the command:

order(variable to order by)

So:

station[order(station$V4),2:4]

will produce a data frame composed of the 2nd, 3rd and 4th variables of the station data frame, with its rows ordered by the variable called "V4".
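
To see the difference between "sort" and "order", here is a small sketch. The contents of "station" below are made up for illustration, and "row_order" and "sorted_station" are names chosen only for this example.

```r
# Hypothetical stand-in for the "station" data frame (values are made up)
station <- data.frame(
  V1 = c("a", "b", "c"),
  V2 = c(10, 20, 30),
  V3 = c("x", "y", "z"),
  V4 = c(3, 1, 2)
)

# sort() returns the sorted values themselves
sort(station$V4)                 # 1 2 3

# order() returns the row positions that would sort V4
row_order <- order(station$V4)   # 2 3 1

# Use those positions as the row index and keep columns 2 to 4
sorted_station <- station[row_order, 2:4]
sorted_station
```

For descending order, order(station$V4, decreasing = TRUE) can be used in the same place.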