Sunday, June 20, 2010

Weighted mean: tapply

To obtain weighted means for each category of a data frame, combine
a) the tapply function and
b) the weighted.mean function.

The information required for the tapply component is:


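"lll" - the vector of values to be averaged; seq(along=lll) turns it into the row indices that tapply splits by category
"rrr" - the categories over which the function must be aggregated
"ttt" - the weights, one per element of "lll"
"function(i, x, w) ..." - the function to apply to each group of row indices "i". In this instance the weighted mean.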

The weighted.mean component requires 'x' (the values to be weighted) and 'w' (the weights).
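As a quick check of what weighted.mean does with 'x' and 'w', here is a minimal sketch. The numbers are made up for illustration and are not from any real dataset.

```r
# Illustrative values only: three observations and their weights
x <- c(10, 20, 30)
w <- c(1, 1, 2)

# Built-in weighted mean
wm <- weighted.mean(x, w)

# The same quantity computed by hand: sum(x * w) / sum(w)
manual <- sum(x * w) / sum(w)

print(wm)      # 22.5
print(manual)  # 22.5
```

The third value counts twice because its weight is 2, so the result is (10 + 20 + 60) / 4.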

The general form is:

tapply(seq_along(lll), rrr, function(i, x, w) weighted.mean(x[i], w[i]), x = lll, w = ttt)
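To see the general form run end to end, here is a small sketch with toy vectors. The names lll, rrr and ttt follow the general form; the values themselves are invented for illustration.

```r
# Toy data (assumed values, for illustration only)
lll <- c(10, 20, 30, 40)        # values to be averaged
ttt <- c(1, 3, 1, 1)            # weights, one per value
rrr <- c("a", "a", "b", "b")    # category of each value

# tapply splits the row indices 1..4 by category and hands each group
# of indices to the function as 'i'; x and w are passed through unchanged
res <- tapply(seq_along(lll), rrr,
              function(i, x, w) weighted.mean(x[i], w[i]),
              x = lll, w = ttt)

print(res)
# category "a": (10*1 + 20*3) / 4 = 17.5
# category "b": (30*1 + 40*1) / 2 = 35
```

The indices are what get split, rather than the values, so that the same subset 'i' can be used to pick out both the values and their matching weights.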

An example of weighting incomes ("incval") by frequency counts ("worker_wgt") for each "tbvc" category is:


tapply(seq_along(incval), tbvc, function(i) weighted.mean(incval[i], worker_wgt[i], na.rm=TRUE))

This assumes the relevant data frame has already been attached with attach(), so that incval, worker_wgt and tbvc can be referenced directly.
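If you would rather not attach the data frame, one possible alternative is to wrap the call in with(). The sketch below assumes a hypothetical data frame called df with made-up values; only the column names incval, worker_wgt and tbvc come from the example above.

```r
# Hypothetical data frame; the values are invented for illustration
df <- data.frame(
  incval     = c(100, 200, NA, 400, 500),
  worker_wgt = c(1, 3, 2, 1, 1),
  tbvc       = c("A", "A", "A", "B", "B")
)

# with() evaluates the tapply call inside df, so no attach() is needed
res <- with(df, tapply(seq_along(incval), tbvc,
                       function(i) weighted.mean(incval[i], worker_wgt[i],
                                                 na.rm = TRUE)))

print(res)
# "A": the NA income is dropped along with its weight: (100*1 + 200*3) / 4 = 175
# "B": (400 + 500) / 2 = 450
```

Note that na.rm = TRUE only drops missing values in the incomes (together with their weights); a missing weight would still give NA for that category.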