Monday, September 28, 2009

Import multiple files into R

I needed to import 144 files in comma-separated values (.csv) format into R. Each imported file was to become its own dataframe. Given the number of files, automating the process was highly desirable. It took quite a bit of searching before I found a simple solution. The biggest "problem" was finding a way of generating the names of the new R objects. The solution:

Create an R object (called "filelist") giving the names of all the .csv files in the relevant directory.

filelist <- list.files("/name/the/directory/", pattern = "\\.csv$", full.names = TRUE)

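The "pattern" argument is a regular expression; here it restricts the listing to files whose names end in "csv". Setting "full.names" makes list.files return each file's full path rather than just its name.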

Now it's possible to import each of these files using a simple "for" loop. The point, however, is to reference each csv file not by its name but by its position in the "filelist" object above. The loop variable is also used to generate the name of each of the 144 new dataframes, which will be named "thisit1" through "thisit144".


for (var in seq_along(filelist)) assign(paste("thisit", var, sep = ""), read.csv(filelist[var]))
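As an alternative to creating 144 separate objects, the files can be read into a single named list. The sketch below is only an illustration: the temporary directory, the three demo files and the names "demo_dir" and "frames" are made up so that the example runs on its own, and are not part of the original workflow.

```r
# Hypothetical demo data: three small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(demo_dir, paste0("part", i, ".csv")),
            row.names = FALSE)
}

# Same listing step as in the post, pointed at the demo directory.
filelist <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list instead of separate global objects.
frames <- lapply(filelist, read.csv)

# Name each element after its file, minus the directory and extension.
names(frames) <- tools::file_path_sans_ext(basename(filelist))

# Individual dataframes are then reached by name or by position.
frames[["part2"]]
frames[[2]]
```

Keeping the dataframes in one list makes it easy to loop over all of them later (with lapply, for instance) without having to rebuild the object names with paste.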

Wednesday, September 16, 2009

The relationship between income levels, income inequality and rates of crime in the RSA

One of the biggest difficulties in attempting to understand, or simply describe, crime and crime trends is the diversity of offences under review. Kidnapping, shoplifting and child abuse are all criminal acts, but each offence has its own peculiar pathology. The profile of offenders will differ in terms of age, background, social status, psychological makeup etc. Understanding crime and developing prevention strategies as a general category is a bit like attempting to develop strategies to combat disease. Rather than attempting to address all diseases at once it is better to develop strategies centred on particular disease types. Similarly it makes sense to develop one strategy to combat/understand murder (or particular types of murder), another to deal with rape, yet another to deal with commercial crimes and so on.

There is however a significant downside in attempting to understand crime by tackling each offence type individually. If lucky, the process will result in a plethora of understandings, leaving unanswered the question of what underlying factors determine general crime trends. Does the economy matter? Are particular age cohorts prone to committing offences? Thus, in order to understand the dynamics driving crime in general, here we distinguish only between violent and non-violent crimes. "Violent crimes" include murder, rape and attempted murder as well as the various kinds of assault and robbery. "Non-violent crimes" cover all other offences including culpable homicide, drug offences and even drunk driving. Numerically "non-violent" crimes are dominated by various forms of theft.

In order to identify the social, economic and demographic drivers behind crime we compare crime rates in police precincts to the characteristics of the people living in that area. To do this we first estimate the number of crimes and the population for each of the over 1 000 police stations in the country. These figures can then be used to estimate the crime rate in every precinct. Obviously the derived rates differ markedly between areas. The question then arises as to how this enables the driving forces behind crime to be identified. Mathematically the strategy is to identify those social/demographic characteristics that most systematically account for the differences in the crime rates of the areas. Using fairly simple statistics it is even possible to measure the strength of the relationship between crime and the social/demographic characteristics. When a consistent relationship between the crime rate and the social/demographic characteristics of the area is found there is reason to believe that the characteristics in question hold a clue as to what is driving crime. For example, if high crime rates are observed in those areas where male youth make up a larger proportion of the population there is reason to suspect that male youth are somewhat more inclined to commit offences.

However the mere presence of a relationship does not prove that the social condition causes crime. The nature of the relationship has to be distilled from theories or other evidence. Often the resulting "insights" are misleading. For example, in the above scenario it is relatively easy to convincingly argue that male youth are more likely to commit offences than other demographic groups. However those areas with a high proportion of male youth are also very likely to have a high proportion of female youth (i.e. they have a young population). There will consequently also be a relationship between the crime rate and the proportion of female youth in the population. However, despite female youth having a correlation to crime rates similar to that of male youth, it is more difficult to convincingly propose that female youth are also more likely than other demographic cohorts to commit crimes.
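The male/female youth example can be illustrated with a small simulation. Everything below is invented for the purpose: the variable names, the coefficients and the noise levels are assumptions, and none of it comes from the census or police data discussed here. In the simulated world only the share of male youth affects the crime rate, yet the share of female youth still correlates with crime because both shares rise together in areas with a young population.

```r
# Purely simulated precincts; all numbers are made up for illustration.
set.seed(1)
n <- 1000
young <- runif(n, 0.1, 0.4)                       # share of residents who are young
male_youth <- young / 2 + rnorm(n, sd = 0.01)     # roughly half of the young are male
female_youth <- young / 2 + rnorm(n, sd = 0.01)   # and roughly half are female

# By construction only male_youth enters the crime rate.
crime_rate <- 50 + 200 * male_youth + rnorm(n, sd = 5)

# Both simple correlations with the crime rate come out strongly positive.
cor_male <- cor(crime_rate, male_youth)
cor_female <- cor(crime_rate, female_youth)

# With both shares in one regression, the male_youth slope sits near its
# built-in value of 200 while the female_youth slope is comparatively small.
fit <- lm(crime_rate ~ male_youth + female_youth)
coef(fit)
```

The simple correlation for female youth is a by-product of the age structure of the area, which is why a correlation on its own says little about which group is actually offending.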

Here a distinction is drawn (only) between violent and non-violent crimes; however, the two categories are, as will be seen, not unrelated. Indeed one of the most important pieces of information needed to understand rates of violent crime is the rate of non-violent crime (and vice versa). But first things first.

The level of vulnerability any one person has to being a victim of crime is expressed by the "crime rate". This, in practice, is defined by dividing the number of crimes reported to a police station by the number of people living in that police station's jurisdiction. A higher crime rate implies that people in that area are more likely to have been (and, by inference, will be) a victim of crime. The question then becomes: what are the defining social and demographic characteristics of those areas with higher crime rates? In particular the initial question posed below is "what are the characteristics of areas in which people are more likely to be victims of violent crime?".

It is widely believed that crime levels are aggravated by high levels of poverty and inequality. By inference the national development agenda of increasing wealth and reducing inequality should contribute to lower crime rates and safer communities. The usual reasoning is that the deprivations experienced by the poor "force" them to commit offences in order to survive. The evidence regarding the impact of inequality is less ambivalent but arguments run along the lines that, when the great affluence of a wealthy few is juxtaposed against their own poverty, the poor are less inclined to respect others' property and other rights. This "relative deprivation" invites the envy or the contempt by which the poor allow themselves to commit crimes. The popular perspective of the impact of poverty and inequality on crime rates also carries with it the dubious notion that, albeit through the force of circumstances, the wealthy are somewhat less inclined to criminal or anti-social behaviour.

The role of poverty and inequality in exacerbating crime pervades much of the literature on criminology. This is particularly true for writings from developed economies. In many criminology texts a somewhat different rationale is given for the relationship between poverty, for example, and crime.

The argument typically runs along the lines that poor households are more likely to be headed by single parents (or, in the South African case, by children or the aged). This often results in harsh economic conditions because of, inter alia, the dependence on only one parent's income, greater reliance on social grants etc. The hardships, coupled with the other typical constraints (like parents' lengthy absence from home, the need to work shifts etc.), are likely to result in children growing up with relatively low levels of supervision and inadequate socialisation mechanisms. These factors translate into high drop-out rates from school, a high incidence of teenage pregnancy and increased membership of gangs. This behaviour is, in turn, strongly associated with low employment levels, high levels of conflict with the law, greater levels of drug and alcohol abuse, as well as an increased incidence of prostitution. Gang membership, alcohol and drug abuse will contribute further to criminal behaviour as poorly educated, unemployable youth seek to service their habits or obligations. The poor socialisation of youth in turn leads the youth themselves to create dysfunctional households - thereby ensuring that the cycle of poverty, deprivation and criminality is perpetuated. In short, poverty is seen to result in the poor socialisation of children, who develop into youth more inclined to socially questionable or criminal behaviour. It is also largely due to the described mechanisms that the youth have become associated with crime in this literature.

1 Determinants of the rate of violent crime

What this does mean is that as areas become wealthier residents' exposure to crime (and thus victimisation rates) increases. An examination of the available data shows that this is indeed the case - at least in so far as low to middle income areas are concerned. The data indicates that rates of violent crime in areas increase with income until middle income levels are reached. "Middle" income levels correspond to the average observed in the townships adjacent to metropolitan and other cities. After this income level is reached any additional increases in average income have no discernible impact on the rates of violent crime. It is almost as if, once a fairly modest income threshold has been reached, further increases in household income prevent additional exposure to violent crime (but do not reduce it).

However what seems to be happening is that increases in rates of violent crime correspond less to income than to higher levels of urbanisation. Entirely rural areas (which are the poorest in terms of income) have the lowest rates of violent crime. As the urban proportion of the precinct population rises so too does the level of violent crime. Once the precinct population is fully urbanised (as in townships in cities) the rate of violent crime ceases to rise even if income does increase. The rate of violent crime is thus primarily dependent not on income but on the degree of urbanisation. However because urbanisation is strongly correlated to income there does appear to be a correlation between income and rates of violent crime.

The realisation that the process is driven by urbanisation levels rather than income has important implications for understanding trends. For one, in virtually all precincts where the largest population group is white, coloured or Indian the population is fully urbanised. In these urban precincts there is no systemic correlation between income and rates of violent crime. For minorities the stated relationship between income and crime rates therefore does not hold. However, given that one population group is numerically dominant, there is nevertheless a distinct correlation between income and crime rates. The correlation between rising urbanisation levels, income and rates of violent crime thus only holds for those areas where Africans are the dominant population group.

In fact for "African areas" most (55 percent) of the variation in the rates of violent crime can be explained by two factors alone - the area's average household income and the level of urbanisation in that precinct. The remaining 45 percent has to be explained by other factors like local conditions, regional culture, police services and so on. By contrast, there is virtually no correlation between average income levels and rates of violent crime in urban precincts. In fact in the urban areas little of the variation in rates of violent crime is explained by social or demographic factors. Factors considered when seeking to understand crime rates included: levels of income inequality, the employment rate, the proportion of the population made up of male youths, the proportion of households where the father is absent and so on. None of these factors offers a reasonable explanation as to why rates of violent crime differ. Such a finding was indeed disappointing as it indicated that there was no systemic relationship between crime and the expected social conditions.

Ultimately it is difficult to explain why rates of violent crime vary between areas. The most rigorous application of census data is able to explain less than half the variation in rates of violent crime. The social/demographic variables that offer the greatest explanatory value for rates of violent crime in urban and non-urban areas are:

* population size (-)

* household income (+)

* income inequality within area (-)

* dominant race group of area (- for white and Indian)

* level of urbanisation (+)

The general thrust of trends is as follows:

* As the population of the precinct increases the rate of violent crime declines.

* As household income increases the rate of violent crime increases.

* As the level of income inequality within an area increases the rate of violent crime decreases.

* If the predominant racial group in an area is white or Indian the rate of violent crime is lower.

* As the level of urbanisation increases so too does the violent crime rate.

These five variables explain slightly less than half the difference in the rates of violent crime observed across the thousand plus police stations. This means that most of the difference has to be explained by variables other than those included in census data. Obviously many of the underlying factors are still to be identified (if they do in fact exist). However it did transpire that there was one factor, not an ascribed census characteristic, that was highly correlated to rates of violent crime and which does explain much of the variation in rates of violent crime - even for the urban precincts. That informative factor is the rate of non-violent crime. In general the higher the rate of non-violent crime the higher the rate of violent crime. Seemingly if you are in an area where you are likely to be a victim of a violent crime you are also more likely to be the victim of a non-violent offence (and vice versa). Part of the correlation is due to a proportion of non-violent crimes turning violent. As the number of non-violent crimes in an area increases so too does the number of such crimes turning bad (violent).

When the rate of non-violent crime is added to the above list of five items almost three quarters of the variation in rates of violent crime can be explained. Only a quarter of the total variation then has to be attributed to local conditions, police performance and any other variable. This highlights two general things. Firstly, the random nature of violent crime: as only a relatively small proportion of crime can be attributed to ascribed factors (like race, gender and income), a central role is being played by non-determinant factors like pure luck and lifestyle. Secondly, and despite this randomness, the single biggest determinant of a high rate of violent crime is a high rate of non-violent crime. This suggests that strategies to limit murder, assaults and robberies would benefit significantly from strategies that reduce the rates of theft, drug offences etc.
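The phrase "share of the variation explained" corresponds to the R-squared of a regression, and the effect of adding a further variable can be checked by comparing two fitted models. The sketch below does this on simulated data. The dataframe "precincts", its columns and every coefficient are assumptions chosen only to make the example run; it is not the analysis behind the figures quoted above and will not reproduce them.

```r
# Simulated precinct data; every number here is made up for illustration.
set.seed(42)
n <- 1000
urbanisation <- runif(n)                          # share of population that is urban
income <- 2 + 3 * urbanisation + rnorm(n)         # income rises with urbanisation
nonviolent <- 10 + 4 * income + rnorm(n, sd = 6)  # non-violent crime rate
violent <- 5 + 3 * urbanisation + 0.5 * nonviolent + rnorm(n, sd = 3)
precincts <- data.frame(violent, nonviolent, income, urbanisation)

# Model 1: census-style predictors only.
census_only <- lm(violent ~ income + urbanisation, data = precincts)

# Model 2: the same predictors plus the non-violent crime rate.
with_nonviolent <- lm(violent ~ income + urbanisation + nonviolent, data = precincts)

# Share of variation explained (R-squared) under each model.
r2_census <- summary(census_only)$r.squared
r2_full <- summary(with_nonviolent)$r.squared
c(census_only = r2_census, with_nonviolent = r2_full)
```

The gap between the two R-squared values is the additional variation accounted for once the non-violent crime rate is included. As the text notes, this says nothing about the direction of causation.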

2 Determinants of the non-violent crime rate

The social and demographic correlates of non-violent crime are far more informative about these rates than they were about the rate of violent crime. This indicates that randomness (pure luck) plays less of a role in determining who is likely to be a victim. In fact almost seventy percent of the variation in rates of non-violent crime can be explained by six ascribed social/demographic factors:

* population

* income

* precinct size (area)

* income inequality within the area

* income inequality between areas

* level of urbanisation

Briefly the relationships can be summarised by:

* The greater the population of an area the lower the crime rate.

* The bigger an area in terms of region covered the lower the crime rate.

* The higher the income of an area the higher the rate of non-violent crime.

* The more unequal the income of an area is the higher the rate of non-violent crime.

* The larger the gap between the income of an area and that of its poorest neighbour the higher the non-violent crime rate.

* The higher the level of urbanisation the greater the rate of non-violent crime.

In the initial analysis it transpired that a high rate of non-violent crime is associated with wealth (not poverty) and inequality (both within the area in question and between that area and the adjacent precincts). However the impact of income inequality on observed crime rates is surprisingly small. Given the political emphasis on redistribution the relationship between inequality and crime bears further exploration.

What is most pertinent about income inequality, at least in so far as crime is concerned, is the direct relationship between income and inequality. When we compare income levels with the best known measure of income inequality (the Gini coefficient) it becomes clear that income inequality increases with income. In other words the wealthiest areas tend to have the most unequal distribution of wealth. Conversely the income of the poorest areas tends to be relatively evenly distributed within the population.

When we examine income levels and inequality at the same time it becomes clear that the relationship between these two factors and crime is driven almost entirely by income levels and only partly by how that income is distributed. In fact, when the relationship between rates of non-violent crime and both income and inequality is considered, it becomes clear that high levels of inequality are associated with lower rates of non-violent crime.
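For readers unfamiliar with the Gini coefficient, a minimal sketch of one standard way to compute it from a vector of incomes follows. The function name "gini" and the two sets of household incomes are hypothetical and chosen only to show the contrast between an evenly and an unevenly distributed income. How the coefficient was actually estimated per precinct is not described here.

```r
# Gini coefficient of a vector of non-negative incomes:
# 0 means perfectly equal, values near 1 mean highly concentrated.
gini <- function(income) {
  x <- sort(income)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Hypothetical household incomes for two made-up areas.
poor_area <- c(900, 1000, 1000, 1100, 1000)
wealthy_area <- c(2000, 5000, 8000, 20000, 65000)

gini(poor_area)      # incomes are close together, so the value is near 0
gini(wealthy_area)   # one household holds most of the income, so the value is higher
```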

This finding indicates that neither poverty nor inequality (within an area) exacerbates the level of non-violent crime. There is seemingly no evidence to support the popular understanding that poverty causes crime.

One shortfall of the method used above is that it treats each police precinct as being self-contained. This implies that both victims and perpetrators come from within that precinct. While the first assumption may be reasonable the latter is questionable. Surely the analysis should allow for criminals leaving their precinct to commit offences elsewhere. Within the limits of the available data such an allowance is indeed made. This is achieved by including in the analysis the income of neighbouring precincts. The inclusion of this factor allows us to see whether or not differences in income between areas impact on crime rates.

The analysis does indeed show that the bigger the difference in income between an area and its neighbour the higher the crime rate (of non-violent crimes, that is) in the wealthier area. This indicates that much property crime may well be committed by perpetrators leaving their own precinct and going to commit offences in nearby precincts where the pickings are better and they are less likely to be identified.

Six factors - population size, average income, precinct size (area), income inequality within the area, income inequality between areas and the level of urbanisation - explain almost 70 percent of the variation in rates of non-violent crime. As indicated above there seems to be a strong relationship between rates of violent and non-violent crime. If we add the rate of violent crime to the above six factors we are able to explain 83 percent of the variation in rates of non-violent crime. Unfortunately, as indicated above, we are unable to statistically determine if non-violent crime causes violent crime or whether the situation is reversed. To infer causality we have to, as we did above, revert to other evidence or theory.

When we view the above six factors and the rate of violent crime simultaneously we discover that, once again, the single biggest determinant of the rate of non-violent crime is the level of violent crime.

The correlates of crime

The results of the analysis of crime trends are somewhat surprising. For example the widely expected correlations between crime rates and income inequality and poverty are not borne out. In fact poverty is the single strongest indicator of lower rates of violent and non-violent crime. Also, once economic factors like wealth and inequality are considered the role played by the racial characteristics of areas is surprisingly small. One quarter of the variability in rates of non-violent crime is explained by the racial classification of areas. Similarly one-eighth of the variability in the rate of violent crime can be explained by the same data. However if, as indicated above, we look to areas' average income as well as the level of income inequality no further insight is gained by looking to the racial classification of areas.

This suggests that, despite South Africa's history of racial discrimination, the more informative indicators of crime levels are economic factors, not racial ones. This indicates that the crime levels in areas defined by racial groups are in line with what is expected of any area with that level of income and inequality irrespective of who lives there. We thus do not need to look to cultural or other ethnically bound factors to explain crime rates. Obviously there is still a strong correlation between income levels and the racial characteristics of those areas. This is in stark contrast to much of the criminology literature.

However this is not to say that the racial heritage of areas is not significant. Its effects are manifest in the way in which affluent and poor neighbourhoods are often situated adjacent to each other. The size of the income difference between such areas plays an important role in raising crime levels in the more affluent area.

Interpreting the findings

Do the counter-intuitive findings mean that the South African experience is somehow unique and that the criminology texts need to be rewritten? The literature from developed economies shows that the impact of poverty on crime is via the reduction in social capital. In those contexts poverty undermines the social bonds and family, kinship and community connections that ensure that behaviour patterns (particularly of the young) are bound by the appropriate norms and rules.

In South Africa the impact of social capital works in the same way - what is different is the way in which these bonds and obligations are weakened. In developed economies family breakdown is highest among poor households. This fragmentation gives rise to (as described above) poor education levels, low employment levels, early pregnancy and ultimately high crime levels. By contrast, in developing economies the poorest households tend to be located in rural areas which are frequently marked by strong familial and kinship bonds. These bonds prevent the weakening of the "social capital". In other words the stronger social ties seem - despite the poverty - to prevent the development of anti-social behaviour among children and youth. Consequently the poorest areas in South Africa tend to have low rates of both violent and non-violent crime.

It would seem that social capital is strongest in poor (rural) areas and weakest in highly urbanised (and thus more affluent) areas. The constraints on anti-social conduct are thus weakest in areas that are more affluent because they are more urbanised. The effect of this is increasing rates of violent and non-violent crime as urbanisation levels increase. However approximately half the South African population lives in urban areas and a similar proportion of police precincts are fully urbanised. In these fully urbanised precincts the rate of non-violent crime continues to increase as income rises. Obviously this cannot be explained in terms of urbanisation levels. It is also not suggested that social capital continues to decline and, consequently, that the sons and daughters of the upper middle class are more likely than their township counterparts to be running around in gangs, pregnant and high. How then are the higher rates of non-violent crime in these areas to be explained?

The answer still lies in declining "social capital". For the social and community bonds that constrain anti-social or criminal behaviour to be effective they need to constrain the behaviour of all members of that community. If some people in that community (however transient) are not bound by those norms they will be prone to committing anti-social offences. It is in this regard that the more affluent areas are particularly vulnerable - included among them are members of nearby or adjacent areas who do not normally "belong" to those communities. As these outsiders are not constrained by that community's norms they are likely to violate them. The data further indicates that the likelihood that any community includes such alienated individuals is proportional to the size of the gap between the incomes of that community and its neighbours. The larger the income gap the higher the crime rate in the more affluent area.

This may simply be interpreted as perpetrators from one area moving to another where the pickings are richer. However once the idea that poverty drives crime is dispelled the question arises as to why these perpetrators allow themselves to commit the offence in a neighbouring precinct. Another way of presenting the key problem is "why does inequality between areas contribute to higher crime when inequality within areas does not?". If inequality per se contributed to higher crime levels then only the size of the income gap should matter, not whether the comparison was between areas or within areas.

The short answer to this key question is that the people in the adjacent area are not seen as being members of the perpetrators' community and they are not beneficiaries of the norms and values that apply at home. The driving force behind rising crime rates in affluent areas is the social capital which better regulates the behaviour of perpetrators at home than it does in the next suburb. This is largely the product of the compartmentalised urban topology which is viewed as typical of apartheid planning. When these compartments coincide with racial, material and other social fracture lines a recipe exists for the accelerated collapse of the constraints offered by social capital. If juxtaposed areas did not emphasise differences between communities there would be no effect on crime rates above that attributable to inequality within areas (and as indicated above increased inequality within areas is correlated to lower crime rates).

The situation is aggravated by the lifestyles of affluent suburbs in which people live not as communities but as aggregations of isolated households. The isolation is both social and physical. Physical barriers like walls and electric fences offer the illusion of security while insulating the households from the outside world as well as from neighbours and other benign aspects of the community.

Sunday, September 13, 2009

R names in dataframe matching a string

To find out which variables in an R dataframe (called "part2") have names
containing the string "cat" do the following:
names(part2[,grep("cat",names(part2))])
This produces a list of the names of the variables.
[1] "cat" "cat_1" "cat2"
However I would usually rather know the variable (column) number
corresponding to "cat". To do this:
grep("cat",names(part2))
This gives:
[1] 1 41 235
Obviously then
names(part2)[c(1,41,235)]
also gives:
[1] "cat" "cat_1" "cat2"
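A slightly shorter route to the names is grep's "value" argument, which returns the matching strings themselves. The dataframe below is a made-up stand-in for "part2" (its columns and values are assumptions), used only so the lines can be run as they are. One reason to prefer this form: when only a single column matches, part2[,...] in base R returns a plain vector rather than a dataframe, so calling names() on it gives NULL.

```r
# Hypothetical stand-in for "part2": only the column names matter here.
part2 <- data.frame(cat = 1:3, dog = 4:6, cat_1 = 7:9, cat2 = 10:12)

# Column positions whose names contain "cat".
cat_cols <- grep("cat", names(part2))

# Matching names directly, without subsetting the dataframe first.
cat_names <- grep("cat", names(part2), value = TRUE)

# Logical vector marking which columns match, handy for counting or negating.
is_cat <- grepl("cat", names(part2))
```

Note that "cat" is treated as a regular expression and matches anywhere in the name, so a column called "location" would also be picked up. Anchoring the pattern (for example "^cat") restricts the match to names that start with it.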

--

Michael O'Donovan
skype: maodonovan
tel: 0110218108
fax: 0865173354