Friday, January 30, 2015

Diversity and Inequality

Last weekend, a post on Reddit's linguistics subforum showing a map of worldwide language diversity was a big hit. The map used a metric called Greenberg's Linguistic Diversity Index (LDI), which is the probability that two randomly chosen inhabitants of a given country have different mother tongues. Largely homogeneous states like South Korea and Haiti have low scores (0.003 and 0.000, respectively), while places like Tanzania and Papua New Guinea, where every village might speak a different language, have LDIs of 0.95 or higher.
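For concreteness, here is a minimal Python sketch of the index. It assumes, as is standard for Greenberg's measure, that the two inhabitants are drawn independently (with replacement), so the chance that they share a mother tongue is the sum of the squared population shares. The function name `greenberg_ldi` and the example shares are illustrative, not taken from the map's data.

```python
def greenberg_ldi(shares):
    """Greenberg's diversity index: 1 minus the sum of squared population shares.

    `shares` holds the fraction of the population speaking each mother
    tongue (hypothetical inputs); the fractions must sum to 1.
    """
    total = sum(shares)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    # Probability that two independent draws land in the same language group.
    same_language = sum(p * p for p in shares)
    return 1.0 - same_language
```

A country where everyone shares one mother tongue, `greenberg_ldi([1.0])`, scores 0; ten equally sized language groups, `greenberg_ldi([0.1] * 10)`, score about 0.9.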

Source: Reddit user Whiplashoo21

In the ensuing discussion, one user wanted to see how linguistic diversity compared with development. As the map above shows, many of the most linguistically diverse countries are in impoverished sub-Saharan Africa. In fact, this is a popular research question in political science and economics: does cultural diversity make a country better off, or does it leave a state susceptible to Balkanization and ethnic conflict?

To test this, I started with exactly what the commenter asked about: LDI against the inequality-adjusted HDI (IHDI). For those who don't know, the Human Development Index is an attempt at a more holistic measure of development; it combines three basic indicators (life expectancy, educational attainment, and per capita income, measured since 2010 as GNI per capita) into a single number. Since 2010, the UNDP has also published a second index that adjusts the HDI for inequality within each country. Most states provide the UN with enough data to compute both indices, although there are a number of notable exceptions.
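Under the UNDP's post-2010 method, the HDI is the geometric mean of the three dimension indices, and the IHDI discounts each dimension index by that dimension's Atkinson inequality measure A before taking the same geometric mean. The sketch below assumes the dimension indices are already normalized to [0, 1] (the goalpost normalization step is omitted), and the function names are hypothetical.

```python
def hdi(health, education, income):
    """HDI (2010 method): geometric mean of three dimension indices in [0, 1]."""
    return (health * education * income) ** (1.0 / 3.0)


def ihdi(health, education, income, a_health, a_education, a_income):
    """IHDI: each dimension index is scaled by (1 - A), where A is the
    Atkinson inequality measure for that dimension, then combined as in hdi()."""
    return hdi(
        health * (1.0 - a_health),
        education * (1.0 - a_education),
        income * (1.0 - a_income),
    )
```

With zero inequality in every dimension the IHDI equals the HDI; any inequality pulls it below the HDI, which is why the gap between the two is read as a loss due to inequality.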

Source: Wikipedia image; data from UNDP.

For my data, I used UNESCO's 2009 report on linguistic diversity for the LDI, and the UNDP's 2014 figures for HDI and IHDI. These data differ slightly from the Reddit post's source, but the two LDI series don't vary substantially.

IHDI = -0.308LDI + 0.691, R² = 0.246***, p < 0.00001
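The post doesn't say which software produced these fits. As a hedged sketch, an ordinary least-squares line and its R² can be computed in plain Python as below. The function name `fit_line` is hypothetical, and the p-value is left out because it requires a t distribution (SciPy's `scipy.stats.linregress`, for example, reports one alongside the slope and intercept).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```

Called with each country's LDI as `xs` and its IHDI as `ys`, this returns the slope, intercept, and R² in the same form as the equation above.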

Unfortunately, at first glance diversity does not appear to be a positive. As the graph shows, there is a highly significant, but modest, negative correlation: the model estimates that linguistic diversity accounts for about 25% of the variation in IHDI scores. That isn't a very large effect, but it is an interesting one. Of course, it's a cardinal error in statistics to equate correlation with causation, and here there are two things to look out for. First, linguistic diversity very likely isn't exogenous; it doesn't arise by itself. Second, there is very likely some third variable acting on both a country's diversity and its development. To show this better, I grouped countries by continent.
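To put the R² in more familiar terms: with a single predictor, R² is the square of Pearson's correlation coefficient r, and r takes the sign of the slope. A minimal sketch (the helper name is hypothetical):

```python
import math


def correlation_from_fit(slope, r_squared):
    """Pearson's r for a one-predictor linear fit: sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r_squared), slope)
```

For the fit above, `correlation_from_fit(-0.308, 0.246)` gives r ≈ -0.496: a clearly negative relationship, but one that leaves most of the variation unexplained.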

Notice how much more diverse, and how much poorer, Africa is than the rest of the world. Both are products of colonialism; the former because the Scramble for Africa drew colonial borders around natural resources and landmarks rather than pre-existing ethnic groups.

Looking at cultural diversity more broadly yields similar results. I used a measure from a paper by Erkan Gören at the University of Oldenburg, who proposed a new index that takes religious, ethnic, and linguistic differences into account and then adjusts them for how similar the languages actually are. This cultural diversity map (made for a Pew Research blog post) looks much like the original linguistic diversity map.
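The similarity adjustment can be illustrated with a generic similarity-weighted fractionalization index: instead of treating any two different groups as completely distinct, each pair of groups counts as "the same" in proportion to how similar they are. This is an assumption for illustration, not necessarily Gören's exact formula, and the similarity values in the example are hypothetical.

```python
def similarity_weighted_diversity(shares, similarity):
    """1 - sum_i sum_j p_i * p_j * s_ij, with s_ij in [0, 1] and s_ii == 1.

    If s_ij == 0 for every pair of distinct groups, this reduces to the
    plain Greenberg index 1 - sum(p_i ** 2).
    """
    k = len(shares)
    if len(similarity) != k or any(len(row) != k for row in similarity):
        raise ValueError("similarity must be a k x k matrix")
    return 1.0 - sum(
        shares[i] * shares[j] * similarity[i][j]
        for i in range(k)
        for j in range(k)
    )
```

Two equal groups speaking unrelated languages score 0.5, as with the plain LDI; if their languages are rated half-similar (s = 0.5), the score drops to 0.25, reflecting that closely related languages divide a country less.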

And his index (GI below) produces a similar trendline.

IHDI = -0.404GI + 0.697, R² = 0.282***, p < 0.00001

Going back to the Scramble for Africa, I tried something new: adjusting IHDI scores by continent. Many countries on the same continent share a similar history. Much of the Americas consists of monolingual ex-colonies with very small remaining indigenous populations, Africa's borders were not drawn along ethnic lines, and Asia's borders more or less follow ethnic lines. So maybe much of this relationship is just a product of colonial history.
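The post doesn't spell out the adjustment, but the subscript in IHDI_z suggests z-scores computed within each continent. Assuming that, each country's IHDI is expressed as the number of standard deviations it sits above or below its own continent's mean. The sketch uses the population standard deviation (an arbitrary choice), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_groups(values, groups):
    """Standardize each value against the mean and standard deviation of its own group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Per-group mean and population standard deviation.
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    result = []
    for value, group in zip(values, groups):
        mu, sigma = stats[group]
        # A group with no spread (e.g. a single country) gets a z-score of 0.
        result.append((value - mu) / sigma if sigma > 0 else 0.0)
    return result
```

Regressing these within-continent scores on LDI removes whatever differences between continents as a whole drive the pooled correlation, which is the point of the adjustment.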

IHDI_z = -0.529LDI + 0.242, R² = 0.028, p = 0.0508

And sure enough, the relationship breaks down: R² falls to 0.028, and the slope is no longer significant at the conventional 5% level.

One more thing I noticed is that more linguistically diverse countries are also more unequal, measured as the percentage of HDI lost once it is adjusted for inequality.

% Loss = 15.614 LDI + 14.063, R² = 0.200***, p < 0.00001
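The UNDP reports this overall loss as the percentage gap between a country's HDI and its IHDI. A minimal sketch (the function name is hypothetical):

```python
def percent_loss(hdi, ihdi):
    """Overall loss due to inequality, in percent: 100 * (1 - IHDI / HDI)."""
    if hdi <= 0:
        raise ValueError("HDI must be positive")
    return 100.0 * (1.0 - ihdi / hdi)
```

For example, a country with an HDI of 0.8 and an IHDI of 0.6 loses about 25% of its HDI to inequality.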

Then again, that's likely just because poorer countries tend to be more unequal anyway.

% Loss = -58.027HDI + 60.576, R² = 0.7607***, p < 0.00001

Until next time.