Understanding heterogeneous treatment effect estimation using proof-by-Stata

Marc Bellemare asks whether splitting your sample by an observed covariate is a reasonable approach for estimating heterogeneity in treatment effects:

To get a treatment heterogeneity, wouldn’t it be better to maintain your sample as is, but to interact your treatment (i.e., land title, college degree, etc.) with groups (i.e., small and large plots, race, etc.), going so far as to omitting the constant in order to be able to retain each group

In general, selection on observables will not cause bias in OLS estimates. So this approach is okay. You can prove this formally by showing that your treatment variable of interest is uncorrelated with the error term in the selected sample – see page 7 of these slides for a sketch of that proof. However, I don’t find that proof to be very useful for generating the intuition about why this is the case, so here is a brief proof-by-Stata:

clear all
set seed 12345

*set up matrix of correlations between variables

matrix C = (1, .75, 0 \ .75, 1, 0 \ 0, 0, 1)

*simulate the data generating process – correlations between RHS variables
drawnorm T z u, n(1000) corr(C)

*generate y using our RHS variables
*T is the variable of interest
*z is an observed variable that changes how T affects y
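gen y=1+2*T+0.3*z+u if z>0
*the assignment for z<=0 is missing from this listing; the 0.5 below is an assumed, illustrative effect
replace y=1+0.5*T+0.3*z+u if z<=0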

reg y T z
reg y T z if z>0
reg y T z if z<0

So we get unbiased estimates of the average treatment effect and of the conditional treatment effects given z>0 and z<0.
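A single draw can't really show unbiasedness, so here is a minimal Monte Carlo sketch of the same idea. The program name splitsim is hypothetical, and for simplicity I assume one treatment effect of 2 for everyone (so this is not the exact DGP above); the point is only that restricting the sample to z>0 leaves the coefficient on T centered on the truth.

```stata
clear all
set seed 12345

*hypothetical helper program: one draw of the DGP, then the z>0 regression
program define splitsim, rclass
    drop _all
    matrix C = (1, .75, 0 \ .75, 1, 0 \ 0, 0, 1)
    drawnorm T z u, n(1000) corr(C)
    *assumption: a single treatment effect of 2 for everyone
    gen y = 1 + 2*T + 0.3*z + u
    *select the sample on the observed covariate z
    reg y T z if z>0
    return scalar b_T = _b[T]
end

*repeat 500 times and look at the distribution of the estimates
simulate b_T = r(b_T), reps(500) nodots: splitsim
sum b_T
```

If selection on z is harmless, the mean of b_T across replications should sit very close to 2.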

You can also use this approach to see that for your point estimates, it doesn’t matter if you estimate the heterogeneous treatment effects by using a dummy variable interacted with the treatment instead. That is, it doesn’t matter provided you do a fully-saturated model – you have to interact the dummy with all your RHS variables, not just the treatment:

gen z_above_0 = z>0
reg y i.z_above_0##c.T i.z_above_0##c.z

*for comparison purposes, make T*below & T*above
gen T_z_above_0 = T*z_above_0
gen T_z_below_0 = T*(1-z_above_0)

*i.z_above_0##c.z already includes the z_above_0 dummy, so it is not listed separately
reg y T_z_above_0 T_z_below_0 i.z_above_0##c.z

If you run the code yourself and mess with the seed value for the RNG, you can confirm that this method mechanically generates identical point estimates to the split-sample approach. However, the saturated approach assumes a common error term distribution across the whole sample, so it will not give you the same standard errors. Again, if you run it you can see that they differ.
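To make the comparison explicit rather than eyeballing regression tables, here is a self-contained sketch that stores the split-sample coefficient and compares it with the one from the fully interacted model. The variable names (above, T_above, T_below) and the effect of 0.5 for the z<=0 group are my own illustrative assumptions, not values taken from the listing above.

```stata
clear all
set seed 12345
matrix C = (1, .75, 0 \ .75, 1, 0 \ 0, 0, 1)
drawnorm T z u, n(1000) corr(C)
gen above = z>0

*assumed effects: 2 when z>0, 0.5 otherwise (illustrative values)
gen y = 1 + 2*T + 0.3*z + u if above
replace y = 1 + 0.5*T + 0.3*z + u if !above

*split-sample estimate for the z>0 group
reg y T z if above
scalar b_split = _b[T]
scalar se_split = _se[T]

*fully interacted model on the whole sample
gen T_above = T*above
gen T_below = T*(1-above)
reg y T_above T_below i.above##c.z

display "difference in point estimates: " b_split - _b[T_above]
display "split-sample SE: " se_split "   pooled SE: " _se[T_above]
```

The first display line should be zero up to numerical precision, and the second puts the two standard errors side by side so you can compare them directly.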

One of the commenters on Marc’s blog pointed out that a case where this is definitely problematic is if we select our sample on a dependent variable. Suppose we have heterogeneity by unobserved characteristics u, and we try to get at this by splitting the sample using values of the outcome:

*now look at heterogeneity by unobserved variable, u

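gen y2=1+2*T+0.3*z+u if u>0
*the assignment for u<=0 is missing from this listing; the 0.5 below is an assumed, illustrative effect
replace y2=1+0.5*T+0.3*z+u if u<=0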

reg y2 T z
sum y2, d

*try splitting the sample by y
local y2_high = r(p75)
reg y2 T z if y2>`y2_high'
reg y2 T z if y2<`y2_high'

The two separate regressions now each generate biased estimates of the mean treatment effect, and the CIs also don’t include the heterogeneous treatment effects by u. In other words, catastrophe. This is also something we can prove in general (page 15 of the slides linked above) – once we select the sample on y, T is no longer independent of u within the selected sample. This just reinforces the maxim that selection on X is okay, whereas selection on y is a big problem.
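The same Monte Carlo trick shows that the problem is the selection on y itself, not the heterogeneity. This is a sketch under simplifying assumptions: the program name yselsim is hypothetical, and I give everyone the same treatment effect of 2, so any gap between the estimates and 2 comes purely from keeping only the high-y observations.

```stata
clear all
set seed 12345

*hypothetical helper program: one draw of the DGP, then a regression on the top quartile of y
program define yselsim, rclass
    drop _all
    matrix C = (1, .75, 0 \ .75, 1, 0 \ 0, 0, 1)
    drawnorm T z u, n(1000) corr(C)
    *assumption: a single treatment effect of 2 for everyone
    gen y = 1 + 2*T + 0.3*z + u
    sum y, d
    local cut = r(p75)
    *select the sample on the outcome
    reg y T z if y > `cut'
    return scalar b_T = _b[T]
end

*repeat 500 times and look at the distribution of the estimates
simulate b_T = r(b_T), reps(500) nodots: yselsim
sum b_T
```

Here the mean of b_T should come out well below the true value of 2: among observations with high y, a low T has to be offset by a high u, which builds a negative correlation between T and the error term.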


“Not-so-reluctant entrepreneurs” and other encouraging facts about economic growth in Africa

One of the best parts of Banerjee and Duflo’s Poor Economics is a chapter on “reluctant entrepreneurs” – people who own and manage their own businesses not because they want to, but because it’s the only source of income available to them. It’s a concept that ties together a lot of what we know about poverty alleviation and small enterprises in the developing world. Part of why microcredit and business training don’t achieve huge gains in lifting people out of poverty is that the business-owners targeted by those policies aren’t passionate about growing their businesses. They’re just trying to get by.

This view of small businesses in Africa is challenged somewhat by a new NBER working paper by Diao and McMillan (“Toward an Understanding of Economic Growth in Africa: A Re-Interpretation of the Lewis Model”, ungated IFPRI working paper here). They cite a survey of small businesses in Tanzania that found that 54% of small business owners would not prefer a salaried job. That 54% is the exact opposite of a reluctant entrepreneur – they enjoy running their own business. A surprising 60% of all these businesses are growing, too, which is consistent with people enjoying running them.

The paper has many other interesting and encouraging datapoints – Angola, famous for being the leader of Africa’s growth boom, has faster growth in its agriculture sector than its mining sector.* Moreover, manufacturing is growing as a share of all exports. Both factors argue against the standard narrative that this is another export boom that won’t lead to broad-based economic development. The authors also argue that the current economic growth is led by domestic demand and accompanied by growth in the “in-between” sector – their word for the set of small businesses that includes “reluctant entrepreneurs”.

I was a lot less interested in their GE model of Rwanda’s economy – but the assemblage of data on Africa’s economic growth makes the paper worth reading in its own right. I have argued before that it is time for optimism about Africa, and now I feel even more confident about that.

*Bearing in mind the often-dubious quality of African GDP statistics.


My new job

People who know me personally already know this, but I figured I would put it up here in order to make it public: starting this fall I will be an assistant professor in the Department of Applied Economics at the University of Minnesota. It’s hard to overstate how excited I am about the job – the department has a fantastic group of people doing applied microeconomics. It’s also in an amazing city (two, technically, although Minneapolis and St. Paul nearly merge together at their boundary). I am fully on the Matt Yglesias “everyone should move to Minneapolis” bandwagon.

Now that I am gainfully employed, my long job market-induced blogging hiatus is finally over. I have a number of pent-up ideas I’ve been meaning to write about, so I expect to post pretty regularly in the near future (subject to the finishing-my-dissertation constraint).

PS: Speaking of blogging, I’ve been reading my new colleague Marc Bellemare’s blog religiously for many years now and I highly recommend it. If your friends are anything like mine, his skeptical economist’s take on food & food policy is a breath of fresh air. He’s been on an absolute tear lately – here he is calling out the NYT for ignoring low meat consumption as an important contributor to people being underweight in India, for example.


“People think it’s easy to contract HIV. That’s a good thing, right? Maybe not.”

That’s the title of my guest post on the World Bank’s Development Impact blog, describing my job market paper. Here’s a bit of the introduction:

People are afraid of HIV. Moreover, people around the world are convinced that the virus is easier to get than it actually is. The median person thinks that if you have unprotected sex with an HIV-positive person a single time, you will get HIV for sure. The truth is that it’s not nearly that easy to get HIV – the medical literature estimates that the transmission rate is actually about 0.1% per sex act, or 10% per year.

One way of interpreting these big overestimates of risks is that HIV education is working. […] The classic risk compensation model says this should be causing reductions in unprotected sex.

Unfortunately, the risk compensation story doesn’t seem to be reflected in actual behavior – at least not in sub-Saharan Africa, where the HIV epidemic is at its worst. […] If people are so scared, why don’t they seem to be compensating away from the risk of HIV infection? I tackle that question in my job market paper, “The Effect of HIV Risk Beliefs on Risky Sexual Behavior: Scared Straight or Scared to Death?” My answer is surprising.

You can read the whole thing on their site by following this link. My post is part of their annual Blog Your Job Market Paper series, which features summaries of research from development economics Ph.D. students on the job market. People who follow this blog should check out that series, which has featured some really awesome research this year. More generally, Development Impact is by popular acclamation the best development-focused blog out there; I read every post.


Devastating fact of the day

Liberia had only 50 physicians for the whole country before the [Ebola] epidemic

I learned this awful fact (which perhaps I would have known already, had I been following the response to Ebola more closely) from Mead Over. He suggests that the best way to fend off the next Ebola epidemic may be to shift the monitoring of disease outbreaks from passive detection clinics to active monitoring by teams who go out and test everyone.

Other sources disagree on exactly how many doctors Liberia had pre-Ebola. In this piece from back in August, Dr. Frank Glover states that there were actually 200 doctors in Liberia before the epidemic – and that 150 left after the initial outbreak.

To put this figure in context, Liberia is a country of 4 million people. According to healthgrades.com, it has about the same number of doctors as Battle Creek, Michigan, a smallish town (pop. 52,347) best known for making breakfast cereal. The standard way of counting how many doctors a country has, relative to its overall population, is the number of physicians per 1,000 people. With 50 physicians for 4 million people, that works out to roughly 0.0125 per 1,000. On the World Bank’s page showing this number for different countries by year, the only entry for Liberia, from 2010, is “0.0”. The rate is less than the rounding error in the table.

This awful fact prompts three thoughts:

1. Over’s argument that we should look for alternative ways to address Ebola (and other similar disease outbreaks in the future), without relying on strengthening overall healthcare systems, is very compelling. Yes, it would be nice to achieve solid gains in general healthcare on the back of international concern about Ebola. But that doesn’t seem like a realistic solution to this problem. We are talking about a place with nearly no health system to strengthen. Moreover, this outbreak has horribly weakened what system there was. Doing better, cheaper monitoring could help stave off the next such disaster for Liberia’s healthcare system – or that of another African country.

2. The elasticity of physician labor supply with respect to disease risk is enormous, if we take Glover’s comments at face value. Such a large response might be totally rational – while Ebola is not easy to catch from casual contact, doctors could have reasonably feared they’d be pressed into service treating Ebola patients with totally inadequate supplies and training, which is exceedingly risky. If developing countries want to retain their physicians, they should focus on supporting them rather than trying to make it illegal to leave.

3. Liberia is not the only country that has such a massive healthcare deficit. That World Bank table lists 10 countries whose latest rate of doctors per 1,000 is too low to register. All are in sub-Saharan Africa. It is not surprising that countries with essentially no doctors experience high rates of transmission for diseases that are basically only spread from patients to those taking care of them. In the long term, it is incredibly important to address the physician deficit, not just in West Africa but worldwide.


A few highlights from NEUDC 2014

My posting to this blog has gone into a semi-involuntary hiatus this fall because I am on the academic job market right now, and am dedicating nearly 100% of my time to that process.

I’m breaking that trend to talk about some awesome new papers I saw last weekend. I was lucky enough to go to NEUDC for the first time, to present my job market paper. In the other sessions I attended, I came across some fascinating stuff. A few highlights that I can’t resist sharing:

“Alcohol and Self Control – A Field Experiment with Cycle Rickshaw Pullers in India”, Frank Schilbach (only an abstract is currently available online)

In some parts of the developing world people drink very heavily, and this could contribute to getting trapped in poverty, since alcohol can exacerbate self-control problems. Schilbach uses a randomized incentive to stay sober to show that cutting down on drinking during the work day is as effective as providing people with access to commitment savings accounts, among his high-alcohol use population. This is a fascinating result, and I wonder how much it will generalize to other populations where alcoholism is a less-severe problem (his subjects drink, on average, 6 days a week, at a rate of something like 5 drinks per day).

“The Cost of Keeping Track”, Johannes Haushofer

Haushofer augments a standard rational model of intertemporal choices in a very simple, intuitive way: if you decide to undertake a transaction in the future, you must pay a fixed cost in each period to keep track of that decision. He motivated this brilliantly in the session by asking the audience if they had ever paid a bill before it was due, and pointing out that that is technically irrational – you’re giving up interest on the money in question. But it can be rationalized by the fact that it’s not worth the effort to remember to take care of the bill in the future. This augmentation generates a bunch of well-known “predictable irrationalities”: for example, people tend to discount future gains “too much”, but actually do the opposite for future losses, discounting them too little. It also predicts loss aversion and status quo bias. I’m looking forward to more research on the empirical implications of this model, which I think has the potential to bring a lot of clarity to how we think about behavioral anomalies in decisionmaking.

“The Role of Road Quality Investments on Economic Activity and Welfare: Evidence from Indonesia’s Highways”, Paul J. Gertler, Marco Gonzalez-Navarro, Tadeja Gracner, and Alexander D. Rothenberg

Road maintenance funding in Indonesia is set according to arbitrary guidelines by the central government. The authors exploit this fact to measure the effect of higher-quality roads on household income. Higher-quality roads help the economy substantially, and they can show that this is due to better roads leading to a shift from agriculture into manufacturing. Some of the elasticities they find are massive: a 1% improvement in average road quality leads to a 6% increase in hours worked in manufacturing. This type of work is exciting and important because transportation infrastructure is vital for economic development, but the empirical evidence for exactly how big its benefits are is still pretty thin. These results can be plugged into cost-effectiveness calculations to help justify desperately-needed increases in funding for road maintenance in the developing world.

This is just a small sample of the cool research on display – I wasn’t able to go to all the presentations I wanted to. I was really impressed with the quality of the work I saw across the board.

Now, back to the joy of filling out webforms and tracking jobs in huge spreadsheets.


Sometimes rational behavior means nobody has any idea why they are doing something

The always-excellent Planet Money podcast recently did an episode on why milk is always at the back of the grocery store. It’s a fantastic piece, and well worth the 16-minute listen, but can be summarized pretty briefly. It turns out that there are two theories for the milk-in-the-back phenomenon: exploiting behavioral economics and cost minimization. The behavioral economics story is the one I was more familiar with: milk is an extremely common purchase, and it is placed in the back in order to force people to walk past a range of tempting items. Since consumers are prone to impulse purchases, this induces them to spend more.

The cost-minimization theory is one I wasn’t familiar with, but it rings true. Milk is extremely prone to spoilage and must be kept consistently cold to keep it drinkable. Keeping it in the back is the cheapest way to maintain the “cold chain” needed for this; moving it to the front would cause more losses due to spoilage and/or require.

That’s the debate, but the best part of the episode is that no one knows what the right answer is. They talk to a range of experts who voice various opinions on the subject, and get support for both theories. The first interview is with a guy who is the dairy buyer for a major grocery store chain. He voices support for the tempting-consumers theory, and literally says he “believes” that that’s why they do it. He is the guy that is doing this! And he believes that the reason why he puts the milk in the back is that it tempts the customer to spend more. The cold chain theory doesn’t fare a ton better either, getting barely more than half of the votes from people who are in the business of selling milk.

None of this implies that the people who run grocery stores are not behaving optimally, however. You don’t need to understand your own strategy to maximize profits, or even your own well-being. Imagine you run a big grocery store, or even just the dairy section of one: you make tons of decisions and face hundreds of constraints. And you observe what happens to costs, volumes sold, and profits, whenever you change something. So your process has ended up a certain way, and you can legitimately not know the exact reason why. You know that if you do different stuff with the milk, profits go down – from your own experience, from watching other stores, etc. You keep doing what you’re doing, because you are sitting at an optimum, but you don’t actually know why.*

My gut instinct is that this kind of rational behavior – where people are at an optimum but don’t know exactly why – is exceedingly common. I am reminded of the complex traditions around using one’s signals on unlit roads in rural Africa. If you approach traffic headed in the opposite direction, you have to lower your high-beams and turn on your right turn signal (the one closest to the approaching car). I’ve heard various explanations for this: that it reminds people to dim their headlights, that it informs approaching cars about the location of the edge of your vehicle, etc. No one knows why they do it, they just do. Realizing that there has to be some good reason (or reasons!), when I drive on dark roads in Malawi, I do the same thing.

*And of course there might be multiple reasons – a reasonable model of dairy section location would involve the firm minimizing the costs of all its items and balancing that against the value of all its sales.