The moral imperative for honesty in development economics

There is a lot of bad research out there. Huge fractions of the published research literature do not replicate, and many studies aren’t even worth trying to replicate because they document uninteresting correlations that are not causal. This replication crisis is compounded by a “scaleup crisis”: even when results do replicate, they often do not hold at any appreciable scale. These problems are particularly bad in social science.

What can we do about the poor quality of social science research? There are a lot of top-down proposals. We should have analysis plans, and trial registries. We should subject our inferences to multiple testing adjustments. It is very hard to come up with general rules that will fix these problems, however. Even in a world where every analysis is pre-specified and all hypotheses are adjusted for multiple testing, and where every trial is registered and the results reported, people’s attention and time are finite. The exciting result is always going to garner more attention, more citations, and more likes and retweets. This “attention bias” problem is very difficult to fix.

When you are doing randomized program evaluations in developing countries, however, there is a bottom-up solution to this problem: getting the right answer really matters. Suppose you run an RCT that yields a sexy but incorrect result, be it due to deliberate fraud, a coding error, an accident of sampling error, a pilot that won’t scale, or a finding that holds just in one specific context. Someone is very likely to take your false finding and actually try and do it. Actual, scarce development resources will be thrown at your “solution”. Funding will go toward the wrong answer instead of the right ones. Finite inputs like labor and energy will be expended on the wrong thing.

And more than in any other domain of social science, doing the wrong thing will make a huge difference. The world’s poorest people live on incomes that are less than 1% of what we enjoy here in America. We could take this same budget and just give it to them in cash, which would at a minimum reduce poverty temporarily. The benefits of helping the global poor, in terms of their actual well-being, are drastically higher than those of helping any group in a rich country. $1000 is a decent chunk of change in America, but it could mean the difference between life and death for a subsistence farmer in sub-Saharan Africa. Thus, when you get an exciting result, you have an obligation to look at your tables and go “really?”

This does not mean that no development economics research is ever wrong, or that nobody working in the field ever skews their results for career reasons. Career incentives can be powerful, even in fields with similar imperatives for honesty: witness the recent exposure of fraudulent Alzheimer’s research, which may have derailed drug development and harmed millions of people. What it means is that those career incentives are counterbalanced by a powerful moral imperative to tell the truth.

Truth-telling is important not just about our own work, but (maybe more so) when we are called upon to summarize knowledge more broadly. Literature reviews in development economics aren’t just academically interesting; they have the potential to reshape where money gets spent and which programs get implemented. What I mean by honesty here is that when we talk to policymakers or journalists or lay people about which development programs work, we shouldn’t let our views be skewed by our own research agendas or trends in the field. For example, I have written several papers about a mother-tongue-first literacy program in Uganda, the NULP. The program works exceedingly well on average, although it is not a panacea for the learning crisis. People often ask me whether mother-tongue instruction is the best use of education funds, and I tell them no: I do not think it was the core driver of the NULP’s success, and studies that isolate changes in the language of instruction support that view. Note the countervailing incentives I face here: more spending on mother-tongue instruction might yield more citations for my work, and the approach is very popular, so I am often telling people what they don’t want to hear. But far outweighing those is the fact that what I say might really matter, and getting it wrong means that kids won’t learn to read. This is a powerful motive to do my best to get the right answer.

Honesty in assessing the overall evidence also mitigates the “attention bias” problem. Exciting results will still get bursts of attention, but when we are called upon to give our view of which programs work best, we can and should focus on the broader picture painted by the evidence. This is especially critical in development economics, where we aren’t just seeking scientific truths but trying to solve some of the world’s most pressing problems.

Nothing Scales

I recently posted a working paper where we argue that appointments can substitute for financial commitment devices. I’m pretty proud of this paper: it uses a meticulously-designed experiment to show the key result, and the empirical work is very careful and was all pre-specified. We apply the latest and best practices in selecting controls and adjusting for multiple hypothesis testing. Our results are very clear, and we tell a clear story that teaches us something very important about self-control problems in healthcare. Appointments help in part because they are social commitment devices, and, because there are no financial stakes, they don’t have the problem of people losing money when they don’t follow through. The paper also strongly suggests that appointments are a useful tool for encouraging people to utilize preventive healthcare: they increase the HIV testing rate by over 100%.

That’s pretty promising! Maybe we should try appointments as a way to encourage people to get vaccinated for covid, too? Well, maybe not. A new NBER working paper tries something similar for covid vaccinations in the US.  Not only does texting people a link to an easy-to-use appointment website not work, neither does anything else that they try, including just paying people $50 to get vaccinated.

Different people, different treatment effects

Why don’t appointments increase covid vaccinations when they worked for HIV testing? The most likely story is that this is a different group of people and their treatment effects are different. I don’t just mean that one set is in Contra Costa County and the other one is in the city of Zomba, although that probably matters. I mean that the Chang et al. study specifically targets the vaccine hesitant, whereas men in our study mostly wanted to get tested for HIV: 92 percent of our sample had previously been tested for HIV at least once. In other words, if you found testing-hesitant men in urban southern Malawi, these behavioral nudges probably wouldn’t help encourage them to get an HIV test either. That makes sense if you think about it: we show that our intervention helps people overcome procrastination and other self-control problems. These are fundamentally problems of people wanting to get tested but not managing to get around to it. The vaccine-hesitant aren’t procrastinating; by and large they just don’t want to get a shot. Indeed, other research confirms that appointments do increase HIV testing rates—just as this explanation would predict.

This is all to say that the treatment effects are heterogeneous: the treatment affects each person—or each observation in your dataset—differently. This is an issue that we can deal with. Our appointments study documents exactly the kind of heterogeneity that the theory above would predict. The treatment effects for appointments are concentrated overwhelmingly among people who want to enroll in a financial commitment device to help ensure they go in for an HIV test. Thus we could forecast that people who don’t want a covid shot at all definitely won’t have their behavior changed much by an appointment.

But trying to analyze this is very rare, which is a disaster for social science research. Good empirical social science almost always focuses on estimating a causal relationship: what is β in Y = α + βX + ϵ? But these relationships are all over the place: there is no underlying β to be estimated! Let’s ignore nonlinearity for a second, and say we are happy with the best linear approximation to the underlying function. The right answer here still potentially differs for every person, and at every point in time.* Your estimate is just some weighted average of a bunch of unit-specific βs, even if you avoid randomized experiments and run some other causal inference approach on the entire population.
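To make the "weighted average of unit-specific βs" point concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration (the two types of people, the effect sizes, the sample size); none of it comes from the appointments study. It runs the same randomized experiment on two hypothetical populations that differ only in their share of "responders", and the estimated β tracks the average of the unit-specific effects in whichever population was sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_estimate(share_responders, n=20_000):
    """Run one simulated RCT and return (estimated beta, true average beta_i)."""
    # Hypothetical population: "responders" have beta_i = 1, everyone else 0.
    beta_i = (rng.random(n) < share_responders).astype(float)
    x = rng.integers(0, 2, size=n)               # randomized binary treatment
    y = 0.5 + beta_i * x + rng.normal(0, 1, n)   # Y = alpha + beta_i * X + eps
    # With a binary regressor, the OLS slope equals the difference in means.
    beta_hat = y[x == 1].mean() - y[x == 0].mean()
    return beta_hat, beta_i.mean()

est_a, avg_a = simulate_estimate(share_responders=0.9)  # mostly responders
est_b, avg_b = simulate_estimate(share_responders=0.1)  # mostly non-responders

print(f"population A: estimate {est_a:.2f}, average unit effect {avg_a:.2f}")
print(f"population B: estimate {est_b:.2f}, average unit effect {avg_b:.2f}")
```

Both experiments are internally valid, yet the two estimates differ sharply, because there is no single β behind them: each one is an average over a different mix of people.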

This isn’t a new insight: the Nobel prize was just given out in part for showing that an IV identifies a local average treatment effect for some slice of the population. Other non-experimental methods won’t rescue us either: identification is always coming from some small subset of the data. The Great Difference-in-Differences Reckoning is driven, at its core, by the realization that DiDs are identified off of specific comparisons between units, and each unit’s treatment effect can be different. Matching estimators usually don’t yield consistent estimates of causal effects, but when they do it’s because we are exploiting idiosyncrasies in treatment assignment for a small number of people. Non-quantitative methods are in an even worse spot. I am a fan of the idea that qualitative data can be used to understand the mechanisms behind treatment effects—but along with person-specific treatment effects, we need to try to capture person-specific mechanisms that might change over time.

Nothing scales

Treatment effect heterogeneity also helps explain why the development literature is littered with failed attempts to scale interventions up or run them in different contexts. Growth mindset did nothing when scaled up in Argentina. Running the “Jamaican Model” of home visits to promote child development at large scale yields far smaller effects than the original study. The list goes on and on; to a first approximation, nothing we try in development scales.

Estimated effect sizes for the Jamaican Model at different scales

Why not? Scaling up a program requires running it on new people who may have different treatment effects. And the finding, again and again, is that this is really hard to do well. Take the “Sugar Daddies” HIV-prevention intervention, which worked in Kenya, for example. It was much less effective in Botswana, a context where HIV treatment is more accessible and sugar daddies come from different age ranges.** Treatment effects may also vary within person over time: scaling up the “No Lean Season” intervention involved  doing it again later on, and one theory for why it didn’t work is that the year they tried it again was marked by extreme floods. Note that this is a very different challenge from the “replication crisis” that has most famously plagued social psychology. The average treatment effect of appointments in our study matches the one in the other study I mentioned above, and the original study that motivated No Lean Season literally contains a second RCT that, in part, replicates the main result.

I also doubt that this is about some intrinsic problem with scaling things up. The motivation for our appointments intervention was that, anecdotally, appointments work at huge scale in the developed world to do things like get people to go to the dentist. I’m confident that if we just ran the same intervention on more people who were procrastinating about getting HIV tests, we could achieve similar results. However, we rarely actually run the original intervention at larger scale. Instead, the tendency is to water it down, which can make things significantly less effective. Case in point: replicating an effective education intervention in Uganda in more schools yielded virtually-identical results, whereas a modified program that tried to simulate how policymakers would reduce costs was substantially worse. That’s the theory that Evidence Action favors for why No Lean Season didn’t work at scale—they think the implementation changed in important ways.

What do we do about this?

I see two ways forward. First, we need a better understanding of how to get policymakers to actually implement interventions that work. There is some exciting new work on this front in a recent issue of the AER, but this seems like very low-hanging fruit to me. Time and again, we have real trouble just replicating actual treatments that work—instead, the scaled-up version almost always is watered down.

Second, every study should report estimates of how much treatment effects vary, and try to link that variation to a model of human behavior. There is a robust econometric literature on treatment effect heterogeneity, but actually looking at this in applied work is very rare. Let’s take education as an example. I just put out another new working paper with a different set of coauthors called “Some Children Left Behind”. We look at how much the effects of an education program vary across kids. The nonparametric Fréchet-Hoeffding lower bounds on treatment effect variation are massive; treatment effects vary from no gain at all to a 3-SD increase in test scores. But as far as I know nobody’s even looked at that for other education programs. Across eight systematic reviews of developing-country education RCTs (covering hundreds of studies), we found just four mentions of variation in treatment effects, and all of them used the “interact treatment with X” approach. That’s unlikely to pick up much: we find that cutting-edge ML techniques can explain less than 10 percent of the treatment effect heterogeneity in our data using our available Xs. The real challenge here is to link the variation in treatment effects to our models of the world, which means we are going to need to collect far better Xs.
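For readers who haven’t seen bounds of this kind, here is a rough sketch of the idea, not the procedure from the paper. An experiment identifies the distribution of outcomes in each arm but not how they pair up person by person. The spread of individual treatment effects is smallest when treated and control outcomes are matched rank for rank, and largest when the ranks are reversed. The function name, the quantile grid, and the simulated data are all assumptions made for this illustration.

```python
import numpy as np

def effect_sd_bounds(y_treat, y_control, n_grid=199):
    """Bounds on the SD of individual treatment effects from the two marginals."""
    q = np.linspace(0.005, 0.995, n_grid)    # common grid of quantiles
    q_t = np.quantile(y_treat, q)
    q_c = np.quantile(y_control, q)
    lower = np.std(q_t - q_c)                # ranks preserved: least variation
    upper = np.std(q_t - q_c[::-1])          # ranks reversed: most variation
    return lower, upper

# Simulated data (assumption): treatment shifts the mean and widens the spread.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=2_000)
treated = rng.normal(0.6, 1.5, size=2_000)

sd_lower, sd_upper = effect_sd_bounds(treated, control)
print(f"SD of treatment effects is between {sd_lower:.2f} and {sd_upper:.2f}")
```

A lower bound well above zero means that no way of pairing treated and control outcomes is consistent with everyone getting the same effect, which is the sense in which such bounds document heterogeneity without any Xs at all.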

This latter point means social scientists have a lot of work ahead of us. None of the techniques we use to look at treatment effect variation currently work for non-experimental causal inference techniques. Given how crucial variation in treatment effects is, this seems like fertile ground for applied econometricians. Moreover, almost all of our studies are underpowered for understanding heterogeneous treatment effects, and in many cases we aren’t currently collecting the kinds of baseline data we would need to really understand the heterogeneity—remember, ML didn’t find much in our education paper. That means that the real goal here is quite elusive: how do we predict which things will replicate out-of-sample and which won’t? To get this right we need new methods, more and better data, and a renewed focus on how the world really works.

*And potentially on everyone else’s value of X as well, due to spillovers and GE effects.
** This point is not new to the literature on scale-up: Hunt Allcott argues that RCTs specifically select locations with the largest treatment effects.
What empirical microeconomics tells us about reparations

Ta-Nehisi Coates argues that the United States government should pay reparations to African-Americans for slavery and institutionalized racism. The essay is long and full of supporting evidence, and generally makes a strong case that the US government bears responsibility for oppressing blacks for hundreds of years. While Coates digresses occasionally  – into claims of broader guilt by all Americans, or all whites, or into arguments that America’s current prosperity depends on its history of oppressing blacks – those claims are not necessary for his main point to hold water. That point is fairly straightforward: the US government was complicit in a moral evil, and it should take steps to make right for that evil, as it did, for example, for the internment of Japanese Americans during World War II.

Leaving aside the merits of the underlying idea, and the task of pinning down what the value of the reparations would be and how to allocate them, I wanted to discuss the practical aspect: what would providing reparations accomplish? Could transferring money to blacks help close the yawning gaps between them and whites that exist across a broad range of social indicators? Reparations need not be cash transfers – Coates cites Charles Ogletree’s idea of reparations in the form of job training programs – but usually the term is associated with the payment of cash to the afflicted group. This fixes a key economic question: what would happen if the US government made a massive financial transfer to every black person in America?

In some sense the right answer to this is “we don’t know”. We have never tried doing this, let alone in an experimental framework that would allow us to measure its effects. Coates does list one empirical example – the German payment of Holocaust reparations to the Israeli government, which is credited with funding the country through a tough spell and contributing to substantial economic growth. But most of those payments went to the government, not to individuals, so it is unclear how those effects would translate to the context of reparations to blacks in the US.

Even though no one has ever run this experiment, we do have evidence on what happens when people receive large cash transfers. The best evidence comes from a paper by Hoyt Bleakley (who is joining Michigan’s economics faculty in the fall) and Joseph Ferrie, about a lottery that distributed land at random to adult white males in Georgia (ungated working paper version).* The winners of this lottery received land worth approximately as much as the entire wealth holdings of the median person at the time. Given that the average black family has one sixth the wealth of the average white family, this is actually pretty close to the magnitude of the transfer we’d be talking about.


This image (from Wonkblog) shows that the black-white wealth divide has widened rather than narrowing over time

Large cash transfers help: they make the recipients richer. But they don’t have the long-term social ramifications that you might hope for. The children and grandchildren of lottery winners end up no wealthier and no better-educated than non-winners. The big caveat with this comparison is that the Bleakley and Ferrie paper studies people from the 19th century, so the sample and context are quite different than they are today. However, I’d actually expect those differences to lead to larger effects than we’d see from targeting a poorer, more disadvantaged group. Overall, this suggests that wealth transfers – even massive ones – will not have transformational effects on socioeconomic status that last across generations.

On the other hand, a wide range of evidence suggests that, contrary to stereotypes, people (even poor people) do not “waste” cash transfers on alcohol, cigarettes, or other vices.** Those results are for transfers on a scale much smaller than reparations would operate on, and are for much poorer populations than the typical black American. But implicit in the stereotype that money will go toward alcohol is the notion that poorer people should have bigger problems with this.*** Since even very poor people seem to have no problem refraining from potentially-problematic spending, it is unlikely that this would be an issue for a reparations program.

Taken together, the evidence from empirical economics tells us that reparations, if done as pure financial transfers, would make blacks richer and with few downsides – but that they would not have transformative effects on the long-run gaps in outcomes between whites and blacks. While wealth is inherited, wealthy people also propagate success through their family lines by passing down other attributes – from education to behaviors to social connections to their race – that end up washing out the effects of wealth alone. To fix the black-white gap in a permanent way, we need to address all sorts of other differences as well; addressing wealth alone is not enough.

What about other ways of providing reparations? The literature on job training programs for marginalized groups is fairly discouraging, so I’m not convinced that Ogletree’s proposal would work well (although maybe we need to work on developing better job training). Another possibility is to work through the education system. Roland Fryer’s research has shown that improving middle-school educational outcomes for blacks helps them close gaps in other social outcomes. At the college level, there is robust, although not necessarily causal, evidence that high-quality colleges help blacks quite a bit (and matter much less for whites). One policy that might work is to replace affirmative action with an official reparations program, funded by the federal government, that creates additional slots at all universities to accommodate black students. This would reduce the racial tension that is stirred up by the current system, where people perceive that they are being denied admission based on their race, and where the moral and legal justification for the scheme is not made clear. It might reduce opposition to the program as well.

More broadly, we still need more evidence about what kinds of programs help generate permanent reductions in the black-white social divide. If reparations end up being taken seriously, then the government should fund and promote experimental and regression-discontinuity research into a wide range of possible programs in order to see which ones work. Financial transfers alone may not work – but we have the empirical tools needed to figure out what does.

*In a dark irony, this land came from one of the worst crimes against humanity the US government has ever committed.
**As I’ve discussed on this blog in the past, it is not obvious that these purchases are wasteful; we need to take seriously the idea that people have agency – that they can be trusted to make their own decisions.
***Basic economic theory actually suggests the opposite, since poorer people have a tighter budget constraint. But it also tells us that people’s unaltered choices maximize their own welfare, so this is a non-problem. Chris Blattman believes that the homeless in the US are fundamentally different from similar-looking populations in Africa, but that would only apply to the poorest black people who received reparations.
"People think it’s easy to contract HIV. That’s a good thing, right? Maybe not."

That’s the title of my guest post on the World Bank’s Development Impact blog, describing my job market paper. Here’s a bit of the introduction:

People are afraid of HIV. Moreover, people around the world are convinced that the virus is easier to get than it actually is. The median person thinks that if you have unprotected sex with an HIV-positive person a single time, you will get HIV for sure. The truth is that it’s not nearly that easy to get HIV – the medical literature estimates that the transmission rate is actually about 0.1% per sex act, or 10% per year.

One way of interpreting these big overestimates of risks is that HIV education is working. […] The classic risk compensation model says this should be causing reductions in unprotected sex.

Unfortunately, the risk compensation story doesn’t seem to be reflected in actual behavior – at least not in sub-Saharan Africa, where the HIV epidemic is at its worst. […] If people are so scared, why don’t they seem to be compensating away from the risk of HIV infection? I tackle that question in my job market paper, “The Effect of HIV Risk Beliefs on Risky Sexual Behavior: Scared Straight or Scared to Death?” My answer is surprising.
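The per-act and per-year figures quoted in the excerpt are linked by simple compounding, which a few lines of arithmetic make explicit. This is a sketch under two assumptions that are not in the excerpt: the per-act risk is the same and independent across acts, and the number of acts per year is 100 (a round number chosen only because it makes the two quoted figures line up).

```python
def annual_risk(per_act_risk, acts_per_year):
    """Chance of at least one transmission over a year of independent acts."""
    return 1.0 - (1.0 - per_act_risk) ** acts_per_year

per_act = 0.001        # 0.1% per act, the figure quoted above
acts_per_year = 100    # assumption for illustration, not from the post

risk = annual_risk(per_act, acts_per_year)
print(f"annual risk: {risk:.1%}")
```

Under these assumptions the annual figure comes out a little under 10%, far from the "for sure after a single act" belief held by the median person.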

You can read the whole thing on their site by following this link. My post is part of their annual Blog Your Job Market Paper series, which features summaries of research from development economics Ph.D. students on the job market. People who follow this blog should check out that series, which has featured some really awesome research this year. More generally, Development Impact is by popular acclamation the best development-focused blog out there; I read every post.

Making the Grade: The Sensitivity of Education Program Effectiveness to Input Choices and Outcome Measures




I’m very happy to announce that my paper with Rebecca Thornton, “Making the Grade: The Sensitivity of Education Program Effectiveness to Input Choices and Outcome Measures”, has been accepted by the Review of Economics and Statistics. An un-gated copy of the final pre-print is available here.

Here’s the abstract of the paper:

This paper demonstrates the acute sensitivity of education program effectiveness to the choices of inputs and outcome measures, using a randomized evaluation of a mother-tongue literacy program. The program raises reading scores by 0.64SDs and writing scores by 0.45SDs. A reduced-cost version instead yields statistically-insignificant reading gains and some large negative effects (-0.33SDs) on advanced writing. We combine a conceptual model of education production with detailed classroom observations to examine the mechanisms driving the results; we show they could be driven by the program initially lowering productivity before raising it, and potentially by missing complementary inputs in the reduced-cost version. 

The program we study, the Northern Uganda Literacy Project, is one of the most effective education interventions in the world. It is at the 99th percentile of the distribution of treatment effects in McEwan (2015), and would rank as the single most effective for improving reading. It improves reading scores by 0.64 standard deviations. Using the Evans and Yuan equivalent-years-of-schooling conversion, that is as much as we’d expect students to improve in three years of school under the status quo. It is over four times as much as the control-group students improve from the beginning to the end of the school year in our study.


Effects of the NULP intervention on reading scores (in control-group SDs)

It is also expensive: it costs nearly $20 per student, more than twice as much as the average intervention for which cost data is available. So we worked with Mango Tree, the organization that developed it, to design a reduced-cost version. This version cut costs by getting rid of less-essential materials, and also by shifting to a train-the-trainers model of program delivery. It was somewhat less effective for improving reading scores (see above), and for the basic writing skill of name-writing, but actually backfired for some measures of writing skills:


Effects of the NULP intervention on writing scores (in control-group SDs)

This means that the relative cost-effectiveness of the two versions of the program is highly sensitive to which outcome measure we use. Focusing just on the most-basic skill of letter name recognition makes the cheaper version look great—but its cost effectiveness is negative when we look at writing skills.


Why did this happen? The intervention was delivered as a package, and we couldn’t test the components separately for two reasons. Resource constraints meant that we didn’t have enough schools to test all the many different combinations of inputs. More important, practical constraints make it hard to separate some inputs from one another. For example, the intervention involves intensive teacher training and support. That training relies on the textbooks, and could not be delivered without them.

Instead, we develop a model of education production with multiple inputs and outputs, and show that there are several mechanisms that could lead to a reduction in inputs not just lowering the treatment effects of the program, but actually leading to declines in some education outcomes. First, if the intervention raises productivity for one outcome more than another, this can lead to a decline in the second outcome due to a substitution effect. Second, a similar pattern can occur if inputs are complements in producing certain skills and one is omitted. Third, the program may actually make teachers less productive in the short term, as part of overhauling their teaching methods: a so-called “J-curve”.

We find the strongest evidence for this third mechanism. Productivity for writing, in terms of learning gains per minute, actually falls in the reduced-cost schools. It is plausible that the reduced-cost version of the program pushed teachers onto the negative portion of the J-curve, but didn’t do enough to get them into the region of gains. In contrast, for reading (and for both skills in the full-cost version) the program provided a sufficient push to achieve gains.

There is also some evidence that missing complementary inputs were important for the backfiring of the reduced-cost program. Some of the omitted inputs are designed to be complements, for example slates that students can use to practice writing with chalk. Moreover, we find that classroom behaviors by teachers and students have little predictive power for test scores when entered linearly, but allowing for non-linear terms and interactions leads to a much higher R-squared. Notably, the machine-learning methods we apply indicate that the greatest predictive power comes from interactions between variables.
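The contrast between linear terms and interactions is easy to see in a toy example. The sketch below is not the analysis from the paper, which uses machine-learning methods on real classroom observations. It uses plain least squares on simulated data in which two hypothetical inputs, `a` and `b`, only matter jointly. All names and numbers are assumptions for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS regression of y on X plus a constant."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(2)
n = 5_000
a = rng.normal(size=n)    # hypothetical classroom input 1
b = rng.normal(size=n)    # hypothetical classroom input 2
# Pure complements: the inputs raise scores only in combination.
score = a * b + rng.normal(0, 0.5, size=n)

r2_linear = r_squared(np.column_stack([a, b]), score)
r2_inter = r_squared(np.column_stack([a, b, a * b]), score)

print(f"R-squared, linear terms only: {r2_linear:.2f}")
print(f"R-squared, with interaction:  {r2_inter:.2f}")
```

When inputs are complements, each one looks nearly useless on its own in a linear specification, and the explanatory power only shows up once the interaction is included.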

These findings are an important cautionary tale for policymakers who are interested in using successful education programs, but worried about their costs. Cutting costs by stripping out inputs may not just reduce a program’s effectiveness, but actually make it worse than doing nothing at all.

For more details, check out the paper here. Comments are welcome—while this paper is already published, Rebecca and I (along with Julie Buhl-Wiggers and Jeff Smith) are working on a number of followup papers based on the same dataset.

A Nobel Prize for Development Economics as an Experimental Science

Fifteen years ago I was an undergrad physics major, and I had just finished a summer spent teaching schoolchildren in Tanzania about HIV. The trip was both inspiring and demoralizing. I had gotten involved because I knew AIDS was important and thought addressing it was a silver bullet to solve all of sub-Saharan Africa’s problems. I came away from the trip having probably accomplished little, but learned a lot about the tangled constellation of challenges facing Tanzanians. They lacked access to higher education, to power, to running water. AIDS was a big problem, but one of many. And could we do anything about these issues? Most of my courses on international development were at best descriptive and at worst defeatist. There were lots of problems, and colonialism was to blame. Or maybe the oil curse. Or trade policy. It was hard to tell.

Just as I was pondering these problems and what I could do about them, talk began to spread about the incredible work being done by Abhijit Banerjee and Esther Duflo. They had started an organization, J-PAL, that was running actual experiments to study solutions to economic and social problems in the world’s poorest places. At this point, my undergraduate courses still emphasized that economics was not an experimental science. But I started reading about this new movement to change that, in development economics in particular, by using RCTs to test the effects of programs and answer first-order economic questions.

At the same time, I also learned about the work being done by Michael Kremer, another of the architects of the experimental revolution in development economics. One of the first development RCT papers I read remains my all-time favorite economics paper: Ted Miguel and Kremer’s Worms. This paper has it all. They study a specific and important program, and answer first-order questions in health economics. They use a randomized trial, but their analysis is informed by economic theory: because intestinal worm treatment has positive externalities, you will drastically understate the benefits of treatment if you ignore that in your data analysis. And the results were hugely influential: Deworm the World is now implementing school-based deworming around the world. I was sold: I changed career paths and started pursuing development economics. And I became what is often called a randomista, a researcher focused on using randomized trials to study economic issues and solve policy problems in poor countries. Kremer is in fact my academic grandfather: he advised Rebecca Thornton, who in turn advised me.

When the Nobel Prize in Economics was awarded to Banerjee, Duflo, and Kremer this Monday, a major reason was their tremendous influence on hundreds if not thousands of people with stories like mine. Without their influence, the field of development economics would look entirely different. A huge share of us wouldn’t be economists at all, and if we were we would be doing entirely different things. Beyond development economics per se, the RCT revolution spilled over into other fields. We increasingly think of economics as an experimental science (which was the title of my dissertation) – even when we cannot run actual experiments, we think about our data analysis as approximating an experimental ideal. Field experiments have been used in economics for a long time, but this year’s prize-winners helped make them into the gold standard for empirical work in the field.

They also helped make experiments the gold standard in studying development interventions, and this has been a colossal change in how we try to help the poor. Whereas once policymakers and donors had to be convinced by researchers that rigorous impact evaluations were important, now they actually seek out research partners to study their ideas. This has meant that we increasingly know what actually works in development, and even more important, what doesn’t work. We can rigorously show that many silver bullets aren’t so shiny after all – for example, additional expansions of microcredit do not transform the lives of the poor.

What is particularly striking and praiseworthy about this award is how early it came. There was a consensus that this trio would win a Nobel prize at some point, but these awards tend to be handed out well after the fact, once time has made researchers’ impact on the field clearer. It is a testament to their tremendous impact on the field of economics that it was already obvious that Duflo, Banerjee, and Kremer were worthy of the Nobel prize, and a credit to the committee that they saw fit to recognize the contributions so quickly. I think it’s fitting that Duflo is now the youngest person ever to win a Nobel prize in economics – given her influence on the field, it’s hard to believe she is just 46 years old.


“Pay Me Later”: A simple, cheap, and surprisingly effective savings technology

Why would you ask your employer not to pay you yet? This is something I would personally never do. If I don’t want to spend money yet, I can just keep it in a bank account. But it’s a fairly common request in developing countries: my own field staff have asked this of me several times, and dairy farmers in Kenya will actually accept lower pay in order to put off getting paid.

The logic here is simple. In developed economies, savings earn a positive return, but in much of the developing world, people face a negative effective interest rate on their savings. Banks are loaded with transaction costs and hidden fees, and money hidden elsewhere could be stolen or lost. So deferred wages can be a very attractive way to save money until you actually want to spend it.

Lasse Brune, Eric Chyn, and I just finished a paper that takes that idea and turns it into a practical savings product for employees of a tea company in Malawi. Workers could choose to sign up and have a fraction of their pay withheld each payday, to be paid out in a lump sum at the end of the three-month harvest season. About 52% of workers chose to sign up for the product; this choice was implemented at random for half of them. Workers who signed up saved 14% of their income in the scheme and increased their net savings by 24%.

Figure: Accumulation of money in the deferred wages account over the course of the harvest season. The lump-sum payout was on April 30th.

The savings product has lasting effects on wealth. Workers spent a large fraction of their savings on durables, especially goods used for home improvements. Four months after the scheme ended, they owned 10% more assets overall, and 34% more of the iron sheeting used to improve roofs. We then let treatment-group workers participate in the savings product two more times, and followed up ten months after the lump sum payout for the last round. Treatment-group workers ended up 10% more likely to have improved metal roofs on their homes.*

This “Pay Me Later” product was unusually popular and successful for a savings intervention; such interventions usually have low takeup and utilization and rarely have downstream effects.** What made this product work so well? We ran a set of additional choice experiments to figure out which features drove the high demand for this form of savings.

The first key feature is paying out the savings in a lump sum. When we offered a version of the scheme that paid out the savings smoothly (in six weekly installments) takeup fell to just 36%. The second is the automatic “deposits” that are built into the design. We offered some workers an identical product that differed only in that deposits were manual: a project staffer was located adjacent to the payroll site to accept deposits. Signup matched the original scheme but actual utilization was much lower.

On the other hand, the seasonal timing of the product was much less important for driving demand: it was just about as popular during the offseason as the main harvest season. The commitment savings aspect of the product also doesn’t matter much. When we offered a version of the product where workers could access the funds at any time during the season, it was just as popular as the original version where the funds were locked away.

In summary, letting people opt in to get paid later is a very promising way to help them save money. It can be run at nearly zero marginal cost, once the payroll system is designed to accommodate it and people are signed up. The benefits are substantial: it’s very popular and leads to meaningful increases in wealth. It could potentially be deployed not just by firms but also by governments running cash programs and workfare schemes.

The success of “Pay Me Later” highlights the importance of paying attention to the solutions people in developing countries are already finding to the malfunctioning markets hindering their lives. Eric, Lasse, and I did a lot of work to design the experiment, and our field team and the management at the Lujeri Tea Estate deserve credit for making the research and the project work. But a lot of credit also should go to the workers who asked us not to pay them yet – this is their idea, and it worked extremely well.

Check out the paper for more about the savings product and our findings.

*These results are robust to correction for multiple hypothesis testing using the FWER adjustment of Haushofer and Shapiro (2016).
**A partial exception is Schaner (2018), which finds that interest rate subsidies on savings accounts lead to increases in assets and income. However, the channel appears to be raising entrepreneurship rather than utilization of the accounts.

How Important is Temptation Spending? Maybe Less than We Thought

Poor people often have trouble saving money for a number of reasons: the banks they have access to are low-quality and expensive (and far away), saving is risky, and money that they do save is often eaten away by kin taxes. One reason that has featured prominently in theoretical explanations of poverty traps is “temptation spending” – spending on goods like alcohol or tobacco that people can’t resist buying even though they’d really prefer not to. Intuitively, exposure to temptation reduces saving in two ways. First, it directly drains people’s cash holdings, so money they might have saved gets “wasted” on the good in question. Second, people realize that their future self will just waste their savings on temptation goods, so they don’t even try to save.

But how important is temptation spending in the economic lives of the poor? Together with Lasse Brune and my student Qingxiao Li, I have just completed a draft of a paper that tackles this question using data from a field experiment in Malawi. The short answer is: probably not very important after all.

One of our key contributions in the paper is to measure temptation spending by letting people define it for themselves. We do this in two ways: first, we allow our subjects to list goods they are often tempted to buy or feel they waste money on, and then match that person-specific list of goods to a separate enumeration of items that they purchased. Second, we let people give the simple sum of money they spent that they felt was wasted. We also present several other potential definitions of temptation spending that are common in the literature, including the alcohol & tobacco definition, and also a combined index across all the definitions. The correlations between these measures are not very high: spending on alcohol & tobacco correlates with spending on self-designated temptation goods at just 0.07.


This is the result of people picking very different goods than policymakers or researchers might select as “temptation goods”. For example, people commonly listed clothes as a temptation good, whereas alcohol was fairly uncommon.
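To make the difference between the two definitions concrete, here is a minimal sketch of how a person-specific measure could be computed next to the alcohol & tobacco one. The function names and the purchase data are invented for illustration; this is not the code or the data from the paper.

```python
def self_defined_temptation_spending(tempting_goods, purchases):
    """Sum spending on goods that this person listed as tempting.

    tempting_goods: iterable of good names the respondent listed.
    purchases: list of (good name, amount spent) pairs from a separate
               enumeration of what the respondent bought.
    """
    tempting = {good.lower() for good in tempting_goods}
    return sum(amount for good, amount in purchases if good.lower() in tempting)


def alcohol_tobacco_spending(purchases):
    """The researcher-imposed definition: only alcohol and tobacco count."""
    return self_defined_temptation_spending({"alcohol", "tobacco"}, purchases)


# Hypothetical respondent (made-up goods and amounts).
purchases = [("maize flour", 1500), ("clothes", 800), ("alcohol", 200), ("soap", 150)]
listed_as_tempting = ["Clothes"]

own_measure = self_defined_temptation_spending(listed_as_tempting, purchases)
standard_measure = alcohol_tobacco_spending(purchases)
```

For this made-up respondent the two measures pick up entirely different purchases, which is the kind of disagreement that produces a low correlation across a sample.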

We also show that direct exposure to a tempting environment does not significantly affect spending on temptation goods – let alone downstream outcomes. Our subjects were workers who received extra cash income during the agricultural offseason as part of our study. All workers received their pay at the largest local trading center, and some were randomly assigned to receive their pay during the market day (as opposed to the day before). This was the most-tempting environment commonly reported by the people in our study. Getting paid at the market didn’t move any of our measures of temptation spending and we can rule out meaningful effect sizes.

Why not? We go through a set of six possible explanations and find support for two of them. The first is substitution bias: the market where workers were paid was just one of several in the local area, some of which operated on the day the untreated workers were paid. It was feasible for them to go to the other markets to seek out temptation goods to buy, effectively undoing the treatment. This implies a very different model of temptation than we usually have in mind: it would mean that the purchases tempt you even if they are far away and you have to go seek them out.*

The second is pre-commitment to spending plans. If workers can find a way to mentally “tie their hands” by committing to spend their earnings on specific goods or services, they can mitigate the effects of temptation. We see some empirical evidence for this: the effects of the treatment are heterogeneous by whether workers have children enrolled in school. School fees are a common pre-planned expense in our setting; consistent with workers pre-committing to pay school fees, we see zero treatment effects for workers with children in school, and substantial positive effects for other workers.

Both of these explanations suggest that temptation spending is much less of a policy concern than we might have thought. The first story implies that specific exposure to a tempting environment may not matter at all – people will seek out tempting goods whether they are near them or not. The latter suggests that people can use either mental accounting or actual financial agreements to shield themselves from the risk of temptation spending.

There is much more in the paper, “How Important is Temptation Spending? Maybe Less than We Thought” – check it out by clicking here. Feedback and suggestions are very welcome!

*I have personally experienced this sort of temptation for Icees, which aren’t good for me but which I will go out of my way to obtain.

We can do better than just giving cash to poor people. Here’s why that matters.

Cash transfers are an enormously valuable, and increasingly widespread, development intervention. Their value and popularity have driven a vast literature studying how various kinds of cash transfers (conditional, unconditional, cash-for-work, remittances) affect all sorts of outcomes (finances, health, education, job choice). I work in one small corner of this literature myself: Lasse Brune and I just finished a revision of our paper on how the frequency of cash payouts affects savings behavior, and we are currently studying (along with Eric Chyn) how to use that approach as an actual savings product.

After all the excitement over their potential benefits, a couple of recent results have taken a bit of the luster off of cash transfers. First, the three-year followup of the GiveDirectly evaluation in Kenya showed evidence that many effects had faded out, although asset ownership was still higher. Then came a nine-year (!!) followup of a cash grant program in Uganda, where initial gains in earnings had disappeared (but again, asset ownership remained higher).

One question raised by these results is whether we can do any better than just giving people cash. A new paper by McIntosh and Zeitlin tackles this question head-on, with careful comparisons between a sanitation-focused intervention and a cost-equivalent cash transfer. They actually tried a bunch of cash transfers in a range so that they could get the exact cost-equivalency through regression adjustment. In their study, there’s no clear rank ordering between cost-equivalent cash and the actual program; neither has big impacts, and they change different things (though providing a larger cash transfer does appear to dominate the program across all outcomes).

This is just one program, though – can any program beat cash? It turns out that the answer is yes! At MIEDC this spring, I saw Dean Karlan present results from a “Graduation” program that provided a package of interventions (training, mentoring, cash, and a savings group) in several different countries. The Uganda results, available here, show that the program significantly improved a wide range of poverty metrics, while a cost-equivalent cash transfer “did not appear to have meaningful impacts on poverty outcomes”.

This is a huge deal. The basic neoclassical model predicts that a program can never beat giving people cash: the best you can do is tie.* People know what they need and can use money to buy it. If you spend the same amount of money, you could achieve the same benefits for them if you happen to hit on exactly what they want, but if you pick anything else you would have done better to just hand them money. (This is the logic behind the annual Christmas tradition of journalists trotting out some economist to explain to the world why giving gifts is inefficient. And economists wonder why no one likes us!)

The fact that we can do better than just handing out cash to people is a rejection of that model in favor of models with multiple interlocking market failures – some of which may be psychological or “behavioral” in nature. That’s a validation of our basic understanding of why poor places stay poor. In a standard model, a simple lack of funds, or even the failure of one market, is not enough to drive a permanent poverty trap. You need multiple markets failing at once to keep people from escaping from poverty. For example, a lack of access to credit is bad, and will hurt entrepreneurs’ ability to make investments. But even without credit, they could instead save money to eventually make the same investments. A behavioral or social constraint that keeps them from saving, in contrast, can keep them from making those investments at all.

McIntosh and Zeitlin refer to Das, Do, and Ozler, who point out that “in the absence of external market imperfections, intra-household bargaining concerns, or behavioral inconsistencies, the outcomes moved by cash transfers are by definition those that maximize welfare impacts.” While their study finds that neither cash nor the program was a clear winner, the Graduation intervention package, in contrast, clearly beats an equivalent amount of cash on a whole host of metrics. We can account for this in two ways. One view is that the cash group actually was better off – people would really prefer to spend a windfall quickly than make a set of investments that pay off with longer-term gains. The other, which I subscribe to, is that there are other constraints at work here. Under this model, the cash group just couldn’t make those investments – they didn’t have the access to savings markets, or there is a missing market in training/skill development, etc.

There is an important practical implication as well. The notion of “benchmarking” development interventions by comparing them to handing out cash is growing in popularity, and it’s an important movement. Indeed, the McIntosh and Zeitlin study makes major contributions by figuring out how to do this benchmarking correctly, and by pushing the envelope on getting development agencies to think about cash as a benchmark.** But what do we do when there is no obvious way to benchmark via cash? In particular, when we are studying education interventions, who should we be thinking about making the cash transfers to? McIntosh and Zeitlin talk about a default of targeting the cash to the people targeted by the in-kind program. In many education programs, the teachers are the people targeted directly. In others, it is the school boards that are the direct recipients of an intervention. Neither group of people is really the aim of an education program: we want students to learn. And, perhaps unsurprisingly, direct cash transfers to teachers and school boards don’t do much to improve learning. You could change the targeting in this case, and give the cash to the students, or to their parents, or maybe just to their mothers – there turn out to be many possible ways of doing this.

So it’s really important that we now have an example of a program that clearly did better than a direct cash transfer. From a theoretical perspective, this is akin to Jensen and Miller’s discovery of Giffen goods in their 2008 paper about rice and wheat in China: it validates the way we have been trying to model persistent poverty. From the practical side, it raises our confidence that the other interventions we are doing are worthwhile, in contexts where benchmarking to cash is impractical, overly complicated, or simply hasn’t been tried. Perhaps we haven’t proven that teacher training is better than a cash transfer, but we do at least know that high-quality programs can be more valuable than simply handing out money.

EDIT: Ben Meiselman pointed out a typo in the original version of this post (I was missing “best” in “the best you can do is tie”), which I have corrected.

*I am ignoring spillovers onto people who don’t get the cash here, which, as Berk Ozler has pointed out, can be a big deal – and are often negative.

**Doing this remains controversial in the development sector – so much so that many of the other projects that are trying cash benchmarking are doing it in “stealth mode”.


How to quickly convert Powerpoint slides to Beamer (and indent the code nicely too)

Like most economists, I like to present my research using Beamer. This is in part for costly signaling reasons – doing my slides via TeX proves that I am smart/diligent enough to do that. But it’s also for stylistic reasons: Beamer can automatically put a little index at the top of my slides so people know where I am going, and I like the default fonts and colors.

Moreover, Beamer forces me to obey the First Law of Slidemaking: get all those extra words off your slides. Powerpoint will happily rescale things and let you put tons of text on the screen at once. Beamer – unless you mess with it heavily – simply won’t, and so forces you to make short, parsimonious bullet points (and limit how many you use).

Not everyone is on the same page about which tool to use all the time, which in the past has occasionally meant I needed to take my coauthor’s Powerpoint slides and copy them into Beamer line-by-line. Fortunately, today I found a solution for automating that process.

StackExchange user Louis has a post where he shares VBA code that can quickly move your Powerpoint slides over to Beamer. His code is great but I wasn’t totally happy with the output so I made a couple of tweaks to simplify it a bit. You can view and download my code here; I provide it for free with no warranties, guarantees, or promises. Use at your own risk.

Here is how to use it:

  1. Convert your slides to .ppt format using “Save As”. (The code won’t work on .pptx files.)
  2. Put the file in its own folder that contains nothing else. WARNING: If files with the same names as those used by the code are in this folder they will be overwritten.
  3. Download the VBA code here (use at your own risk).
  4. Open up the Macros menu in Powerpoint (You can add it via “Customize the Ribbon”. Hit “New Group” on the right and rename it “Macros”, then select “Macros” on the left and hit “Add”.)
  5. Type “ConvertToBeamer” under “Macro name”, then hit “Create”.
  6. Select all the text in the window that appears and delete it. Paste the VBA code in.
  7. Save, then close the Microsoft Visual Basic for Applications window.
  8. Hit the Macros button again, select “ConvertToBeamer” and run it.
  9. There will now be a .txt file with the Beamer code for your slides in it. (It won’t compile without an appropriate header.) If your file is called “MySlides.ppt”, the text file will be “MySlides.txt”.
  10. You need to manually fix a few special characters, as always when importing text into TeX. Look out for $, %, carriage returns, and all types of quotation marks and apostrophes. I also found that some tables came through fine while others needed manual tweaking.
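The cleanup in steps 9 and 10 could also be scripted. Below is a minimal Python sketch, not part of the VBA macro: the function names are made up for this example, and it assumes the slide text contains no intentional math mode or TeX comments (otherwise escaping every `$` and `%` would be wrong). It handles only `$`, `%`, carriage returns, and curly quotes; straight quotes and tables would still need a manual pass.

```python
import re

# Curly quotes and apostrophes mapped to their usual TeX spellings.
QUOTE_MAP = {
    "\u201c": "``",  # left double quotation mark
    "\u201d": "''",  # right double quotation mark
    "\u2018": "`",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
}


def clean_for_tex(text: str) -> str:
    """Apply the step 10 fixes to the converter's text output."""
    # Turn Windows and old-Mac line endings into plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape $ and % unless a backslash already precedes them.
    text = re.sub(r"(?<!\\)([$%])", r"\\\1", text)
    for curly, tex in QUOTE_MAP.items():
        text = text.replace(curly, tex)
    return text


def wrap_in_beamer_document(body: str) -> str:
    """Add a bare-bones header and footer so the frames can compile."""
    return "\\documentclass{beamer}\n\\begin{document}\n" + body + "\\end{document}\n"


def convert_file(path_in: str, path_out: str) -> None:
    """Read the macro's .txt output, clean it, and write a .tex file."""
    with open(path_in, encoding="utf-8") as f:
        body = clean_for_tex(f.read())
    with open(path_out, "w", encoding="utf-8") as f:
        f.write(wrap_in_beamer_document(body))
```

Calling `convert_file("MySlides.txt", "MySlides.tex")` would produce a file with a plain default header; themes and packages would need to be added by hand, and the input encoding may need adjusting depending on how the text file was written.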

One issue I had with the output was that it didn’t have any indentations, making it hard to recognize nested bullets. Fortunately I found this page that will indent TeX code automatically.
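If an online indenter is not an option, the same idea is simple enough to sketch in Python. This is an illustrative alternative, not the tool linked above: the function name is made up, and it only tracks `\begin{...}` and `\end{...}` pairs, which should cover the nested itemize environments in typical slide code.

```python
import re


def indent_tex(source: str, indent: str = "  ") -> str:
    """Indent each line by its \\begin/\\end nesting depth."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        opens = len(re.findall(r"\\begin\{", line))
        closes = len(re.findall(r"\\end\{", line))
        # A line that starts by closing an environment is dedented
        # before it is written, so it lines up with its \begin.
        if line.startswith("\\end{"):
            depth = max(depth - 1, 0)
            closes -= 1
        out.append(indent * depth + line if line else "")
        # Anything else opened or closed on this line affects later lines.
        depth = max(depth + opens - closes, 0)
    return "\n".join(out)
```

Nested bullets then show up one level deeper than their parent list, which makes a missing `\end{itemize}` much easier to spot.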

I found this to be a huge time saver. Even with figuring it out for the first time, tweaking the code, and writing this post, it still probably saved me several hours of work. Hopefully others find this useful as well.
