Hot on the heels of a (probably wrong) paper on ivory poaching that I criticized a few days ago, Vox reports on a paper that claims schools that give away condoms have higher teen pregnancy rates. Ooh look, a counter-intuitive finding! Economists love that stuff, right? This is a bit unfortunate for Vox since the same author has multiple articles from 2014 about rapidly falling birth rates that are easily explained by the fact that teenagers are really good at using contraceptives. So which Vox is correct, 2014 Teens-are-pregnancy-bulletproof Vox that cites national pregnancy and abortion stats, or 2016 give-em-condoms-and-they-breed-like-rabbits Vox that relies on a non-peer-reviewed article by economists at NBER? Let’s investigate this new paper …

The paper can be obtained here. Basically the authors have found data on school districts that did or didn’t introduce free condom programs between 1989 and 1993, and linked this with county-level information on teen birth rates over the same period. They then used a regression model to identify whether counties with a school district that introduced condom programs had different teen pregnancy outcomes to those that didn’t. They used secondary data, and obtained the data on condom distribution programs from other journal articles, but because population information is not available for school districts they used some workarounds to make the condom program data work with the county population data. They modeled everything using ordinary least squares (OLS) regression. The major problems with this article are:

  • They modeled the log of the birth rate using OLS rather than directly modeling the birth rate using Poisson regression
  • Their tests based on ratios of teen to adult births obscure trends
  • They didn’t use a difference in difference model

I’m going to go through these three problems of the model, and explain why I think it doesn’t present the evidence they claim. But first I want to just make a few points about some frustrating weaknesses in this article that make me think these NBER articles really need to be peer-reviewed before they’re published.

A few petty complaints about this article

My first complaint is that the authors refer to “fighting AIDS” and “AIDS/HIV”. This indicates a general lack of familiarity with the topic: in HIV research we always refer to the general epidemic as the HIV/AIDS epidemic (so we “fight HIV/AIDS”) and we only refer to AIDS specifically when we are referring to that specific stage of progression of the disease. This isn’t just idle political correctness: patterns of HIV and AIDS differ widely depending on the quality of notification and the use of treatment (which delays progress to AIDS), and you can’t talk about AIDS by itself because the relationship of AIDS and HIV prevalence depends highly on the nature of the health system in which the disease occurs. The way the authors describe the HIV epidemic and responses to it suggests a lack of familiarity with the literature on HIV/AIDS.

This sloppiness continues in their description of the statistical methods. They introduce their model as follows:

Condom model

But on page 10 they say that the thetas represent “county and year dummies” and that the T_c represents “county-specific trends”. These are not dummies. A “dummy” is a variable, not a parameter, and “dummies” for these effects should be represented by an X multiplied by a theta. In fact the theta and T_c are parameters, and in any kind of rational description of a statistical model this model is written wrong. It should be written with something like θ_c X_c where X_c is the dummy[1].

This kind of sloppiness really offends me about the way economists describe their models. This is a simple OLS regression of the relationship between the log of birth rate and some covariates. In epidemiology we wouldn’t even write the equation, we would just list the covariates on the right hand side. If anyone cares about the equation, it’s always the same and it’s in any first year textbook. You don’t make yourself look smart by writing out a first year sociology equation and then getting it wrong. Just say what you did!

So, with that bit of venting out of the way, let’s move on to the real problems with the article.

Another model without Poisson regression

The absolute gold standard correct method for modeling birth rates is a Poisson regression. In this type of equation we model counts of births directly, and incorporate the population as an offset. This is a special case of a generalized linear model, and it has a special property that OLS regression does not have: the variance of the response is directly related to the magnitude of the response. This is important because it means that the uncertainty associated with counties with small numbers of births is not affected by the counties with large numbers of births – this doesn’t happen with OLS regression. Another important aspect of Poisson regression is that it allows us to incorporate data points with zero births – zero rates are possible.

In contrast the authors chose to use an OLS regression of the log of the birth rate. This means that there is a single common variance across all the observations, regardless of their actual number of births, which is inconsistent with the behavior of actual events. It also means that any counties with zero births are dropped from the model, since they have no log value. It also means that there is a direct linear relationship between the covariates on the right hand side of the model and the outcome, whereas in the Poisson regression model this relationship is logarithmic. That’s very important for modulating the magnitude of effects.

The model is, in fact, completely inappropriate to the problem. It will give the wrong results wherever there are rare events, like teenage births, or wherever there are big differences in scale in the data – like, say, between US counties.
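To make the contrast concrete, here is a minimal sketch in plain Python. The county numbers are invented for illustration and have nothing to do with the paper's data. With a single binary covariate and log(population) as the offset, the Poisson maximum likelihood estimate of the rate ratio reduces to the ratio of pooled rates, so no fitting library is needed; the OLS coefficient on the log rate is just the difference in mean log rates, and it silently loses the county with zero births.

```python
import math

# Hypothetical toy data, NOT from the paper:
# (teen_births, teen_population, has_condom_program)
counties = [
    (0,   400,   0),   # tiny county with zero births
    (3,   500,   0),
    (450, 60000, 0),
    (2,   450,   1),
    (6,   600,   1),
    (520, 65000, 1),
]

def poisson_rate_ratio(rows):
    """Rate ratio from a Poisson model with a log(population) offset and
    one binary covariate. For this simple model the MLE is the ratio of
    pooled rates (total births / total population) in the two groups."""
    births = {0: 0, 1: 0}
    pop = {0: 0, 1: 0}
    for b, n, g in rows:
        births[g] += b
        pop[g] += n
    return (births[1] / pop[1]) / (births[0] / pop[0])

def log_ols_ratio(rows):
    """exp() of the OLS coefficient from regressing log(rate) on the same
    binary covariate, i.e. the difference in mean log rates. Counties
    with zero births have no log and are dropped."""
    logs = {0: [], 1: []}
    dropped = 0
    for b, n, g in rows:
        if b == 0:
            dropped += 1
            continue
        logs[g].append(math.log(b / n))
    diff = sum(logs[1]) / len(logs[1]) - sum(logs[0]) / len(logs[0])
    return math.exp(diff), dropped

rr_poisson = poisson_rate_ratio(counties)
rr_ols, n_dropped = log_ols_ratio(counties)
print("Poisson rate ratio:", round(rr_poisson, 4))
print("log-OLS rate ratio:", round(rr_ols, 4), "| counties dropped:", n_dropped)
```

The two estimates differ because the OLS version gives a county of 450 teenagers the same weight as a county of 65,000, while the Poisson version weights by the amount of data each county actually contributes.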

Obscuring trends with a strange transformation

I mentioned above that the article also uses the ratio of teen to adult births (in age groups 20-24) to explore the effect of condom use. Figure 1 shows the chart they used to depict this.

Figure 1: The weird condom diagram

Note that the time axis is in years before and after implementation of the program. This is a highly deceptive figure, because the schools introduced condom programs over 4 years, from 1989 to 1993. This means that year 0 for one school district is 1989, while for another it is 1992. If teen births are increasing over this period, or adult births are decreasing, then the numbers at year 0 will be rates from four different years merged together. This figure is the mean, so it means that four years’ worth of data are being averaged in a graph that only covers ten years’ worth of data. That step at year 0 should actually occur across four different points in time, within a specific time trend of its own, and can’t be simplified into this one diagram.
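A small sketch of why this matters, under invented assumptions: four hypothetical districts adopt the program in four different calendar years, and the teen/adult birth ratio jumps in 1991 for everyone, with no program effect at all. Realigning on "years since implementation" and averaging, as the figure does, turns that calendar-year jump into a rise centered on year 0.

```python
# Hypothetical districts and adoption years (not taken from the paper).
adoption_year = {"A": 1989, "B": 1990, "C": 1991, "D": 1992}

def ratio(year):
    """Hypothetical teen/adult birth ratio: flat, with a jump in 1991
    that hits every district regardless of any condom program."""
    return 0.30 if year < 1991 else 0.36

event_times = range(-3, 4)
years_by_event_time = {}
mean_by_event_time = {}
for k in event_times:
    # Calendar years that get stacked on top of each other at event time k
    years = [start + k for start in adoption_year.values()]
    years_by_event_time[k] = years
    mean_by_event_time[k] = sum(ratio(y) for y in years) / len(years)

for k in event_times:
    print(k, years_by_event_time[k], round(mean_by_event_time[k], 3))
```

In this toy setup the averaged curve climbs steadily from two years before "implementation" to two years after, even though nothing about the program caused it.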

Note that the authors only show this chart for the schools that introduced a condom program. Why not put a similar line, perhaps in a different color, for school districts that didn’t? I suspect this is because the graph would contradict the findings of the model – because either the graph is misrepresentative of the true data, or the model is wrong, or both.

This graph also makes clear another problem with this research: the authors obviously don’t know how to handle the natural experiment they’re conducting, since they don’t know how to represent the diverse start points of the intervention, or the control group.

Lack of a difference in difference model

The authors include a term for the effect of introducing condom distribution programs, but they don’t investigate whether there was a common effect across condom distribution and non-condom distribution regions. It’s entirely possible that school districts without condom distribution programs also saw an increase in teen pregnancies (1989 is when MTV came out, after all, and all America went sex crazy; it’s also the year of Like a Prayer, and Prince’s song Cream was introduced in 1991). Big things were happening in teen sexuality in this period, and it’s possible these big things were way bigger than the effect of government programs.

Statistics is equal to any challenge, though[2]. We have a statistical technique for handling the effect of Miss Calendar grooving on a wire fence. A difference-in-difference model would enable us to identify whether there was a common effect during the intervention period, and the additional effect of condom promotion programs during this period. Difference-in-difference models are trivial to fit and interpret, although they involve an interaction term that is annoying for beginners, and they make a huge difference to the interpretation of policy interventions – usually in the direction of deciding the intervention made no difference. Unfortunately the authors didn’t do this, so we see that there was a step change in the intervention group, but we don’t see if there might have been a similar step change in the control group. This effect is exacerbated by having county-specific time trends, since it better enables the model to adapt to the step in the control group through adaptively changing these county-specific trends. This means we don’t know from the model if the effect in the intervention group was really confined to the intervention group, and how big it really was.
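The basic arithmetic of a difference-in-difference estimate can be sketched in a few lines of Python. The counts below are made up, and the two-group, two-period setup is a simplifying assumption of mine, not the paper's design. Both groups see the same 10% rise in birth rates, so the naive before/after comparison in the program group shows a "step" while the difference-in-difference is zero.

```python
import math

# Hypothetical (births, population) by group and period, NOT real data.
data = {
    ("control", "pre"):  (400, 50000),
    ("control", "post"): (440, 50000),
    ("program", "pre"):  (300, 40000),
    ("program", "post"): (330, 40000),
}

def log_rate(group, period):
    """Log of the birth rate for one group in one period."""
    births, pop = data[(group, period)]
    return math.log(births / pop)

# Naive estimate: change in the program group only.
naive = log_rate("program", "post") - log_rate("program", "pre")
# Common change that also happened where there was no program.
common = log_rate("control", "post") - log_rate("control", "pre")
# Difference-in-difference: program-group change net of the common change.
did = naive - common

print("naive change (log scale):", round(naive, 4))
print("common change (log scale):", round(common, 4))
print("difference-in-difference:", round(did, 4))
```

In a regression this same quantity is the coefficient on the interaction between the program indicator and the post-period indicator.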

The correct model

The correct model for this problem is a Poisson regression modeling teen births directly with population as an offset, to properly capture the way rates change. It would be a difference-in-difference model that enables the effect of the condom programs to be extracted from any general upward or downward steps happening at that time. In this model, figure 1 would be replaced by a spaghetti plot of all the counties, or by mean curves for intervention and control plotted against calendar year rather than rescaled so that the intervention happens at year 0 for all intervention counties, which is misleading. Without doing this, we simply have no evidence that the condom distribution programs did what the authors claimed. The ideal model would also have a further term identifying whether a condom program did or didn’t include counselling, to ensure that the authors have evidence for their claim that the programs with counselling worked better than those without.
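Written out, a simplified version of such a model might look like the following. This is a sketch under my own assumptions (two periods, my own symbol names), not the authors' specification: B is the count of teen births in county c and year t, N is the teen population used as the offset, Program marks counties with a condom program, Post marks the period after introduction, and Counsel marks programs that included counselling.

```latex
\log E[B_{ct}] = \log N_{ct}
  + \beta_0
  + \beta_1\,\mathrm{Program}_c
  + \beta_2\,\mathrm{Post}_t
  + \beta_3\,(\mathrm{Program}_c \times \mathrm{Post}_t)
  + \beta_4\,(\mathrm{Program}_c \times \mathrm{Post}_t \times \mathrm{Counsel}_c)
```

Here \beta_2 captures any step common to all counties, \exp(\beta_3) is the rate ratio attributable to the program over and above that common step, and \beta_4 is the additional term for counselling. With programs introduced in different years, Post would need to be defined by calendar year for each county rather than as a single cut-off.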

I’m partial to the view expressed that counselling is necessary to make condom programs work, but Vox themselves have presented conflicting evidence that teenagers are perfectly capable of using condoms. Given this, explicitly investigating this would have provided useful policy insights. Instead the authors have piled speculation on top of a weak and poorly-designed statistical model. The result is a controversial finding that they support only through very poor statistical modeling.

The correct model wouldn’t have been hard to implement – it’s a standard part of R, Stata, SPSS and SAS, so it’s unlikely the authors couldn’t have done it. It seems to me that this poor model (and the previous one) are indicative of a poor level of statistics and research design teaching in economics, and a lack of respect for the full diversity of statistical models available to the modern researcher. Indeed, I have a Stata textbook on econometrics that is entirely OLS regression – it doesn’t mention generalized linear models, even though these are a strong point of Stata. I think this indicates a fundamental weakness in economics and econometrics, and leads me to this simple bit of advice about models of health and social behavior prepared by economists: they’re probably wrong, and you shouldn’t trust them.

I hope I’m wrong, and Vox don’t keep vexing me with “explainers” about research that is clearly wrong. I don’t hold out much hope …

fn1: for those digging this far, or who often stumble across this horrible term in papers they read, a “dummy” is just a variable that is either 0 or 1, where 1 corresponds to the event of interest and 0 to not the event of interest. In epidemiology we would just say “we included sex in the model”. In economics they say “we included a dummy for sex.” This is just unnecessary jargon.

fn2: Except the challenge to be fun.