The two roads towards external validity

I just had a look at this piece written by Cyrus Samii (HT gonzalo) on how to address the problem of the external validity of experiments. I find his argument exciting and insightful, yet I remain unconvinced. Since I have been dealing with these topics for a while now, I will (freely) reconstruct what I understand him to be arguing and explain why I don’t fully agree with it, mostly for the sake of discussion. I believe it is fair to say that the standard view (SV) among experimentalists has two common traits:

  1. It is agnostic about the theoretical underpinnings of the specific causal mechanism that links the treatment and the outcome.
  2. The problem of external validity (how conclusions from an experimental setting can be extrapolated to a different context) can be overcome, or at least partially remedied, by repeating the same experiment across different contexts.

CS, on the other hand, argues that the main problem of external validity does not have to do with the fact that contexts vary, but with the fact that experiments are never exactly the same. Indeed, this view is not very different from the one endorsed by Duflo (pg 21). How you characterize a given treatment is always “theory dependent”. We can consider that unemployment insurance is the same treatment everywhere and varies only in the extent to which it affects reservation wages, so that it is possible to measure its generosity. Yet unemployment insurance may have other dimensions, or interact with many other institutions that affect job search incentives (such as ALMPs). CS argues that treatments should be seen as “bundles” and that the main problem in approaching external validity is instead “unbundling” those characteristics that are at the heart of the mechanism that produces the effect. So CS’ view is different in that:

  1. It is sympathetic to testing theoretical tenets, meaning isolating certain traits of a treatment and seeing what role they play. He actually suggests that theoretical motivations should lead the experimentalist agenda.
  2. The quest for external validity can be more usefully pursued by testing causality across treatments instead of across contexts.

I started by saying I’m sympathetic to his view. Here is why. As I said in my previous piece, I understand causal inference as the task of untangling the cause, the effect and the mechanism that links them. One of the reasons why I don’t feel satisfied with reduced form econometrics is that it does not identify the latter, which is just thrown inside a black box. I think this is unsatisfactory because, when you ignore the nuts and bolts behind the mechanism, there is no systematic way to approach generalization (i.e. external validity), since you cannot discuss how plausible it is that the parameters will be robust. What I like about CS’ view is that the black box is opened, so it looks like a step in the right direction. I also said, however, that I remain unconvinced. Let me introduce some notation for this. Consider the following (linear) structural equation:

y = α + (β + π′z)′x + ε


  • y is some variable to be explained
  • x = [x_1,…,x_K] is a vector that fully characterizes the treatment.
  • z = [z_1,…,z_K] is a vector that fully characterizes the context and that may have both observable and unobservable elements.
  • ε is the error term, that is, all other unobservables that are independently distributed.

and let θ = (β + π′z), so that we can rewrite the equation as:

y = α + θ′x + ε

Running an experiment allows you to compare the outcome when x = 0 to the outcome under a certain treatment, call it “treatment (a)”: x(a) = [x(a)_1,…,x(a)_K], and so identify θ, but only at the value of z that holds in the context where the experiment was run. This is (a face of) the problem of external validity (a full discussion can be found on pg 66 of this paper). The SV is that we can fix this problem by trying the same x (typically x is likely to be considered one-dimensional) across different contexts, i.e. different values of z. Hopefully, replication will give a sense of how different contexts influence the outcome. This idea is endorsed by Duflo, for example, when she says:

That does not mean that it [the effect] would be true in India, but the very fact that there is this possibility means that we want to investigate this question more. And we can try a similar experiment elsewhere to see in what conditions this will reproduce.
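
To make the replication idea concrete, here is a minimal simulation sketch of my own, not anything Duflo or CS propose. It assumes a binary one-dimensional x, a scalar and fully observed z, and made-up values for α, β and π; the function name run_experiment is hypothetical. A single site only recovers θ = β + πz at that site's z, whereas several sites with different z let a line fitted through the site-level estimates separate β from π.

```python
import numpy as np

def run_experiment(z, beta, pi, alpha=1.0, n=20_000, rng=None):
    """Simulate one RCT in a context with scalar z and a binary treatment x.

    Returns the treated-minus-control difference in mean outcomes,
    which estimates theta = beta + pi * z for that context only.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.integers(0, 2, size=n)           # random assignment: 0 = control, 1 = treated
    eps = rng.normal(0.0, 1.0, size=n)       # independent error term
    y = alpha + (beta + pi * z) * x + eps    # structural equation from the text
    return y[x == 1].mean() - y[x == 0].mean()

rng = np.random.default_rng(42)
beta_true, pi_true = 1.0, 0.5                # assumed values, for illustration only

# One context (z = 2): the experiment recovers theta = 1.0 + 0.5 * 2 = 2.0, nothing more.
theta_single = run_experiment(2.0, beta_true, pi_true, rng=rng)

# The same experiment replicated across contexts that differ in z.
contexts = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
theta_hats = np.array([run_experiment(z, beta_true, pi_true, rng=rng) for z in contexts])

# A straight line through (z, theta_hat): the slope estimates pi, the intercept beta.
pi_hat, beta_hat = np.polyfit(contexts, theta_hats, deg=1)
```

The sketch only works because z is observed and varies across sites. If part of z is unobservable, as the notation above allows, the site-level estimates still differ but there is nothing to regress them on.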

CS makes a very good point when he argues that the specific effect on the outcome of each x(a)_i that characterizes the treatment remains unidentified: you can only identify the joint effect of the whole vector. Eventually, by running the same experiment while shifting specific parameters of the treatment (x(a)_i), you could get a sense of what is at work. Indeed, when the treatment is not binary, it is typically possible to assess this, although how the treatment is characterized may restrict whether some expressions of x(a) are observed or not if the effects are not fully linear. Once we get a good sense of the mechanics of the causal mechanism, we can proceed to discuss the plausibility of those mechanics in other contexts in a more sophisticated way. Yet the primary problem of external validity, the one that replication across different contexts is meant to solve, remains unchanged. This issue has to do with the fact that only θ = (β + π′z) is identified, while β and π are not, because we have only one observation of z.
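
The “bundle” point can be illustrated with a small noise-free sketch, again under my own assumptions: K = 2 treatment traits, a single context, and made-up values of θ. The helper arm_effect is hypothetical. Two different mechanisms give exactly the same contrast between x(a) and the control arm, so one bundled arm cannot tell them apart; arms that switch on one trait at a time can.

```python
import numpy as np

def arm_effect(theta, x_arm):
    """Expected outcome gap between arm x_arm and the control arm x = 0: theta'x_arm."""
    return float(np.dot(theta, x_arm))

x_a = np.array([1.0, 1.0])             # bundled treatment (a): both traits switched on

# Two different mechanisms (values of theta in this one context)...
theta_1 = np.array([2.0, 0.0])         # all of the effect comes from trait 1
theta_2 = np.array([0.5, 1.5])         # most of the effect comes from trait 2

# ...imply the same x(a)-versus-control contrast, so a single arm cannot separate them.
same_joint_effect = arm_effect(theta_1, x_a) == arm_effect(theta_2, x_a)

# Extra arms that vary one trait at a time pin down each component of theta.
arms = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
effects = arms @ theta_2               # noise-free arm-versus-control contrasts
theta_recovered, *_ = np.linalg.lstsq(arms, effects, rcond=None)
```

Note that what is recovered is still θ at this context's z. Unbundling the treatment says nothing about how β and π split that θ, which is exactly the part that remains unidentified.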

What is the way ahead? Duflo suggests, for instance, that every combination of (z,x) that is tested can be seen as a data point and that, together with theory (or structural approaches to experimentation) and careful examination of individual cases, these data points can paint a more complete view of the world. I sympathize with this view; yet I have the feeling that RCTs look much less straightforwardly superior when we look at them in this light, particularly when their cost is taken into account.

