I spent last week at a workshop in Chicago on causal inference. It was a great experience and I learned a lot, especially in terms of putting my ideas in order. There was, however, something that constantly made me feel uneasy.
Standard errors were one of the big topics people talked about, and the discussions about them tend to be heated. Josh Angrist, for example, suggested that one good reason to prefer regression to matching strategies is that standard errors are easier to obtain and interpret. In the frequentist worldview that is standard in causal inference, this makes a lot of sense: you definitely want to know whether the effects you’re estimating are just noise produced by a small sample or something else. That is, you want to have some sense of how the uncertainty derived from the sampling process affects your estimates.
Why are we interested in uncertainty? The way I was first taught about statistical inference was as a statistical decision problem: a game played between the statistician and nature. This is true under either a Bayesian or a frequentist paradigm: only the kind of risk you try to minimize changes (informed or uninformed by a prior). Coming from the field of economics, it was pretty reasonable to think of the mean as the estimator derived under a quadratic loss function. Nonetheless, there is clearly nothing special about quadratic loss functions. Under absolute loss, for instance, the median is the optimal estimator. In this decision theory framework, uncertainty makes a whole lot of sense because you want to understand how much you should trust your estimate.
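The claim that the mean and the median are the optimal point estimates under quadratic and absolute loss can be checked numerically. The following is only a minimal sketch: the simulated sample, the grid of candidate estimates, and the helper name empirical_risk are my own illustrative assumptions, not anything taken from the workshop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative skewed sample (assumption), so that mean and median differ.
sample = rng.exponential(scale=1.0, size=1001)

# Candidate point estimates on an evenly spaced grid over the data range.
grid = np.linspace(sample.min(), sample.max(), 2001)
step = grid[1] - grid[0]


def empirical_risk(loss, candidates, data):
    """Average loss of each candidate estimate over the data."""
    errors = data[None, :] - candidates[:, None]
    return loss(errors).mean(axis=1)


# Minimize each empirical risk by brute force over the grid.
best_quadratic = grid[np.argmin(empirical_risk(np.square, grid, sample))]
best_absolute = grid[np.argmin(empirical_risk(np.abs, grid, sample))]

print("quadratic-loss minimizer:", best_quadratic, "| sample mean:", sample.mean())
print("absolute-loss minimizer: ", best_absolute, "| sample median:", np.median(sample))
```

Up to the grid resolution, the first minimizer should coincide with the sample mean and the second with the sample median. Changing the loss changes the "optimal" answer, which is the sense in which there is nothing special about quadratic loss.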
From the point of view of the advancement of science, and particularly from a Bayesian point of view, it seemed very relevant to me to evaluate discoveries by a) how big they are and b) how certain we can be about them. This is even more important in the case of causal inference, which is framed explicitly in the language of policy and treatment analysis: the textbook case of the potential outcomes framework is the analysis of a drug or a policy. If you want to evaluate a labor market program, you want to know if it is worth the cost, and that means we should be able to present the policy maker with an estimate of the effect and how sure we are about it.
Although I would like to see all this talk framed in terms of posterior distributions of parameters, thinking carefully about statistical inference is certainly good. Yet what left me scratching my head was the total absence of discussion of the uncertainty coming from identification assumptions, i.e. sensitivity analysis. The discussion happens in two phases: first, there is some totally informal discussion of the assumptions; then, conditional on the assumptions being true, the estimation occurs. Conclusions eventually follow from this last step, often recommending one treatment over another.
In practice, identification assumptions are not simply believed or disbelieved; the truth is probably somewhere in the middle. Think of the popular regression discontinuity design (RDD) that uses close elections to estimate the effect of holding office. For the identification strategy to be valid, we need close losers and close winners to be similar in every other respect. Yet, as critics have pointed out, it may be the case that close losers systematically lose close elections because close winners can manipulate the results (for example, because they have some kind of connection with other politicians), so the two groups are not comparable. This may matter if you try to estimate the returns to holding office and you think career politicians tend to win and businessmen tend to lose: their performance in office and in the private sector is not comparable. These are the two extreme cases: valid or invalid.
Should we choose? No! The truth is probably somewhere in the middle, and that’s how we should estimate. In particular, we can probably formulate our beliefs as a probability distribution encompassing all the intermediate cases. From this probability distribution we can derive a measure of uncertainty that should be combined with the uncertainty from the estimation to understand the final estimate. Discussing the two jointly is particularly important since, in practice, there is a tradeoff between the credibility and the significance of inference: assuming more decreases credibility but typically increases the effective sample size.
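As a rough illustration of what combining the two sources of uncertainty could look like, here is a crude Monte Carlo sketch. Everything in it is an assumption made for the example: the point estimate tau_hat, its standard error se, and above all the choice to describe beliefs about the identification failure as a normally distributed bias with spread bias_sd, centered on "the assumption holds". It also treats the sampling distribution loosely, as if it were a posterior under a flat prior.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical numbers, for illustration only.
tau_hat = 2.0   # point estimate obtained under the identification assumption
se = 0.5        # its conventional (sampling) standard error
bias_sd = 0.8   # spread of our beliefs about the bias if the assumption fails

n_draws = 200_000
sampling_noise = rng.normal(0.0, se, n_draws)
# Assumed belief distribution over the bias: zero means the assumption holds
# exactly, larger absolute values mean more severe violations.
bias = rng.normal(0.0, bias_sd, n_draws)

sampling_only = tau_hat + sampling_noise
combined = tau_hat + sampling_noise - bias


def interval(draws, level=0.95):
    """Central interval containing the requested share of the draws."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)


print("sampling uncertainty only:  ", interval(sampling_only))
print("sampling plus identification:", interval(combined))
```

With independent normal components, the combined spread is just the square root of se squared plus bias_sd squared, so the simulation is not strictly needed here. The point of writing it as draws is that the belief distribution over the bias can be swapped for something less convenient, for instance a mixture that puts some probability mass exactly on zero.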
Charles Manski has spent a large part of his career recommending this approach: bounds or identification sets should be derived first, and these should be progressively narrowed by making ever stronger assumptions, thus making explicit the tradeoff between credibility and the strength of the conclusions. Coming back from the workshop, I re-read a JEP piece written some years ago by Ed Leamer in which he makes a case very close to mine:
The range of models economists are willing to explore creates ambiguity in the inferences that can properly be drawn from our data, and I have been recommending mathematical methods of sensitivity analysis that are intended to determine the limits of that ambiguity.
[extreme bounds analysis] It is a solution to a clearly and precisely defined sensitivity question, which is to determine the range of estimates that the data could support given a precisely defined range of assumptions about the prior distribution. It’s a correspondence between the assumption space and the estimation space. Incidentally, if you could see the wisdom in finding the range of estimates that the data allow, I would work to provide tools that identify the range of t-values, a more important measure of the fragility of the inferences.
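To give a feel for the "correspondence between the assumption space and the estimation space", here is a deliberately simplified sketch. It is not Leamer's procedure, which is defined over prior distributions; it is the cruder specification-search variant in which the "range of assumptions" is the set of all subsets of some doubtful control variables. The simulated data, the coefficients, and the function name treatment_coefficient are all assumptions made for the example.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data (assumption): only the first control is a true confounder.
controls = rng.normal(size=(n, 3))
treatment = 0.6 * controls[:, 0] + rng.normal(size=n)
outcome = 1.0 * treatment + 0.8 * controls[:, 0] + rng.normal(size=n)


def treatment_coefficient(control_idx):
    """OLS coefficient on the treatment, given a chosen subset of controls."""
    columns = [np.ones(n), treatment] + [controls[:, j] for j in control_idx]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(beta[1])


# One estimate per point in the (discrete) assumption space.
estimates = {}
n_controls = controls.shape[1]
for k in range(n_controls + 1):
    for subset in itertools.combinations(range(n_controls), k):
        estimates[subset] = treatment_coefficient(subset)

lower, upper = min(estimates.values()), max(estimates.values())
for subset, value in estimates.items():
    print("controls", subset, "->", round(value, 3))
print("range of estimates:", (round(lower, 3), round(upper, 3)))
```

The width of the reported range is a statement about how fragile the conclusion is to the choice of specification, which is a different thing from the standard error of any single regression in the list.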
What is being said here about identification assumptions seems to me to apply equally to other phases of the data analysis process, in particular to preprocessing. Preprocessing involves assumptions and choices, and the uncertainty coming from them is typically ignored.
The bottom line is that if we look at data analysis from the perspective of a decision problem, every single step involves a choice, and every choice involves uncertainty that is relevant. When we look at causal inference, this is particularly true, since we are trying to choose between alternative scientifically grounded relationships. This uncertainty should be somehow quantifiable, and reflected, as an interval or a distribution, in the