On data, experiments, incentives and highly unconvincing research

When I read abstracts of papers, I have noticed that two kinds of opening discourage me from reading on: a) those that “present a novel data set” and b) those that present an experiment or quasi-experiment as their main claim to originality.

My first impulse was to think that I’m biased against empiricists for academic tribal reasons. But then I tried to remember what had happened in the cases in which I had kept reading: in general, my initial reaction had rarely been disproved.

I discussed this the other day with my crazy empiricist office mate. My problem with the “credibility revolution” in applied social science, which I prefer to call “the data centric obsession”, is not just that I find Rubin identification an incomplete account of causality. My biggest beef has to do with the academic incentives it induces.

Consider the emphasis on data. For a PhD student, one way of finding a dissertation topic seems too often to be the following: 1) Find a topic where questions face measurement issues due to a lack of public data; 2) Spend a large part of your PhD mining sources and “collecting data”; 3) Construct an index or a database; 4) Push the Stata button until you get the right number of stars and claim that you have provided an improved answer; 5) Brand yourself in the market as an expert in the field. Naturally, there is nothing wrong with this in the abstract. Data collection is a fundamental part of modern research and someone ought to do it. The problem is that it induces bad incentives: it means that you can go on the market and get published doing what is essentially a form of extended RA’ing. It does not provide any incentive to acquire methodologically sophisticated or substantively rich training; more likely, such training will be crowded out by the time you spend collecting data. Moreover, the approach is plagued with all sorts of problems, since measurement and index building are usually more involved than classifying cases on 1-to-4 scales and taking the average.

A similar problem arises with respect to the “experimentalist” zeitgeist. It seems to me that many research questions are increasingly motivated, not by their intrinsic interest, but by the availability of some quasi-experiment. The mindset induced by Rubin identification, as detailed in the first chapter of Angrist-Pischke’s book, is to keep thinking about what kind of experiment could answer a certain question. In practice, however, the effect is perverted, as the pressure to publish induces you to take the inverse road: find “natural” experiments and then think about what questions could be answered with them. This naturally results in an overabundance of work whose only interest is the cleverness of the experimental design, and whose connection with a relevant question is miles away from being clear.

An obvious counter-criticism is to ask how it is possible to get by with such lame work. Shouldn’t someone show up at a seminar pointing out what I’m pointing out here? The fact is that this does not seem to happen, and I don’t have anything more than a tentative answer as to why.

In my view, it simply has to do with the fact that academia is a peer-monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, and no one really has an incentive to monitor them seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do so and, in fact, errors are sometimes detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens with the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Wow! How cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you by paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true that methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But if you work in pure formal theory or statistical theory, your work is not meant to immediately answer questions about the real world, but instead to serve other researchers in their quest. This cannot, in general, be said of applied causal inference (CI) work.

I would not like to seem too harsh. There is excellent work out there that marries carefully crafted theoretical hypotheses with clever identification strategies and, of course, I think that data collection is absolutely fundamental. What I’m highly critical of is the naive empiricist ethos that is often at the root of “data centric” applied work. One problem is of course that your conclusions can only be as precise as your theoretical assumptions. But what I’m concerned with here is the sort of academic incentives it induces. I’ve always considered myself a philosophical (and epistemological) pragmatist, and any account of a method should consider it as a norm structuring a social group: academia, in this case.

One thought on “On data, experiments, incentives and highly unconvincing research”

  1. I hear what you say, and am tempted to agree (esp. on the experimental fad), but coming from the more “qualitative” and China-studies end of the field I have almost the opposite beef, namely that people seem to dig far too little into the data (theirs, others’, and all the uncollected stuff still out there). This goes for both stats workers and qual-types. Instead, the focus is unrelentingly on “theory building”, which in practice all too often seems to mean “tag some data (numbers, interviews, more rarely documents) to a pre-existing political-science debate; don’t think too much about whether this debate is even relevant to the data or whether the data really means what you say it means. You are unlikely to encounter anyone who knows the stuff well enough to challenge you on the substantive points, and even if you do they won’t, because that is just not done, and no one else in the seminar would be interested.” Just as no one goes through the proofs of a formal paper.

    Now China might be a special case (and I’m thinking especially of China-related work as I write this), simply because so much data is so messy and so complex (e.g. deliberately misleading statistical categories, and that’s just scratching the surface), and getting hold of really useful material and figuring out what it really means is often so hard and requires a very high level of background knowledge, which takes years to acquire. Still, thinking of papers and books that I’ve been seeing in the past few years, I’m struck by the number of papers that 1) rest on crazy assumptions that are demonstrably wrong, 2) use data without inquiring into the relevant policy & regulatory context (which is not actually hard, it just takes a lot of work; all one has to do is work one’s way through the relevant documents), 3) use data without bothering to inquire into the idiosyncratic, China-specific structures possibly contained in the data.

    Political science, I’ve come to feel, is often more interested in its own internal debates and priorities than in figuring out what is really going on in the world.

