Issue 10 · August 7, 2027 · 10 min read
Donald Campbell · Evaluation theory · Intellectual history

Campbell's vision.

The psychologist who invented modern policy evaluation — and why his idea of the Experimenting Society is still ahead of its time.

In 1969, a psychologist named Donald T. Campbell published a paper in the American Psychologist titled "Reforms as Experiments." It was not a study of any particular program. It was an argument about what governments should become.

The paper's central example was a reform that had already been carried out: Connecticut's 1955 crackdown on speeding, which the governor had credited with a drop in traffic deaths. Campbell used it as a launching point for a larger claim. He argued that most government policy was implemented without any serious attempt to test whether it worked. Programs were announced, funded, and institutionalized. Evaluation, when it happened at all, came after the fact, was done by advocates, and was designed to confirm success rather than discover truth.

Campbell wanted something different. He wanted governments that treated every major reform as an experiment, designed from the beginning to generate reliable knowledge about whether it worked.

He called this "the experimenting society."

The threats to validity

Before Campbell could articulate his vision, he needed a vocabulary. The problem with most policy evaluation was not that evaluators lacked data — it was that they drew causal conclusions from data that couldn't support them.

Campbell, working with Julian Stanley, catalogued the threats to validity that made such conclusions unreliable. *History*: other things changed at the same time as the program. *Maturation*: people change over time regardless of intervention. *Regression to the mean*: programs that target people or places at an unusually bad moment will appear effective even if they do nothing, because extreme measurements tend to be followed by less extreme ones. *Selection bias*: people who choose to participate in a program differ systematically from those who don't.
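Regression to the mean is the easiest of these threats to see in a simulation. The sketch below is illustrative only: the population size, the noise levels, and the function name `simulate_regression_to_mean` are assumptions made for the example, not anything taken from Campbell and Stanley. It measures a population twice with no intervention at all, "targets" the tenth with the worst first-year scores, and compares that group's two averages.

```python
import random
import statistics

def simulate_regression_to_mean(n_units=10_000, top_share=0.10, seed=0):
    """Pick the highest-need units in year 1, then re-measure them in year 2.

    No program is ever run. Returns (year1_mean, year2_mean) for the
    targeted group. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    # Each unit has a stable underlying need plus independent yearly noise.
    true_need = [rng.gauss(50, 10) for _ in range(n_units)]
    year1 = [t + rng.gauss(0, 10) for t in true_need]
    year2 = [t + rng.gauss(0, 10) for t in true_need]

    # "Target" the units whose measured need in year 1 is in the top share.
    cutoff = sorted(year1, reverse=True)[int(n_units * top_share) - 1]
    targeted = [i for i in range(n_units) if year1[i] >= cutoff]

    before = statistics.mean(year1[i] for i in targeted)
    after = statistics.mean(year2[i] for i in targeted)
    return before, after

before, after = simulate_regression_to_mean()
print(f"targeted group, year 1: {before:.1f}")
print(f"targeted group, year 2: {after:.1f}  (no program was run)")
```

The targeted group's average need falls in the second year even though nothing was done for it. A before-and-after report on a program aimed at that group would record the same drop as success.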

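Campbell and Stanley's *Experimental and Quasi-Experimental Designs for Research*, first published in 1963 as a handbook chapter and later as a standalone book, laid out these threats systematically and proposed research designs that could address them. Randomization removes selection bias in expectation, because chance rather than choice decides who gets the program. Interrupted time series guards against maturation and regression to the mean by showing the trend before the intervention, though an event that coincides with the program (history) can still mislead it. Regression discontinuity exploits the arbitrary nature of eligibility cutoffs: people just above and just below the line are nearly alike except for their access to the program.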

The work was written for researchers in education and psychology, not for governments. What Campbell recognized was that the same logic applied to policy, and that the designs could be adapted to the messy conditions of real government programs. A policy that phases in across geographic areas, uses a lottery to allocate limited slots, or has a hard eligibility cutoff creates a natural comparison group that makes valid evaluation possible.
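The hard-cutoff case can be sketched in a few lines. What follows is a minimal illustration on simulated data, not a production estimator: the helper names `fit_line` and `rd_estimate`, the cutoff of 50, the bandwidth, and the simulated effect are all assumptions. It fits a straight line to the outcomes on each side of the cutoff and reads the program effect off the gap between the two lines at the cutoff itself.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Jump in the outcome at the cutoff: eligible side minus ineligible side.

    Units scoring below the cutoff are treated as eligible (an assumption
    of this sketch). Only units within `bandwidth` of the cutoff are used.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    # Scores are centered on the cutoff, so each intercept is the fitted
    # outcome exactly at the cutoff.
    at_cutoff_below, _ = fit_line([x for x, _ in below], [y for _, y in below])
    at_cutoff_above, _ = fit_line([x for x, _ in above], [y for _, y in above])
    return at_cutoff_below - at_cutoff_above

# Simulated program: eligibility below a score of 50, true effect of +4.
rng = random.Random(1)
CUTOFF, TRUE_EFFECT = 50.0, 4.0
scores = [rng.uniform(0, 100) for _ in range(20_000)]
outcomes = [20 + 0.3 * s + (TRUE_EFFECT if s < CUTOFF else 0.0) + rng.gauss(0, 5)
            for s in scores]

estimate = rd_estimate(scores, outcomes, CUTOFF, bandwidth=15)
print(f"estimated effect at the cutoff: {estimate:.2f} "
      f"(simulated true effect: {TRUE_EFFECT})")
```

The comparison works only because eligibility flips abruptly while everything else changes smoothly across the line. The estimate also speaks only to people near the cutoff, which is the price of not randomizing.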

Reforms as experiments

The 1969 paper extended this logic to a normative argument. If governments were going to implement reforms anyway — and they were — they should implement them in ways that made valid evaluation possible.

This required a shift in how governments thought about their own uncertainty. Campbell's experimenting society was not one that paralyzed itself with doubt. It was one that acknowledged, honestly, that it did not know whether the reforms it was implementing would work — and that it structured its operations to find out.

The practical implications were significant. Programs should be phased in randomly rather than simultaneously, creating comparison groups. Outcomes should be measured before and after, not just after. Data collection systems should be designed for evaluation, not just administration. And — crucially — null results should be published and learned from, not buried.
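A randomized phase-in with before-and-after measurement might look like the following sketch. The site names, the number of phases, the background trend, and the simulated effect are hypothetical, as are the function names `assign_phases` and `diff_in_diff`. Sites are dealt into rollout phases by chance, and the change at early sites is compared with the change at sites still waiting their turn.

```python
import random
import statistics

def assign_phases(site_ids, n_phases, seed=0):
    """Shuffle sites and deal them into rollout phases of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    return {site: i % n_phases for i, site in enumerate(shuffled)}

def diff_in_diff(before, after, phase, early_phase=0):
    """Mean change at early sites minus mean change at not-yet-treated sites."""
    early = [after[s] - before[s] for s in phase if phase[s] == early_phase]
    later = [after[s] - before[s] for s in phase if phase[s] != early_phase]
    return statistics.mean(early) - statistics.mean(later)

# Simulated rollout: the outcome is an incident count, so lower is better.
rng = random.Random(2)
sites = [f"site-{i}" for i in range(400)]
phase = assign_phases(sites, n_phases=2, seed=2)

TREND = -5.0        # improvement that happens everywhere, program or not
TRUE_EFFECT = -8.0  # additional improvement caused by the program

before = {s: rng.gauss(100, 15) for s in sites}
after = {s: before[s] + TREND
            + (TRUE_EFFECT if phase[s] == 0 else 0.0)
            + rng.gauss(0, 3)
         for s in sites}

# Naive evaluation: before/after change at treated sites only.
naive = statistics.mean(after[s] - before[s] for s in sites if phase[s] == 0)
estimate = diff_in_diff(before, after, phase)

print(f"naive before/after change at treated sites: {naive:.1f}")
print(f"change relative to not-yet-treated sites:   {estimate:.1f}")
```

The naive figure folds the background trend into the program's effect. The comparison with waiting sites subtracts it out, and the random ordering is what makes those waiting sites a fair stand-in for what would have happened anyway.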

Campbell was explicit about the political obstacles. Governments announced programs to claim credit. Admitting that a program had failed was politically costly. The incentive was always to hype the results, select favorable metrics, and report success. An honest experimenting society required politicians who could tolerate uncertainty and admit failure — which was, he acknowledged, asking a great deal.

What Campbell got right

The tools Campbell and his colleagues developed — the threat taxonomy, the quasi-experimental designs, the logic of comparison groups — became the foundation of modern impact evaluation. Every randomized controlled trial of a social program, every regression discontinuity study of a school voucher, every interrupted time series of a health intervention is an application of the framework Campbell spent his career building.

The methodological achievement is hard to overstate. Before Campbell, policy evaluation was largely correlational: programs that served poor communities produced reports showing improvement in those communities, which proved nothing. After Campbell, evaluators had a vocabulary for the question "what would have happened without the program?" and a toolkit for answering it.

The Institute of Education Sciences in the US requires experimental or quasi-experimental evidence for its highest ratings. Its What Works Clearinghouse builds its evidence standards around the kinds of threats Campbell catalogued, such as attrition and non-equivalent comparison groups. Researchers affiliated with the J-PAL network have run well over a thousand randomized evaluations using designs traceable to his work. The intellectual lineage is clear and the debt is real.

What Campbell underestimated

What has not been built, and what remains largely aspirational more than fifty years later, is the institutional structure Campbell called the experimenting society.

Most governments still do not routinely pre-specify outcomes before implementing programs. Most programs are still evaluated after the fact, by advocates, using self-selected metrics. Most null results are still never published. The political incentives Campbell identified have not changed.

The gap between the methodological achievement and the institutional vision points to something Campbell may have underestimated: the problem of evaluation is not, at its core, a technical problem. It is a political one. The knowledge production that good evaluation requires is in tension with the incentive structures of democratic governance.

Elected officials face short time horizons, competitive reelection pressures, and constituencies who want to believe programs work. A bureaucracy that honestly reports that its flagship program produced no measurable improvement is not rewarded. A commissioner who recommends abandoning an ineffective program — even one that is genuinely ineffective — faces political risk that a commissioner who defends it does not.

Campbell knew this. He wrote about it. But his solution — appeal to the norms of science, build institutions that value truth-seeking — underweighted how durable and self-reinforcing the political incentives were.

The incomplete project

The Experiment Society exists in the space between what Campbell built and what he imagined.

The technical toolkit is in place. The vocabulary of causal inference is now taught in graduate programs in public policy, economics, political science, and education. The designs that can generate valid evidence about whether civic interventions work are well understood.

What remains scarce is the institutional scaffolding — the norms, the incentives, the organizational structures — that would make local governments into genuine learning institutions. The question is not whether it is possible to evaluate civic interventions well. It clearly is. The question is whether governments will organize themselves to do it.

Campbell's vision was that enough demonstrations of what worked, and enough honest records of what didn't, would eventually create pressure for that organizational change: that demonstrating the value of knowing would be sufficient to produce the will to know.

Whether he was right about that is still, in the most literal sense, an open question.

*Donald Campbell died in 1996. His collected papers are held at Lehigh University. "Reforms as Experiments" remains one of the most cited papers in all of social science.*