Patterns from 110 experiments across 70 countries.

The registry is a collection of facts. This page is an attempt at synthesis: what mechanisms replicate, where the evidence is null or mixed, and what design principles emerge when you look across 60 years of civic experimentation.

91 positive results

19 null or mixed results

70 countries

60+ years of evidence

What the evidence shows.

Organized by mechanism, not policy area. The same pattern — friction reduction, social norms, defaults — appears across health, tax, education, and environmental domains. The mechanism transfers more reliably than the specific finding.

01

Simplification

13 experiments in registry

Administrative simplification often outperforms outreach.

When take-up of a beneficial program is below expectations, the bottleneck is usually the form, not awareness. Reducing the cost of compliance—through one-click enrollment, flexible scheduling, or pre-filled applications—consistently outperforms information campaigns aimed at people who already know about the program.

Implication

Before launching a new outreach campaign, audit the enrollment or compliance process itself. Removing steps from the form frequently outperforms even a well-designed awareness campaign.

02

Default

6 experiments in registry

Defaults determine most outcomes for low-salience decisions.

When a decision is infrequent, low-urgency, or requires effort to act, the starting position determines the vast majority of outcomes. Opt-out defaults reliably outperform opt-in by ratios of 3:1 to 10:1. The mechanism is not persuasion — it is inertia.

Implication

The most effective intervention in many contexts is redesigning what happens if someone does nothing. Ask: what is the default behavior we are competing against?

03

Social norms

5 experiments in registry

Social norms work for tax and resource use — less so for deep hesitancy.

Showing people that similar peers comply with a norm reliably increases compliance for frequent, private behaviors like tax payment and energy use. For behaviors driven by deeper barriers — medical hesitancy, structural poverty — social norms have small or negligible effects at population scale.

Implication

Before choosing social norms as a mechanism, diagnose the actual barrier. If the behavior is private and frequent and the barrier is inattention, norms work. If the barrier is trust, access, or deep resistance, norms are insufficient.

04

Personalization

4 experiments in registry

Personalization reliably outperforms generic communication.

Using a name, a specific dollar amount, or a locally relevant frame consistently improves response rates over generic outreach. The effect is not trivial: in the UK court fine experiment, adding the recipient's name alone increased payment by 189%. The mechanism is accountability and relevance; generic messages are processed as noise.

Implication

Personalization costs little at scale but reliably improves outcomes. Specific messages feel like a direct claim on the recipient; generic ones do not.

05

Targeting

6 experiments in registry

Concentrated interventions outperform diffuse ones in public safety.

The Kansas City null result and the Philadelphia positive results are not contradictory — they reveal a specificity effect. Police presence concentrated at the 3–5% of locations generating most crime produces significant reductions. Spread thinly across a precinct, the same resources have no measurable effect. The unit of analysis matters enormously.

Implication

Evaluate targeting before evaluating the intervention. An approach that fails at the precinct level may succeed at the street-segment level.
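Checking for concentration can come before any deployment decision. Below is a minimal Python sketch, assuming incident counts per street segment are already available. It ranks locations by count and finds the smallest fraction of them that accounts for half of all incidents. The function name, the 50% threshold, and the counts are hypothetical and for illustration only.

```python
def share_of_locations_for(counts, target_share=0.5):
    """Return the smallest fraction of locations that together account for
    at least `target_share` of all incidents (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no incidents recorded")
    running = 0
    # Rank locations from most to fewest incidents and accumulate.
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target_share * total:
            return rank / len(counts)
    return 1.0

# 100 hypothetical street segments: 4 with heavy activity, 96 sparse.
counts = [60, 55, 50, 45] + [2] * 96
print(share_of_locations_for(counts))
```

Interpreting the result:

- **A few percent** suggests a hot-spot structure worth targeting.
- **Near 50%** means incidents are spread evenly, so concentrated deployment has little to aim at.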

06

Human capital

20 experiments in registry

Early childhood investments produce the highest long-term returns.

Three landmark experiments found that high-quality early intervention produces dramatic improvements in adult outcomes: employment, earnings, education, crime, and health. The returns compound over decades and are largest for the most disadvantaged children. They are systematically underestimated by short-term evaluations.

Implication

The political challenge of early childhood investment is that the most persuasive evidence arrives 20–40 years after the program. This is an argument for long-term commitments with pre-specified follow-up plans — not for discounting the investment.

07

Price signal

10 experiments in registry

Free distribution outperforms subsidized pricing for health products.

A persistent assumption is that charging a small price for a health product increases commitment and reduces waste. For most products, the experimental evidence does not support this. Price reduces take-up, and among people who obtain the product, usage rates do not differ by the price they paid.

Implication

When designing distribution programs for health products or benefits, assume that price is an access barrier and that recipients will use what they receive. Design for access, not for filtering through price.

08

Cash transfer

11 experiments in registry

Cash transfers work — and recipients spend them wisely.

Across multiple continents and economic contexts, both conditional and unconditional cash transfers have produced significant, sustained improvements in consumption, assets, health, and education. Concerns about wasteful spending on alcohol or tobacco have not been borne out by experimental evidence.

Implication

The political reluctance to give poor people cash is not supported by experimental evidence. Where the question is 'what should we provide?' — cash consistently outperforms most in-kind alternatives in giving recipients agency over their own priorities.

09

Human capital

20 experiments in registry

Gains fade when recipients return to unchanged environments.

Several high-quality experiments found that program effects diminished after the intervention ended — not because the programs failed, but because the environments participants returned to hadn't changed. Programs that permanently change circumstances produce durable effects. Programs that only temporarily subsidize behavior in an unchanged environment tend not to.

Implication

When evaluating fade-out, ask whether the program changed the participant's environment or only the participant. This distinction shapes whether additional services, follow-on programs, or environmental change are the right response.

10

Information

19 experiments in registry

Null results cluster in predictable contexts.

The null results in this registry cluster where: the intervention targets the wrong barrier, the behavior is deeply entrenched, the comparison group is already receiving high-quality treatment, or follow-up is too short. A null result is often a diagnosis of mismatch between mechanism and barrier — not proof that the intervention category fails.

Implication

Before abandoning a tool after a null result, investigate whether it was deployed in a context where its mechanism could plausibly work. The Kansas City, DC camera, and Rhode Island vaccination results all failed for diagnosable reasons.

11

Social norms

5 experiments in registry

Social accountability produces the largest civic behavior effects — and the hardest design questions.

The Michigan social pressure mailing produced an 8.1 percentage point increase in voter turnout, one of the largest effects on civic behavior measured in a randomized field experiment. Tax norm letters and energy comparison reports show consistent 2–10 point effects. The mechanism is not persuasion or information: it is social surveillance. Making behavior visible to peers activates compliance more powerfully than appeals to duty, incentives, or accurate information. The effect grows with the intensity of accountability. In the Michigan study, a civic-duty reminder produced the smallest increase, and showing households their neighbors' voting records produced the largest. And the mechanism raises a question that other interventions don't: is it appropriate to use social pressure to produce civic compliance?

Implication

Social accountability tools are among the most effective in civic behavioral science — and among the most ethically contentious. Before deploying them, ask whether the mechanism (visibility, potential embarrassment) is consistent with the civic culture you are trying to build. Effectiveness and appropriateness are separate questions.

12

Human capital

20 experiments in registry

Training mechanism matters more than training duration.

The JTPA evaluation, one of the largest RCTs in US social policy history, found no effect on earnings for adult men and negative short-term effects for youth, despite substantial investment. Year Up, evaluated in a similarly rigorous lottery-based RCT, found a 30% earnings increase sustained over five years. Both programs involved job training, yet they produced opposite results. The difference is mechanism. JTPA delivered generic skills training disconnected from specific employer demand. Year Up built its curriculum around real job openings, placed participants in real internships, and measured employer satisfaction alongside participant outcomes. Sectoral training, connected to actual labor market demand, produces large sustained effects. Generic training typically does not.

Implication

Evaluate 'job training' programs by their mechanism, not their category. The relevant question is not 'is this a training program?' but 'does this training connect participants to documented employer demand for specific skills in a specific local labor market?' Without that connection, expect null results.

13

Commitment device

2 experiments in registry

Commitment devices overcome present bias more reliably than information or incentives.

Save More Tomorrow (SMarT) increased retirement savings rates from 3.5% to 13.6% over 40 months. It did this not by increasing workers' motivation to save but by removing the moment of decision. Workers committed in advance to direct future pay raises toward savings, so they never experienced a current-income reduction. The mechanism, pre-commitment to a future behavior, is more powerful than financial education (which produces knowledge without behavior change) and than incentives of comparable cost. The same pattern appears in Kenya's commitment savings accounts: access to a locked savings product increased savings by 66% and agricultural investment by 44%, despite no change in interest rates.

Implication

When the barrier is present bias — people want to do something in the future but consistently fail to act in the present — information and incentives will underperform. Design for commitment: identify a low-friction moment when the future behavior can be pre-authorized, and build the default around that commitment.

Six principles that emerge from the evidence.

Not rules derived from theory. Patterns observed across repeated experiments in varied contexts.

01

Diagnose the barrier before choosing the mechanism.

The most common cause of null results is a mismatch between the mechanism of the intervention and the actual barrier. Is the barrier information? Friction? Trust? Incentive? Resources? Each requires a different solution.

02

Measure equity effects — not just average effects.

The RAND HIE showed no average health effect of free care, but significant benefits for poor and sick patients. Policies designed from average effects can overlook, or even harm, their most vulnerable targets. Pre-specify subgroup analyses.
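The arithmetic behind this principle is simple. A population-average effect is the share-weighted mean of subgroup effects, so a real benefit in a small subgroup can be diluted or offset elsewhere. The sketch below uses invented numbers, not RAND HIE data. The subgroup labels, shares, and effects are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Subgroup:
    name: str
    share: float   # fraction of the study population (hypothetical)
    effect: float  # treatment effect within the subgroup (hypothetical units)

def average_effect(groups):
    """Population-average effect as the share-weighted mean of subgroup effects."""
    if abs(sum(g.share for g in groups) - 1.0) > 1e-9:
        raise ValueError("subgroup shares must sum to 1")
    return sum(g.share * g.effect for g in groups)

# Invented numbers chosen so the subgroup effects exactly cancel on average.
groups = [
    Subgroup("most disadvantaged", share=0.25, effect=3.0),
    Subgroup("everyone else", share=0.75, effect=-1.0),
]

print(f"overall average: {average_effect(groups)}")
for g in groups:
    print(f"{g.name}: {g.effect:+.1f}")
```

Here the overall average is exactly zero, even though the most disadvantaged quarter of the sample benefits. A subgroup analysis decided only after seeing the data would be easy to dismiss as a fishing expedition. That is why it should be specified in advance.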

03

Short-term evaluations underestimate early-stage investments.

Perry Preschool, NFP, and the Jamaica study required decades to reveal their full effects. Programs evaluated at 1–2 years systematically understate ROI for human capital interventions. Build long-term follow-up in from day one.

04

Replication is the standard, not the exception.

The most consequential findings here — NFP, PROGRESA, Opower, hot spots policing — achieved policy influence through replication across contexts. A single positive result is a hypothesis. Three consistent replications in varied settings are evidence.

05

Publish null and mixed results.

The Kansas City patrol null, the DC body camera null, and the Head Start fade-out are as important as any positive finding. Hidden null results cause communities to repeat failed approaches. Published null results prevent that waste.

06

Start with the reversible pilot.

PROGRESA's village randomization and Oregon Medicaid's lottery both produced rigorous evidence because decision-makers were willing to act before the answer was known. They let randomization decide who received the program first. Policymakers who demand certainty before acting foreclose the possibility of learning at all.

What this synthesis cannot tell you.

Generalizing from experiments is itself a form of inference. Every finding was produced in a specific context, population, and time. The patterns here are working hypotheses, not universal laws.

Will this work in my community?

External validity is the central challenge. The patterns here are more reliable when the mechanism is consistent with local context. Ask whether the barrier is the same, not just whether the population looks similar.

Which effect size should I expect?

Effect sizes vary widely even within replications. Opower's 2% energy reduction replicates across 100+ utilities but ranges from <1% to 4% by site. Use pooled estimates as planning assumptions, not guarantees.
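One common way to form a pooled estimate is inverse-variance weighting, where each site counts in proportion to the precision of its estimate (a fixed-effect meta-analysis). That choice of method is an assumption here, not necessarily how any registry figure was produced. When true effects differ by site, as the <1% to 4% range suggests, a random-effects model is usually the better fit. The site estimates below are hypothetical.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists of estimates and standard errors")
    # More precise sites (smaller standard errors) receive more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical percent reductions in energy use at three sites, with standard errors.
site_estimates = [1.0, 2.0, 3.0]
site_ses = [0.5, 0.5, 0.5]

pooled, se = fixed_effect_pool(site_estimates, site_ses)
print(f"pooled: {pooled:.2f}% (SE {se:.2f}); "
      f"site range {min(site_estimates)}% to {max(site_estimates)}%")
```

Reporting the pooled figure next to the site range follows the advice above. The pooled number is a reasonable planning assumption, but any single site can land well away from it.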

What about interactions between interventions?

Most experiments test one mechanism in isolation. The BRAC graduation program works because it bundles assets, training, cash, health, and coaching — but the experiment cannot tell us which components drive results or how they interact.

Does scale change the result?

General equilibrium effects — what happens when a program reaches everyone — are rarely captured. A job training program that works for individuals may not raise wages if everyone receives it simultaneously.

The experiments that changed policy — PROGRESA, Perry Preschool, Nurse-Family Partnership, Opower — were not exceptional in their ambition. They were exceptional in their willingness to be measured honestly and to wait for the answer.

The goal of The Experiment Society is not to replicate these landmark studies. It is to make the practice of honest measurement ordinary — one library, one permit office, one parks department at a time.
