What Works

Patterns from 158 experiments across 28 countries.

The registry is a collection of facts. This page is an attempt at synthesis: what mechanisms replicate, where the evidence is null or mixed, and what design principles emerge when you look across 60 years of civic experimentation.

121 positive results

null or mixed results

28 countries

60+ years of evidence

Sixteen Patterns

What the evidence shows.

Organized by mechanism, not policy area. The same pattern — friction reduction, social norms, defaults — appears across health, tax, education, and environmental domains. The mechanism transfers more reliably than the specific finding.

Simplification

14 experiments in registry

Administrative simplification often outperforms outreach.

When take-up of a beneficial program is below expectations, the bottleneck is usually the form, not awareness. Reducing the cost of compliance (through one-click enrollment, flexible scheduling, or pre-filled applications) tends to outperform information campaigns aimed at people who already know about the program.

SNAP Enrollment Information Letters for Seniors

positive

Control: 6% → Information only: 11% → Information + Assistance: 18%

Pennsylvania, USA · 2016

Streamlined ACA Health Insurance Enrollment

positive

Streamlined: +11 pp; Personalized reminder: +7.9 pp; Generic reminder: +4.5 pp vs. control

Massachusetts, USA · 2021

Flexible SNAP Interview Scheduling

positive

+6 pp approval rate; doubles early approvals; +2 pp long-term participation

Los Angeles, USA · 2022

Online Permit System Implementation

positive

+214% permits issued in first quarter post-launch; revenue increased to $140,000

Boone County, USA · 2021

Implication

Before launching a new outreach campaign, audit the enrollment or compliance process itself. In the experiments above, removing steps from the process raised take-up more than information or reminders alone.

Default

11 experiments in registry

Defaults determine most outcomes for low-salience decisions.

When a decision is infrequent, low-urgency, or requires effort to act, the starting position determines the vast majority of outcomes. Opt-out defaults reliably outperform opt-in by ratios of 3:1 to 10:1. The mechanism is not persuasion — it is inertia.

Green Electricity Opt-Out Default

positive

Opt-out adoption: 68–94% vs. opt-in: 7–41% (roughly 10× difference)

Germany · 2016

5p Plastic Bag Charge

positive

−95% single-use plastic bags distributed by major supermarkets between 2015 and 2021; from 7.6 billion bags/year to <300 million

England, UK · 2015

Inclusionary Zoning Affordability Audit

positive

Affordable units produced: +1,200 units above baseline; total production unchanged relative to synthetic control

Chicago, IL, United States · 2018

Green Investment Default in Retirement Plans

positive

95%+ of enrolled workers now in responsible investment default; opt-out rate <2%; comparable returns

United Kingdom · 2020

Implication

The most effective intervention in many contexts is redesigning what happens if someone does nothing. Ask: what is the default behavior we are competing against?

Social norms

5 experiments in registry

Social norms work for tax and resource use — less so for deep hesitancy.

Showing people that similar peers comply with a norm reliably increases compliance for frequent, private behaviors like tax payment and energy use. For behaviors driven by deeper barriers — medical hesitancy, structural poverty — social norms have small or negligible effects at population scale.

Social Norm Tax Letters

positive

Treated firms paid $184 million more than controls (0.22% of GDP)

Dominican Republic · 2019

Opower Home Energy Reports

positive

−2.0% electricity consumption; effect equivalent to 11–20% temporary price increase

United States (multi-site) · 2012

Social Comparison Home Water Reports

positive

−5% water consumption; effects largest among high-usage households

United States (multi-site) · 2014

COVID-19 Vaccination SMS Reminders

null

Largest effect: +0.2 pp (2.0% control vs. 2.2% best treatment arm) — statistically significant but practically small

Rhode Island, USA · 2021
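
The phrase "statistically significant but practically small" is easier to judge with the arithmetic written out. The sketch below uses the two rates reported in the entry above (2.0% control, 2.2% best treatment arm); the function name and the "contacts per extra outcome" framing are illustrative assumptions, not part of the registry.

```python
def compare_rates(control_pct: float, treated_pct: float) -> dict:
    """Summarize a two-arm result given outcome rates in percent."""
    if control_pct <= 0 or treated_pct == control_pct:
        raise ValueError("need a positive control rate and a nonzero difference")
    abs_diff_pp = treated_pct - control_pct            # percentage points
    rel_change_pct = 100 * abs_diff_pp / control_pct   # percent of the control rate
    # How many people must be contacted to produce one additional outcome.
    contacts_per_extra = 100 / abs_diff_pp
    return {
        "abs_diff_pp": round(abs_diff_pp, 2),
        "rel_change_pct": round(rel_change_pct, 1),
        "contacts_per_extra": round(contacts_per_extra),
    }


# Rates taken from the registry entry above.
result = compare_rates(control_pct=2.0, treated_pct=2.2)
print(result)
```

The same 0.2 pp gap reads as a 10% relative increase or as roughly 500 messages per additional vaccination, which is why the absolute difference is the more useful planning number here.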

Implication

Before choosing social norms as a mechanism, diagnose the actual barrier. If the behavior is private and frequent and the barrier is inattention, norms work. If the barrier is trust, access, or deep resistance, norms are insufficient.

Personalization

4 experiments in registry

Personalization reliably outperforms generic communication.

Using a name, a specific dollar amount, or a locally relevant frame consistently improves response rates over generic outreach. The effect is not trivial: in the UK court fine experiment, adding the debtor's name to an SMS increased average payment by 189%. The mechanism is accountability and relevance: generic messages are processed as noise.

Personalized SMS for Court Fine Repayment

positive

Personalized SMS with debtor's name: £12.87 average payment vs. £4.46 control (189% increase)

United Kingdom · 2012

Ownership Framing for COVID-19 Booster Uptake

positive

Ownership framing: OR = 1.28 vs. no SMS; +11% relative to generic reminder

United States · 2022

Rental License Renewal — Behavioral Messaging

positive

All treatment arms outperformed control; personal benefits framing marginally strongest for registration

Philadelphia, USA · 2020

Personalized Loan Repayment SMS Reminders

mixed

SMS mentioning loan officer's name substantially improved repayment vs. client name or control; no significant gain/loss framing effect; timing mattered less than personalization

Philippines · 2013

Implication

Personalization costs little at scale but reliably improves outcomes. Specific messages feel like a direct claim on the recipient; generic ones do not.

Targeting

10 experiments in registry

Concentrated interventions outperform diffuse ones in public safety.

The Kansas City null result and the Philadelphia positive results are not contradictory — they reveal a specificity effect. Police presence concentrated at the 3–5% of locations generating most crime produces significant reductions. Spread thinly across a precinct, the same resources have no measurable effect. The unit of analysis matters enormously.

Kansas City Preventive Patrol Experiment

null

No significant difference across conditions on any primary outcome

Kansas City, USA · 1972

Philadelphia Hot Spots Policing

positive

−15% crime incidents at treated hot spots; no evidence of displacement to adjacent areas; 'diffusion of benefits' observed (crime fell in areas immediately adjacent to treated spots)

Philadelphia, USA · 1995

Philadelphia Foot Patrol Experiment

positive

−23% violent crime in foot patrol beats vs. control beats

Philadelphia, USA · 2009

Body-Worn Camera RCT — Washington DC

null

No significant effect on use of force, complaints, arrests, or assaults; null across all pre-registered outcomes

Washington, DC, USA · 2016

Implication

Evaluate targeting before evaluating the intervention. An approach that fails at the precinct level may succeed at the street-segment level.

Human capital

36 experiments in registry

Early childhood investments produce the highest long-term returns.

Three landmark experiments found that high-quality early intervention produces dramatic improvements in adult outcomes: employment, earnings, education, crime, and health. The returns compound over decades and are largest for the most disadvantaged children. They are systematically underestimated by short-term evaluations.

Perry Preschool Project

positive

By age 40: treatment group had higher graduation rates (+23 pp), higher employment (+26 pp), higher median earnings (+42%), lower arrest rates (−22 pp); estimated benefit-cost ratio of 7–12:1

Ypsilanti, Michigan, USA · 1962

Nurse-Family Partnership — Elmira Trial

positive

−80% verified child abuse and neglect at age 2; −56% emergency room visits; at 15-year follow-up: −48% child arrests, −59% maternal arrests, −83% convictions for low-income unmarried mothers

Elmira, New York, USA · 1977

Jamaica Stunting Study — Stimulation + Nutrition

positive

Stimulation arm: +42% adult earnings compared to control; fully closed the earnings gap with non-stunted peers after 20 years; supplementation alone had no significant long-term effect

Kingston, Jamaica · 1986

Head Start Impact Study

mixed

Age 3 cohort: meaningful reading/literacy gains by end of program year; by 1st grade: no significant cognitive differences; 3rd grade follow-up: minimal lasting impacts; some positive health and parenting effects persisted

United States (nationally representative) · 2002

Implication

The political challenge of early childhood investment is that the most persuasive evidence arrives 20–40 years after the program. This is an argument for long-term commitments with pre-specified follow-up plans — not for discounting the investment.

Price signal

14 experiments in registry

Free distribution outperforms subsidized pricing for health products.

A persistent assumption is that charging a small price for a health product increases commitment and reduces waste. The empirical evidence does not support this for most products. Price is a take-up barrier. Once obtained, usage rates do not differ by price paid.

RAND Health Insurance Experiment

mixed

Free care group used 30% more services than 95% co-insurance group; no significant health outcome differences for average participants; poor and sick patients on free care had meaningfully better blood pressure and vision outcomes

United States (6 sites) · 1974

Oregon Medicaid Lottery

mixed

ER use: +40%; doctor visits: +35%; financial hardship: −25% (catastrophic medical bills); depression: −30%; blood sugar, blood pressure, cholesterol: no significant improvements at 2 years

Oregon, USA · 2008

London Congestion Charge

positive

Traffic volume: −15% in charging zone in first year; journey time delays: −30%; bus reliability: significantly improved; public transport ridership: increased; NOx emissions in charging zone: −12%

London, UK · 2003

Kenya Insecticide-Treated Bednet Distribution

positive

Free distribution: 99% take-up; subsidized (10 cents): 75%; near-market price: 25%; usage among recipients did not differ significantly by price paid; no 'free makes you value it less' effect found

Western Kenya · 2003

Implication

When designing distribution programs for health products or benefits, assume that price is an access barrier and that recipients will use what they receive. Design for access, not for filtering through price.

Cash transfer

20 experiments in registry

Cash transfers work — and recipients spend them wisely.

Across multiple continents and economic contexts, both conditional and unconditional cash transfers have produced significant, sustained improvements in consumption, assets, health, and education. Concerns about wasteful spending on alcohol or tobacco have not been borne out by experimental evidence.

PROGRESA Conditional Cash Transfer

positive

School enrollment: +3.4 pp secondary (girls); child illness days: −23%; consumption: +11%; child growth: about +1 cm in height (reduced stunting); effects larger for girls and youngest children

Rural Mexico · 1997

GiveDirectly Unconditional Cash Transfers

positive

Assets: +58%; earnings: +38%; food security: +20%; psychological well-being significantly improved; no evidence of alcohol or tobacco spending increase; effects sustained 3 years later

Western Kenya · 2011

New York City Family Rewards

mixed

Poverty rate: −11 pp during program; health insurance: +6 pp; high school course passing: +10 pp; sustained employment among those initially working: positive; effects mostly faded after payments ended

New York City, USA · 2007

BRAC Graduation Programme

positive

Consumption: +5% (pooled); assets: +16%; food security: +9%; financial inclusion: +23%; psychological well-being: improved; effects significant in 5 of 6 countries at 3-year follow-up

Bangladesh (original); replicated in 10 countries · 2007

Implication

The political reluctance to give poor people cash is not supported by experimental evidence. Where the question is 'what should we provide?' — cash consistently outperforms most in-kind alternatives in giving recipients agency over their own priorities.

Human capital

36 experiments in registry

Gains fade when recipients return to unchanged environments.

Several high-quality experiments found that program effects diminished after the intervention ended — not because the programs failed, but because the environments participants returned to hadn't changed. Programs that permanently change circumstances produce durable effects. Programs that only temporarily subsidize behavior in an unchanged environment tend not to.

Moving to Opportunity Housing Vouchers

mixed

Short-term adult effects modest; children who moved before age 13 showed +31% higher earnings as adults, +16 pp higher college attendance; mental and physical health improvements for adults

Baltimore, Boston, Chicago, Los Angeles, New York · 1994

New York City Family Rewards

mixed

New York City, USA · 2007

Head Start Impact Study

mixed

United States (nationally representative) · 2002

Project STAR — Small Class Size

positive

+4 percentile points in small vs. regular class; effect doubled for minority and low-income students; long-term earnings gains documented in follow-up studies

Tennessee, USA · 1989

Implication

When evaluating fade-out, ask whether the program changed the participant's environment or only the participant. This distinction shapes whether additional services, follow-on programs, or environmental change are the right response.

Information

23 experiments in registry

Null results cluster in predictable contexts.

The null results in this registry cluster where: the intervention targets the wrong barrier, the behavior is deeply entrenched, the comparison group is already receiving high-quality treatment, or follow-up is too short. A null result is often a diagnosis of mismatch between mechanism and barrier — not proof that the intervention category fails.

COVID-19 Vaccination SMS Reminders

null

Largest effect: +0.2 pp (2.0% control vs. 2.2% best treatment arm) — statistically significant but practically small

Rhode Island, USA · 2021

Kansas City Preventive Patrol Experiment

null

No significant difference across conditions on any primary outcome

Kansas City, USA · 1972

Body-Worn Camera RCT — Washington DC

null

No significant effect on use of force, complaints, arrests, or assaults; null across all pre-registered outcomes

Washington, DC, USA · 2016

Fiscal Exchange Framing on Tax Bills

positive

+20% payment rate overall; +40% when bills delivered in person

Argentina · 2020

Implication

Before abandoning a tool after a null result, investigate whether it was deployed in a context where its mechanism could plausibly work. The Kansas City patrol, DC camera, and Rhode Island vaccination interventions all produced null results for diagnosable reasons.

Social norms

5 experiments in registry

Social accountability produces the largest civic behavior effects — and the hardest design questions.

The Michigan social pressure mailing produced an 8.1 percentage point increase in voter turnout, the largest get-out-the-vote effect recorded in a randomized study at that time. Tax norm letters and home energy and water comparison reports show smaller but consistent effects (about 2% for electricity and 5% for water in the entries below). The mechanism is not persuasion or information: it is social surveillance. Making behavior visible to peers activates compliance more powerfully than appeals to duty, incentives, or accurate information. The effect grows with the intensity of accountability: in Michigan, turnout gains rose from +1.8 pp for the civic duty arm to +8.1 pp for the social pressure arm. And the mechanism raises a question that other interventions don't: is it appropriate to use social pressure to produce civic compliance?

Social Norm Tax Letters

positive

Treated firms paid $184 million more than controls (0.22% of GDP)

Dominican Republic · 2019

Opower Home Energy Reports

positive

−2.0% electricity consumption; effect equivalent to 11–20% temporary price increase

United States (multi-site) · 2012

Social Comparison Home Water Reports

positive

−5% water consumption; effects largest among high-usage households

United States (multi-site) · 2014

Michigan Social Pressure Voting Mailer

positive

Social pressure arm: +8.1 percentage points vs. control; self-disclosure: +4.9 pp; Hawthorne: +2.5 pp; civic duty: +1.8 pp. Social pressure arm produced the largest GOTV effect ever recorded in a randomized study at that time.

Michigan, USA · 2006

Implication

Social accountability tools are among the most effective in civic behavioral science — and among the most ethically contentious. Before deploying them, ask whether the mechanism (visibility, potential embarrassment) is consistent with the civic culture you are trying to build. Effectiveness and appropriateness are separate questions.

Human capital

36 experiments in registry

Training mechanism matters more than training duration.

The JTPA evaluation, one of the largest RCTs in US social policy history, found no statistically significant effect on earnings for adult men and negative short-term effects for out-of-school youth, despite substantial investment. Year Up, evaluated in a similarly rigorous lottery-based RCT, found a 30% earnings increase sustained over five years. Both interventions involved job training, yet their results diverged sharply. The difference is mechanism: JTPA delivered generic skills training disconnected from specific employer demand. Year Up built curriculum around real job openings, placed participants in real internships, and measured employer satisfaction alongside participant outcomes. Sectoral training that is connected to actual labor market demand produces large sustained effects. Generic training typically does not.

Becoming a Man (BAM) — Cognitive Behavioral Therapy

positive

Violent crime arrests: −44% during program year; school engagement: +14%; graduation rates: +19% in 4-year follow-up

Chicago, IL, USA · 2015

Chicago STAR Scholarship — Dual Enrollment

positive

FAFSA completion: +5.4 pp; City Colleges enrollment: +3.2 pp; college persistence to year 2: +4.1 pp; effects largest for first-generation students

Chicago, IL, USA · 2016

Year Up Young Adult Job Training Program

positive

Quarterly earnings: +$1,000/quarter (+30%) at 3-year follow-up; annual earnings advantage: ~$4,500; employment rate: +11 pp; effect sustained through year 5 in extended follow-up

United States · 2018

Job Training Partnership Act (JTPA) National RCT

mixed

Adult women: +$1,500 earnings at 30 months (+12%); adult men: no statistically significant effect; out-of-school youth: negative effect during training; female youth: small positive effect

United States · 1994

Implication

Evaluate 'job training' programs by their mechanism, not their category. The relevant question is not 'is this a training program?' but 'does this training connect participants to documented employer demand for specific skills in a specific local labor market?' Without that connection, expect null results.

Commitment device

3 experiments in registry

Commitment devices overcome present bias more reliably than information or incentives.

Save More Tomorrow (SMarT) increased retirement savings rates from 3.5% to 13.6% over four years, not by increasing workers' motivation to save, but by removing the moment of decision. Workers committed in advance to direct future pay raises toward savings, so they never experienced a current-income reduction. The mechanism, pre-commitment to a future behavior, is more powerful than financial education (which produces knowledge without behavior change) and comparable-cost incentives. The same pattern appears in Kenya's commitment savings accounts: per the registry entry below, access to a locked savings product increased savings by 82% and agricultural input spending by 37%, despite no change in interest rates.

Green Electricity Opt-Out Default

positive

Opt-out adoption: 68–94% vs. opt-in: 7–41% (roughly 10× difference)

Germany · 2016

5p Plastic Bag Charge

positive

−95% single-use plastic bags distributed by major supermarkets between 2015 and 2021; from 7.6 billion bags/year to <300 million

England, UK · 2015

Save More Tomorrow (SMarT) Commitment Program

positive

SMarT participants: savings rate increased from 3.5% to 13.6% over 4 years; immediate advice group: from 4.4% to 8.8%; advice refusers: 6.6% (no change). SMarT had 78% retention rate vs. 26% for immediate increase group.

United States · 2004

Commitment Savings Accounts for Agricultural Workers

positive

Commitment account holders: +82% savings relative to controls; agricultural input expenditures +37%; output +22%

Kenya · 2008

Implication

When the barrier is present bias — people want to do something in the future but consistently fail to act in the present — information and incentives will underperform. Design for commitment: identify a low-friction moment when the future behavior can be pre-authorized, and build the default around that commitment.

Cash transfer

20 experiments in registry

Unconditional cash does not reduce work — and dramatically improves wellbeing.

The most persistent objection to cash transfers, that recipients will stop working, has been tested rigorously and has consistently failed to find support. Finland's basic income experiment found recipients worked slightly more than controls. The Stockton SEED found full-time employment rose faster in the treatment group. The Manitoba Mincome found only two groups reduced work: mothers who delayed returning after childbirth, and teenagers who stayed in school longer. GiveDirectly recipients in Kenya maintained comparable work hours while investing in productive assets. The pattern holds across continents, contexts, and income levels.
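
A claim that employment "rose faster" in a treatment group compares changes, not levels. A minimal sketch of that difference-in-differences arithmetic follows; the percentages are made up for illustration and are not taken from any registry entry, and the function name is an assumption.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)


# Hypothetical full-time employment rates in percent (illustrative only).
effect_pp = diff_in_diff(treat_before=30, treat_after=42,
                         control_before=30, control_after=36)
print(f"Difference-in-differences: {effect_pp:+d} pp")
```

If both groups rise by the same amount, the function returns zero even though employment went up in the treatment group, which is the case a levels-only comparison would misread as a program effect.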

Finland Basic Income Experiment

mixed

Employment: treatment group worked an average 6 more days than controls in 2018 (statistically significant but modest). Wellbeing: treatment group reported significantly higher life satisfaction (+0.09 SD), lower psychological distress, greater trust in institutions, and lower perceived bureaucracy. No significant increase in income.

Finland · 2017

Stockton SEED — Guaranteed Income Pilot

positive

Full-time employment: treatment group rose from 28% at baseline to 40% at 12 months (+12 pp); control group rose from 32% to 37% (+5 pp), so employment increased more in the treatment group. Income volatility: significantly lower in treatment group. Mental health: significantly better (anxiety and depression scores). Physical health: no significant difference. Spending: primarily on food, clothing, utilities, not alcohol or cigarettes.

Stockton, CA, USA · 2019

Manitoba Mincome — Guaranteed Annual Income

positive

Hospitalization rates fell 8.5% relative to control communities; mental health hospitalizations fell significantly; domestic violence-related hospitalizations fell; high school completion increased; labor supply: only small reductions, primarily among mothers (who delayed return to work after childbirth) and teenagers (who stayed in school longer)

Manitoba, Canada · 1974

GiveDirectly — Large-Scale Basic Income in Rural Kenya

positive

Consumption: +30% vs. baseline; assets: +40% in asset values; food security: significantly improved; psychological wellbeing: large positive effects on stress and happiness; hours worked: no reduction; local economic spillovers: significant — control villages within treated areas also showed consumption increases (~$2.50 for every $1 transferred, via multiplier effects)

Siaya County, Kenya · 2016

Implication

Policy designs that attach work requirements or other conditions to benefits (SNAP, TANF, housing vouchers) rest on an assumption that the experimental evidence does not support. Where reducing administrative burden and conditionality is politically viable, the evidence suggests it improves outcomes without sacrificing labor supply.

Human capital

36 experiments in registry

Cognitive behavioral therapy changes behavior where information campaigns fail.

The Becoming a Man experiment in Chicago demonstrated something information-based programs like D.A.R.E. have consistently failed to achieve: a large, replicated, durable reduction in violent crime and school disengagement among high-risk youth. The mechanism is not information; it is cognitive. At-risk young men know that violence has consequences; they don't lack information. What they lack is the practiced ability to interrupt automatic threat responses before they become irreversible actions. CBT trains that interruption. D.A.R.E. provides facts. The difference in outcomes is enormous: BAM reduced violent-crime arrests by 44% to 50% during the program year, while D.A.R.E. showed no significant reduction in drug use at 10-year follow-up.

Drug Courts — Randomized Trial Evidence Base

positive

Recidivism: −8 to −14 pp vs. standard prosecution in RCTs; drug use: significant reduction while in program, mixed evidence of persistence; incarceration days: substantially fewer during supervision period; cost savings: $3,000–$13,000 per participant vs. incarceration

Multiple US cities · 1997

IPS Supported Employment for Severe Mental Illness

positive

Competitive employment at 18 months: 61% IPS vs. 23% standard vocational rehabilitation (original US RCT); international multi-site trial: 55% vs. 28%; meta-analysis across 11 RCTs: IPS approximately doubles competitive employment rates; no significant difference in psychiatric hospitalizations or symptoms

Multiple US cities and international sites · 1996

Becoming a Man (BAM) — Cognitive Behavioral Therapy for At-Risk Youth

positive

Violent crime arrests: −44% to −50% during program year; total arrests: −30% to −35%; school engagement: significantly improved; graduation rates: +14% to +19% in follow-up. Effects on violence persisted for at least one year post-program. No significant effect on test scores.

Chicago, IL, USA · 2009

D.A.R.E. — Drug Abuse Resistance Education

null

Virtually zero effect on drug use at all follow-up intervals. Ennett et al. (1994): effect size d=0.06 on drug use (near zero and below threshold of practical significance). Lynam et al. (1999): DARE and control students showed no significant differences in drug use, attitudes, or drug-related behaviors at 10-year follow-up. GAO (2003): none of 6 long-term evaluations found significant reductions in drug use.

Multiple US cities · 1983

Implication

Before designing programs for populations with high-risk behaviors, diagnose whether the barrier is informational (in which case information works) or automatic/habitual (in which case CBT-based approaches outperform information dramatically). Youth violence, recidivism, and addiction typically fall in the second category.

Information

23 experiments in registry

Popular programs with intuitive logic produce null results at remarkable frequency.

Three of the most widely adopted programs of the past 40 years (D.A.R.E. drug prevention, police body-worn cameras, and workplace wellness programs) have been subjected to rigorous RCTs and found to produce no statistically significant effect on their stated primary outcomes. All three are intuitively appealing, politically popular, and expensive. All three have continued or expanded after null results became public knowledge. The pattern reveals a systematic problem: popularity does not track effectiveness, and political sustainability does not depend on evidence. What these programs have in common is a plausible causal story (deterrence, accountability, behavior change) that turned out not to operate as assumed in real-world conditions.

Kansas City Preventive Patrol Experiment

null

No significant difference across conditions on any primary outcome

Kansas City, USA · 1972

Police Body-Worn Cameras — Washington DC Randomized Trial

null

Use of force: no statistically significant difference between camera and no-camera officers (effect size near zero). Civilian complaints: no statistically significant difference. Arrests: no significant difference. The null findings were precisely estimated, ruling out effects larger than approximately 2 per 1,000 officer hours.

Washington, DC, USA · 2016

D.A.R.E. — Drug Abuse Resistance Education

null

Multiple US cities · 1983

Workplace Wellness Programs — Illinois RCT

null

Healthcare spending: no significant difference (p>0.05 across all 38 outcomes pre-specified). Exercise: significantly higher self-reported exercise in treatment group. Weight, blood pressure, cholesterol, smoking: no significant difference. Absenteeism: no significant difference. Job performance: no significant difference. Tenure: no significant difference. The one positive finding — exercise — did not translate to any measurable health outcome.

University of Illinois, USA · 2016

Implication

Program adoption decisions should be at least partly independent of program popularity. The most important question is not 'is this widely used?' but 'has it been rigorously tested, and what did the test find?' A null result from a well-powered RCT is more informative than broad adoption from unevaluated implementation.

Design Principles

Six principles that emerge from the evidence.

Not rules derived from theory. Patterns observed across repeated experiments in varied contexts.

Diagnose the barrier before choosing the mechanism.

The most common cause of null results is a mismatch between the mechanism of the intervention and the actual barrier. Is the barrier information? Friction? Trust? Incentive? Resources? Each requires a different solution.

Measure equity effects — not just average effects.

The RAND HIE showed no average health effect of free care, but significant benefits for poor and sick patients. Policies designed from average effects can overlook, or even harm, their most vulnerable targets. Pre-specify subgroup analyses.
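
To make the point concrete, here is a small sketch of how a modest average effect can hide a large effect in one pre-specified subgroup. All records, group names, and outcome scores are hypothetical and chosen only to illustrate the arithmetic; they are not RAND HIE data.

```python
from statistics import mean

# Subgroups are named before looking at outcomes (pre-specification).
PRESPECIFIED_SUBGROUPS = ("low_income", "higher_income")

# Hypothetical records: (subgroup, arm, outcome score). Not real data.
records = (
    [("low_income", "treatment", 62), ("low_income", "treatment", 58)]
    + [("low_income", "control", 50)] * 2
    + [("higher_income", "treatment", 70)] * 6
    + [("higher_income", "control", 70)] * 6
)


def effect(rows):
    """Mean outcome in the treatment arm minus mean outcome in control."""
    treated = [y for _, arm, y in rows if arm == "treatment"]
    control = [y for _, arm, y in rows if arm == "control"]
    return mean(treated) - mean(control)


overall = effect(records)
by_group = {
    group: effect([r for r in records if r[0] == group])
    for group in PRESPECIFIED_SUBGROUPS
}
print(f"overall: {overall:+.1f}")
for group, value in by_group.items():
    print(f"{group}: {value:+.1f}")
```

In this toy data the overall difference is +2.5 points, while the low-income subgroup gains 10 and the higher-income subgroup gains nothing. A real analysis would also need standard errors and a correction for testing multiple subgroups.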

Short-term evaluations underestimate early-stage investments.

Perry Preschool, NFP, and the Jamaica study required decades to reveal their full effects. Programs evaluated at 1–2 years systematically understate ROI for human capital interventions. Build long-term follow-up in from day one.

Replication is the standard, not the exception.

The most consequential findings here — NFP, PROGRESA, Opower, hot spots policing — achieved policy influence through replication across contexts. A single positive result is a hypothesis. Three consistent replications in varied settings are evidence.

Publish null and mixed results.

The Kansas City patrol null, the DC body camera null, and the Head Start fade-out are as important as any positive finding. Hidden null results cause communities to repeat failed approaches. Published null results prevent that waste.

Start with the reversible pilot.

PROGRESA's village randomization and Oregon Medicaid's lottery both produced rigorous evidence because researchers preserved uncertainty. Policymakers who demand certainty before acting foreclose the possibility of learning at all.

Epistemic humility

What this synthesis cannot tell you.

Generalizing from experiments is itself a form of inference. Every finding was produced in a specific context, population, and time. The patterns here are working hypotheses, not universal laws.

Will this work in my community?

External validity is the central challenge. The patterns here are more reliable when the mechanism is consistent with local context. Ask whether the barrier is the same, not just whether the population looks similar.

Which effect size should I expect?

Effect sizes vary widely even within replications. Opower's 2% energy reduction replicates across 100+ utilities but ranges from <1% to 4% by site. Use pooled estimates as planning assumptions, not guarantees.
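
One common way to form a pooled planning estimate is an inverse-variance weighted mean, reported alongside the site-level range so the spread stays visible. The sketch below assumes that approach; the site estimates and standard errors are invented for illustration and are not Opower results.

```python
from math import sqrt


def pooled_effect(estimates):
    """Inverse-variance weighted mean of (effect, standard_error) pairs.

    Returns the pooled effect and its standard error under a
    fixed-effect assumption (all sites share one true effect).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / total
    return pooled, sqrt(1 / total)


# Hypothetical site-level effects in percent change, with standard errors.
site_estimates = [(-0.8, 0.4), (-2.1, 0.5), (-3.9, 0.8), (-1.6, 0.3)]

pooled, pooled_se = pooled_effect(site_estimates)
lowest = min(eff for eff, _ in site_estimates)
highest = max(eff for eff, _ in site_estimates)
print(f"pooled: {pooled:.2f}% (SE {pooled_se:.2f})")
print(f"site range: {lowest:.1f}% to {highest:.1f}%")
```

The fixed-effect standard error understates uncertainty when sites genuinely differ, so the site range is the more honest guide to what a single new site might see.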

What about interactions between interventions?

Most experiments test one mechanism in isolation. The BRAC graduation program works because it bundles assets, training, cash, health, and coaching — but the experiment cannot tell us which components drive results or how they interact.

Does scale change the result?

General equilibrium effects — what happens when a program reaches everyone — are rarely captured. A job training program that works for individuals may not raise wages if everyone receives it simultaneously.

Closing thought

The experiments that changed policy — PROGRESA, Perry Preschool, Nurse-Family Partnership, Opower — were not exceptional in their ambition. They were exceptional in their willingness to be measured honestly and to wait for the answer.

The goal of The Experiment Society is not to replicate these landmark studies. It is to make the practice of honest measurement ordinary — one library, one permit office, one parks department at a time.
