What moves the needle in education.
Education has more randomized evidence than most policy domains — and some of the most counterintuitive findings. Large investments sometimes fade. Low-cost structural changes sometimes produce lasting effects. The evidence rewards specificity about mechanisms.
9 experiments · 8 positive results · 1 mixed result · 2 replicated
Key Findings
01
Class size reductions produce lasting achievement gains — particularly for disadvantaged students.
Project STAR randomized 11,600 students across 79 Tennessee schools to small classes (13–17 students) or regular classes (22–25 students) beginning in kindergarten. Students in small classes outperformed controls by 0.2–0.3 standard deviations in math and reading, and the gains persisted through 8th grade, years after students returned to regular classes. Effects were substantially larger for Black students and students from low-income families. Long-run follow-up found higher graduation rates, higher college attendance, and higher earnings at age 27 for students randomly assigned to small classes. The likely mechanism is that smaller classes allow more differentiated instruction and more student-teacher interaction, which particularly benefits students who arrive with fewer resources.
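To give the 0.2–0.3 standard deviation figure a more intuitive scale, the sketch below converts an effect size into the percentile that a median control-group student would reach. It assumes test scores are roughly normally distributed, which is an illustrative assumption and not something taken from the STAR data; the function name `percentile_after_gain` is hypothetical.

```python
from statistics import NormalDist

def percentile_after_gain(effect_size_sd: float) -> float:
    """Percentile (0-100) reached by a student who starts at the
    control-group median and gains `effect_size_sd` standard deviations.

    Assumes normally distributed scores (an illustrative assumption).
    """
    return 100 * NormalDist().cdf(effect_size_sd)

# Effect sizes reported in the text for small classes.
for d in (0.2, 0.3):
    print(f"{d:.1f} SD -> percentile {percentile_after_gain(d):.1f}")
```

Under that normality assumption, a gain of 0.2–0.3 standard deviations moves a student from the 50th percentile to roughly the 58th–62nd.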
02
Later school start times reliably improve student outcomes across multiple domains at low cost.
When Seattle Public Schools delayed high school start times by 55 minutes (from 7:50 to 8:45 am), students gained an average of 34 minutes of sleep per night. A natural experiment comparing students at the same schools before and after the change found a 4.5% GPA improvement and a 9.6% improvement in attendance. Because the comparison is made within the same schools before and after the change, rather than between schools that chose different schedules, selection into later-starting schools is an unlikely explanation. The biological mechanism is well established: adolescent circadian rhythms shift later during puberty, making early rising physiologically difficult. Later start times align school schedules with this biology. The intervention costs close to nothing relative to its effect size, yet most districts have not adopted it.
03
Intensive post-secondary employment and training programs produce large, durable earnings gains.
Year Up randomized 2,544 young adults to an intensive one-year program (six months of training followed by a six-month internship) or a control group. At the 3-year follow-up, participants earned 53% more than controls: roughly $9,000 more annually (about $26,000 versus about $17,000). The gains were concentrated in the first two years and persisted thereafter. The Career Academies evaluation tracked students for 8 years post-program and found a cumulative $30,000 earnings advantage for young men (no effect for young women). Both programs share a structure: employer networks, work experience embedded in training, and explicit occupational skill development. The effects are among the largest ever measured for any educational or workforce program.
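The Year Up percentages and dollar figures can be checked against each other with a few lines of arithmetic. The sketch below reads the approximate $17,000 figure as control-group annual earnings and the approximate $26,000 figure as participant earnings, which is an interpretation of the rounded numbers in the text; `percent_gain` is a hypothetical helper name.

```python
def percent_gain(treatment: float, control: float) -> float:
    """Relative difference of treatment over control, in percent."""
    return 100 * (treatment - control) / control

# Rounded annual earnings from the text (assumed: control vs. participants).
control_earnings = 17_000
treatment_earnings = 26_000

dollar_gain = treatment_earnings - control_earnings
print(f"Dollar gain: ${dollar_gain:,}")
print(f"Percent gain: {percent_gain(treatment_earnings, control_earnings):.1f}%")
```

With these rounded inputs the difference is $9,000 and the relative gain is about 52.9%, consistent with the reported 53%.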
04
High-intensity early and school-age programs produce gains — but effect persistence varies widely.
The Harlem Children's Zone Promise Academy admitted students by lottery, and the lottery-based evaluation found that students who attended achieved proficiency rates of 100% in math and over 90% in reading, far above city averages. Effects persisted through high school and included higher college attendance. The Wales Universal Free School Breakfast program produced about 2 months of additional academic progress for Key Stage 1 students. By contrast, the Head Start Impact Study found that benefits for pre-K participants largely faded by 3rd grade for most cognitive outcomes, though effects on social-emotional development were more durable. Intensity, duration, and the quality of follow-on schooling appear to moderate how much early gains persist.
Important Mixed Results
Not every null or mixed result is a failure. These programs produced real gains that subsequent conditions failed to sustain — a different lesson than “the program didn't work.”
Head Start Impact Study
United States (nationally representative) · 2002
Head Start produced measurable improvements in pre-K outcomes, but most effects faded by 3rd grade. The fadeout is likely driven by the quality gap between Head Start and the elementary schools most participants attend afterward — the program gains are real, but subsequent schooling may not sustain them.
All Experiments in the Registry
Project STAR — Small Class Size
positive · Tennessee, USA · 1989
Universal Free School Breakfast Program
positive · Wales, UK · 2007
Chicago STAR Scholarship — Dual Enrollment
positive · Chicago, IL, USA · 2016
Year Up Young Adult Job Training Program
positive · United States · 2018
Job Training Partnership Act (JTPA) National RCT
mixed · United States · 1994
Career Academies — MDRC Long-Term Evaluation
positive · Multiple US cities · 1993
Tennessee STAR — Student/Teacher Achievement Ratio
positive · Tennessee, USA · 1985
Seattle Later School Start Times — Natural Experiment
positive · Seattle, WA, USA · 2016
Harlem Children's Zone — Promise Academy Charter Schools
positive · New York City, USA · 2004
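As a quick consistency check, the registry entries above can be tallied by outcome label and compared with the summary counts at the top of the page. This is a minimal sketch: the list is transcribed by hand from the entries above with shortened names, and the variable names are illustrative.

```python
from collections import Counter

# (name, outcome label) pairs transcribed from the registry list above.
registry = [
    ("Project STAR, Small Class Size", "positive"),
    ("Universal Free School Breakfast Program", "positive"),
    ("Chicago STAR Scholarship, Dual Enrollment", "positive"),
    ("Year Up Young Adult Job Training Program", "positive"),
    ("Job Training Partnership Act (JTPA) National RCT", "mixed"),
    ("Career Academies, MDRC Long-Term Evaluation", "positive"),
    ("Tennessee STAR, Student/Teacher Achievement Ratio", "positive"),
    ("Seattle Later School Start Times, Natural Experiment", "positive"),
    ("Harlem Children's Zone, Promise Academy Charter Schools", "positive"),
]

# Count how many entries carry each outcome label.
counts = Counter(outcome for _, outcome in registry)
print(len(registry), dict(counts))
```

The tally gives 9 entries, 8 labeled positive and 1 labeled mixed, matching the summary counts. The single mixed entry in the list is the JTPA trial.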
What the Evidence Cannot Yet Tell Us
Do Project STAR class size effects persist because of instruction quality, peer composition, or teacher attention? Disentangling the mechanism would clarify when class size reductions are worth their cost.
Can Year Up's 53% earnings gain be replicated in programs without strong employer networks and pre-committed job placements at program completion?
Are the Harlem Children's Zone effects driven by school quality, extended learning time, community programming, or selection of highly motivated families?
What is the right cost-effectiveness comparison between class size reduction ($40,000+ per student) and later school start times (near-zero marginal cost)?
Why do Head Start gains fade — is it subsequent school quality, the absence of continued program intensity, or regression to the mean in measured outcomes?
Does universal free school breakfast produce learning gains in high-income countries where food insecurity is less acute, or only where nutritional deprivation is prevalent?