Spare a thought for the pilots. Not the ones in the cockpit. We’re talking about all those tests and experiments we run in our programmes, which have been taking serious flak of their own recently.
Accusations of ‘pilotitis’ in development programming have been around for a while, but over the past couple of years we’ve heard them everywhere — in steering committees, donor reviews, on conference panels, in corridor conversations. We do too many pilots. They never scale. They’re not pragmatic. They don’t deliver results. The critiques and acronyms are piling up: poor RoI, insufficient VFM, no AAER pathway.
What’s striking is where this backlash is heading. The ODA crisis has driven calls to radically simplify development programming: too many ‘Christmas tree’ projects weighed down with everyone’s pet cause — just pick a few cost-effective interventions and do them at scale. The Randomised Controlled Trial movement has been driving in the same direction — spend an ungodly amount of money evaluating a large-scale, multi-year intervention. Effective altruism wants to optimise by concentrating resources on proven, scalable bets. Unconditional cash transfers have become the go-to approach: simple, scalable, evidence-backed, and blissfully free of messy facilitation work. More and more programmes are interested in funding ‘One Big Thing’, rather than investing resources across a range of small and diverse pilots.
Why pilots matter
The One Big Thing rests on a seductive assumption: that we already know what works. This is a radical oversimplification. Even the interventions we’re most confident about come wrapped in wafer-thin assumptions about context, delivery, market effects and sustainability. We confuse what should work with what could work.
The reality is that we work in opaque, unstable and complex environments. Binding constraints aren’t always the ones we assumed, and context shapes everything. In those environments, we need to test hypotheses, learn from them, adapt and scale. This is not a radical proposition. Firms in the real economy test, iterate, launch, pull, and relaunch all the time. They just don’t call it piloting.
This isn’t rocket science. You learn by testing (erm, just like rocket scientists…). A well-designed pilot is the cheapest insurance against the most expensive kind of failure: doing the wrong thing at scale. The RCT advocates will counter that rigorous evidence at scale beats a scattergun of small tests. They have a point. But their argument assumes you already know what to test. In the environments we work in, that assumption rarely holds.
The problem isn’t pilots — it’s bad pilots
So if pilots matter, why do so many of them go wrong? The pilotitis critics aren’t imagining things — there are too many pilots that churn through resources without producing anything useful. But this is a symptom, not the root cause. The underlying problem is how we design, run and — critically — learn from pilots in the first place.
‘Intelligent failure’ offers a useful lens here. Intelligent failures are the unexpected results of thoughtful experiments in new territory: you test something, it doesn’t produce the result you hoped for, but it generates knowledge you couldn’t have gained any other way. The four criteria for intelligent failure read like a blueprint for a good pilot: it takes place in new territory; it’s in pursuit of a clear goal; it’s driven by a hypothesis; and it’s as small as it needs to be to be informative. If your pilot meets those four tests, it’s a worthwhile use of resources regardless of whether the intervention ‘works’. The trouble is, most of the pilots that attract criticism wouldn’t pass this test. They fail — but not intelligently.
Why?
Pilots without strategy
A pilot disconnected from a programme strategy is just an activity. It drifts — consuming time and money without anchoring to anything that might give its findings meaning or momentum. A good pilot, by contrast, is tethered to a clear strategic direction. It operates at the tactical level — testing specific approaches and mechanisms — but it’s connected to a strategic logic that tells you what to do with the results. If the pilot succeeds, you have a potential path to scale. If it fails, it challenges and informs your strategy — and you adjust.
This also means thinking about what comes after the pilot before you launch it. What happens next if this works? Who else needs to be involved, and what institutional, policy or financing conditions need to be in place for results to travel beyond the pilot site? A pilot without a scaling logic isn’t testing a path to change — it’s testing an idea in isolation. And a programme with a giant pick’n’mix of disconnected pilots is an immediate red flag — not a sign of healthy experimentation, but of a team that has confused being adaptive with being directionless. There is a related problem: who owns the pilot. A pilot run under project conditions, with project resources, only tests whether something can work when you are paying for it. The pilot only means something if the market actor owns it and has skin in the game.
Pilots that take too long
A lot of programme teams fall into paralysis by analysis. We lean into the systemic analysis, refine the theory of change, consult stakeholders, map the market functions, workshop the intervention logic — and eighteen months later we’re still not in the market. There’s a tension here, of course: we’ve just argued for strategic clarity, and that takes thought. But the analysis doesn’t need to be exhaustive — it needs to be good enough to formulate a hypothesis worth testing. This doesn’t require a 35-page intervention plan and 10+ rounds of ‘feedback’ from a project Steering Committee.
And once a pilot is running, we need the discipline to know when it’s done. Too many pilots drift on long past the point where they’ve answered the question they were supposed to ask. The hypothesis has been tested, the results are in — but nobody wants to make the call. Killing a pilot feels like admitting failure, so instead we extend, tweak, redesign, and quietly hope something changes (Spoiler alert: it rarely does).
Pilots we never learn from
The biggest problem isn’t when pilots fail. It’s that when they fail, we fail to learn. Failed pilots get swept under the rug and buried in the ‘lessons learned’ annex of annual reports that nobody reads. The sector’s deep-seated aversion to admitting failure means that the most valuable thing a pilot can produce — honest learning about what doesn’t work — is precisely the thing most likely to be suppressed. We all know the ritual: the pilot underperforms, the language softens, the findings are ‘nuanced’, the report concludes with a set of recommendations that bear no resemblance to what actually went wrong. The learning never lands because it was never allowed to be honest. And so the next programme repeats the same mistakes, launches the same pilots, and produces the same carefully worded disappointments. The answer isn’t fewer pilots. It’s more honesty about what they tell us.
What good looks like
A pilot worth defending is anchored in a programme strategy, not floating free of one. It has an explicit hypothesis — not just ‘let’s see what happens’ but a testable proposition about how change will occur. It’s designed to generate useful learning whether it succeeds or fails. It’s launched quickly — weeks or months, not years — because speed is a feature, not a compromise. And it’s as small as it needs to be to be informative, and no bigger.
Most critically, it has clear decision points: scale, adapt, or drop. Not every pilot should scale. Not every pilot should survive. The willingness to kill a pilot that isn’t working — quickly and without ceremony — is as important as the willingness to launch one.
The One Big Thing is alluring precisely because it promises to bypass all of this — the messiness of testing, the discomfort of failure, the patience required to learn and iterate. But complex systems don’t respond to simple solutions. You can fund One Big Thing and hope it works. Just don’t expect anything intelligent to come out of it, not even the failure.