There is a particular image that has stayed with Yar Deng, an international development and humanitarian expert, since her years working in South Sudan. A supervisor at the Donor Embassy once told her: "Yar, the UN is like a big ship. In order for the big ship to turn and make a huge change, it takes time." She understood what he meant.
She'd seen it firsthand, working first as an implementer with Mott MacDonald BV, then as a donor-side officer at the Embassy, and later with UN FAO, watching the system from every side. What struck her wasn't the metaphor itself. It was the second part: that the ship was taking much longer to turn than anyone would like.
When we spoke with her recently, she put it plainly: under the intense pressure of reporting deadlines, teams can end up "just putting in the numbers for the sake of meeting that target, rather than capturing the real change that you want to see." What that means in practice, we'd argue, is that a team can be doing everything right in the field and still arrive at reporting time with nothing meaningful to show for it.
Across humanitarian and development organisations, this tension has been building for years, making the demonstration of impact one of the most draining, most contested, and most misunderstood parts of the work. As we dug deeper, it became clear that this isn't a story about underperforming teams or misaligned intentions, but about a system that was never quite designed for the complexity it's now being asked to navigate.
With this article, we want to name that complexity honestly and begin to reframe it. If you lead programs, report to donors, or carry the weight of proving impact in conditions that resist easy measurement, this one is for you.
"Programs are developed with positive and wishful thinking that things will go according to plan," says Yar.
Most programs, and the funding behind them, are designed at the headquarters level, often months before implementation begins. By the time a program reaches the field, its assumptions about context, partner capacity, and measurable change have already been set.
That being said, these plans are not made carelessly. Risk assessments are conducted, and logframes (the planning tools that map a program's activities to its intended outcomes) are reviewed before funding is approved. DFID/FCDO guidance expects them to be revised at annual review points throughout implementation. In practice, they rarely are. Research shows that only a minority of programs make substantive mid-cycle changes to their results frameworks, even when context demands it.
The consequences of this are tangible. As Yar describes: "You'll end up missing the planting season or the rainy season just because the seeds didn't get in on time." Procurement approvals move through multiple committees, each adding delay, until the seeds arrive too late and the implementation window has closed.
What gets measured is shaped by what was designed at the start. DAC and Paris Declaration follow-up analyses show that donors have gradually expanded outcomes-based indicators, but continue to require detailed output-level accountability, particularly in humanitarian and fragile-context portfolios. The harder questions - whether knowledge was retained, whether livelihoods improved, whether communities became more resilient - sit outside that framework and the budget allocated to answer them.
Sector benchmarks place M&E at roughly 3–5% of total program budgets (ALNAP State of the Humanitarian System, 2022), with evidence suggesting humanitarian responses frequently fall below even that. Yar's experience confirms it: "If only 5% of the total project budget is allocated to monitoring and evaluation, it can be challenging for teams to deliver high-quality outcomes."
Outcomes go uncaptured for different reasons. Understanding which kind of outcome you are dealing with, and why it resists measurement, is the first step to doing something about it.
Most project cycles run between three and five years. Yet real impact (the kind that endures after funding ends) often takes longer than that to appear, and longer still to verify.
Yar describes a farmer she worked with in South Sudan: before the intervention, he was subsistence farming. Afterwards, he had expanded his cultivated land, was earning an income, paying his children's school fees, and was able to respond when a child fell sick. That is the change the program was designed to create. But it emerged gradually, across multiple seasons, well after the reporting window had closed.
"It goes beyond the project cycle," she says. "One project sometimes cannot really attribute the change to it because of the short time." This is the attribution problem in its starkest form. By the time the change is visible, the program that contributed to it has already ended, its final report already filed.
Even when change is visible within the project cycle, proving that a single intervention caused it is rarely straightforward. Community resilience, economic recovery, and institutional trust are all shaped by forces far outside any program's control, such as market conditions, political stability, climate, and other interventions operating in the same area.
Resilience is a case in point. "Can they cope with shocks?" Yar asks. "If a conflict comes, are they resilient enough to rebuild?" These are the right questions. But answering them requires tracking communities across time, through shocks that haven't happened yet, under conditions no logframe anticipated.
Some UN frameworks are beginning to develop resilience indicators, but as Yar notes, the methodology is still catching up with the ambition.
Behavioural change sits furthest from what a reporting framework can hold. It is real, and it shows up eventually in yields, in incomes, and in the choices people make, but the mechanism itself resists quantification.
Yar describes working with farmers who believed fertilisers were harmful to their soil. Over time, as they learned to apply them correctly, something shifted. "Their productivity increased, their income increased," she says. "But just that realisation that there's actually something I could do - you can't put a number on that. You can use success stories, but not in the same quantifiable way."
The same applies to social cohesion, protection behaviours, and confidence. These are often the most meaningful things a program produces. They are also the ones most likely to disappear from the record entirely.
Everything we have described so far (the locked-in assumptions, the output-heavy frameworks, the outcomes that slip past the reporting window) raises an obvious question. Why don't teams simply design programs differently from the start?
The answer, in most cases, is that the incentives don't support it. Reporting requirements don't just measure a program after it's been designed. They shape it before implementation begins. When teams know they will be held accountable for outputs, they scope projects around what's reportable. Activities get prioritised because they are visible and easy to count, not because they are the most likely path to meaningful change. This isn't cynicism, but a rational response to the system teams are operating in.
Yar describes how this plays out at the proposal stage. "Implementing partners often work under significant pressure to secure funding, which can sometimes lead them to accept ambitious targets in proposals with the intention of addressing any challenges during implementation." Once implementation begins and the reality of that commitment becomes clear, something has to give. M&E gets deprioritised. Innovation gets quietly set aside. As Yar puts it, "You don't leave enough space for innovation. You don't leave enough space to actually do the work on the ground."
What follows is a cycle that neither side intended. The implementing partner doesn't push back on requirements it knows are unrealistic, because it doesn't want to lose the grant. The donor never hears that the requirements are unworkable, so it keeps expecting the same. Nothing self-corrects. And as Yar notes, the consequence isn't always that the outcome didn't happen; it's that it wasn't captured. The reporting architecture simply wasn't designed to look for it.
Most of the frustration described so far traces back to the same root cause: design. Specifically, whether outcome tracking is treated as something that happens continuously throughout delivery, or something that gets reconstructed under pressure at the end.
Yar describes what the latter looks like in practice. When reports are due, teams are "checking this Excel, checking that Excel, panicking." The data often exists somewhere, but it is fragmented across field offices, partners, and spreadsheets, never centralised. So when the report is due, teams start from scratch, chasing data that should have been consolidated all along.
"You don't have to panic only when the report is due," Yar says. " When systems are centralised, you can keep checking the data throughout the project cycle." That requires shared visibility across the full lifecycle of a program, not just at reporting milestones. Outcome indicators agreed at the proposal stage were tracked during delivery, not deferred until the final report. Evidence is built as the work happens.
As Yar puts it: "At the heart of it, people truly want to see meaningful change. It's really just about finding the right way together to make that change visible."
How to make that change visible is a question the team at Tactiv thinks about constantly when working with humanitarian and development organisations to understand where the system breaks down and what it would take to build it differently.
If it's a question your team is sitting with too, we'd love to hear from you.