The Day-4 Trap
What we learned about integrity, refunds, and reputation from running fixed-price AI sprints.
Our contract says: if there’s no working prototype by Day 5 of a 10-day AI sprint, the customer can stop the engagement and recover 50% of the fee. No subjective judgment, no fine print. Either the prototype runs, or it doesn’t.
Customers usually focus on Day 5 when they read that clause. They ask about it on the call, in the email follow-up, sometimes in the contract negotiation. It’s the visible gate.
We focus on Day 4.
If you wait until Day 5 to discover the prototype won’t work, the only two paths available to you are bad. Day 4 is when the choice is still real. And the choice you make on Day 4, quietly, internally, in about four hours, decides whether you end the sprint with a happy customer or a story being told at the customer’s next investor dinner.
This is the most important operational lesson we’ve learned, and it’s the one I most often see service operators get wrong.
What goes wrong on Day 4
You started the sprint with a clean scope. The Sprint Brief is signed. Days 2 and 3 were architectural. Day 4 you started running the workflow end-to-end with real data for the first time.
Then one of the following happens:
The customer’s API has an undocumented constraint that breaks the data flow.
The prompt produces 70% accuracy on real inputs when the success rubric requires 90%.
The data is messier than the sample suggested, typos, formatting noise, edge cases that weren’t in the test set.
A key field you assumed would always be present is actually missing 30% of the time.
The customer’s compliance posture flags an issue you didn’t see on Day 1.
These are not exotic problems. They’re the median problems. Every sprint has at least one. Some sprints have all of them.
By 4pm on Day 4, you usually know, before the customer does, whether the prototype is going to run the next day as scoped.
You now have about four hours, and two options.
Option A: ship it anyway
You can spend Day 4 evening duct-taping the prototype into something demo-able by Day 5. Lower the accuracy threshold internally. Hard-code around the missing fields. Hide the edge cases behind a fallback message that says “this case requires human review.” Ship something that runs end-to-end on the happy path, even if you know the happy path is 60% of the real inputs.
The customer sees a working demo on Day 5. They clear the refund gate. They send you the second 50% of the fee. You’ve bought yourself five more days to patch the foundation.
This is Option A. It’s the tempting one. It clears the cash. It avoids the awkward email. The work you did doesn’t get refunded.
It’s also the trap.
Why Option A destroys the brand
The 50% refund costs $2,500–$5,000 depending on the sprint. That’s a real number, and walking away from it stings.
The reputational cost of a customer telling their network “we paid for an AI sprint and the prototype hallucinated” costs more than the refund recovers. It costs more for years, because that story gets repeated whenever AI services come up at a founder dinner or in a portfolio Slack.
The math is straightforward:
Clean refund cost: $2,500–$5,000, one-time, recoverable.
Bad-prototype reputational cost: 1–3 future sprints lost to the negative reference, indefinitely.
You don’t have to be a finance person to know which one is more expensive. But under pressure, in real time, on Day 4 at 6pm, the cheaper option feels like Option A. Brains do not weight low-probability long-tail reputation costs against immediate cash. Brains take the cash.
This is why the discipline has to be a rule, not a judgment call.
Option B: write the email
The right Day-4 move is roughly this:
Stop the original implementation path immediately. Don’t keep working on it while you decide.
Identify the narrowest version of the workflow that could actually run on Day 5. Usually a reduced input scope, a simpler data source, or a deliberately restricted use case. Sometimes “the workflow works on these specific input types and we flag everything else for human review” is enough to make the prototype real and honest.
Email the customer that afternoon. Lay out what you’ve found, why the original scope isn’t shippable in the remaining time, and what the narrower scope looks like. Be specific about the trade-off.
Get written confirmation before continuing. Verbal agreement on Day 4 becomes “I don’t remember agreeing to that” on Day 9. Email or it didn’t happen.
Three outcomes are acceptable from that email:
The customer agrees to the narrower scope. You ship that on Day 5 and the rest of the sprint continues at the new scope.
The customer agrees to pause the sprint, fix the data or access issue on their side, and reschedule. You ship a working version when the preconditions are right.
The customer invokes the Day-5 refund clause. You return 50%. The work to date stays with them. Everybody parts cleanly.
One outcome is unacceptable: silently shipping worse than promised.
What customers actually do when you write the email
This is the part that surprised us, and it’s the reason this approach is sustainable.
Customers, the kind we work with, founder-CEOs of B2B SaaS companies, almost never invoke the refund when you write the Day-4 email properly. What they do is one of two things:
They agree to the narrower scope, because they’d rather have a real shipping workflow than a refunded contract.
They pause and reschedule, because they want to fix the data issue and try again with the foundation right.
The refund-and-walk is rare. The respect, on the other hand, compounds. Customers tell other founders about the vendor who wrote them honestly on Day 4 instead of duct-taping a demo on Day 5. That’s the reference call you want them making.
There’s a deeper thing happening here. The reason customers respond well to the Day-4 email is not because they’re nice , it’s because the email is evidence that the vendor cares about something other than clearing the gate. That signal is rare enough in service businesses that it functions as a moat.
Why this isn’t generosity
Nothing about this is altruistic. The discipline is mathematical.
Reputational cost of a bad shipped prototype: substantial, recurring, hard to recover from.
Cash cost of a clean refund: bounded, one-time, recoverable in the next sprint.
The Day-4 email is the cheaper option once you internalize the time horizon. Service businesses that survive past their first decade are the ones that figured this out. The ones that didn’t figure it out are the ones whose founders are still complaining about how unfair it is that nobody refers them anymore.
If you take one operational principle from this, take this: surface problems early, in writing. A problem surfaced on Day 4 is solvable. The same problem surfaced on Day 9 is a crisis. The same problem buried in a thin Day-5 demo is a brand-damage event that pays out for years.
Day 5 is the visible gate. Day 4 is the real one.


