How a Forward Deployed Engineer Runs a 10-Day AI Sprint
Day by day inside a fixed-scope, fixed-price AI engagement, including the day that quietly decides whether the sprint ends well.
By the end of this article you should understand exactly how I turn a customer’s vague “we want AI” into a shipped workflow in 10 business days. You’ll know what happens each day, where the pressure points are, and which day actually decides whether the engagement ends well. It isn’t the day most people focus on.
I run these sprints for founder-led B2B SaaS companies. The cadence is the same every time. Contract template, written Sprint Brief, Day-5 refund clause, handoff package. I don’t improvise any of it. The whole point is that the customer is buying a known shape, not a bespoke experiment.
Here’s the playbook.
The container
Ten business days. One narrow AI workflow. Fixed scope, fixed price, with code committed to the customer’s repo on Day 10.
Two preconditions before the clock starts. The contract is signed and the deposit (typically 50%) has cleared. The customer has provided codebase access (read-only at minimum), data source samples, and named a single decision-maker who will be available during sprint hours. If either is missing, the kickoff moves. The sprint clock does not start until the deposit clears.
These look like administrative details. They aren’t. They are the reason sprints either ship on Day 10 or go sideways on Day 6 because nobody on the customer side could approve a decision. The named decision-maker matters most. Without one, every scope question becomes an asynchronous Slack thread and the timeline collapses.
Day 1: Scope
Day 1 is the day the sprint is won or lost. Everything that goes wrong later goes wrong because Day 1 was rushed and the Sprint Brief was vague.
The session runs two to three hours. The structure:
Restate the use case in the customer’s own words. Confirm we both still believe it’s the right workflow.
Walk through the data sources, the existing architecture, and the integration points.
Define success criteria in writing. What does good output look like? How will the customer’s team review it? What is the measurable improvement?
Surface and document constraints: auth, third-party dependencies, security requirements, deployment limitations.
Write the Sprint Brief, a one to two page document covering scope, success criteria, deliverables, and explicit out-of-scope items.
Email the brief to the decision-maker and get written sign-off before Day 2 starts. This part is not optional.
The brief opens with a one-sentence scoping template: For [user], when [trigger] happens, the workflow will [action] and produce [output], so that [value]. If we cannot agree on the words that fill those brackets, the sprint is not ready to start.
Acceptance criteria have to be concrete. “Make it work” is not an acceptance criterion. “Reduce average prep time from 90 minutes to under 15 minutes” is. “Classify 80% or more of sample tickets correctly across a defined sample set” is. The concrete criterion is what makes Day 10 unambiguous.
Days 2 and 3: Design
UX flow, technical architecture, prompt and data structure, implementation plan. This is the day the customer is paying for judgment, not typing.
We’re choosing prompt structures, data shapes, integration patterns, and edge case handling that the customer’s team would otherwise spend weeks debating internally. Most of these decisions have a default answer that is correct 90% of the time. Knowing the defaults is most of the value.
A few that matter:
Build close to the existing stack. Don’t ask the customer to switch tools. If they use Linear, integrate with Linear. The sprint is too short to introduce new infrastructure.
Prefer boring infrastructure. The customer does not care if the architecture wins a hackathon. They care that the workflow still works in three months and that someone on their team can maintain it.
Separate prompts and config from code. Models change, APIs change, prompt patterns change. Code that hard-codes a model name or prompt structure is code the customer will struggle to update. Put prompts and config in their own files.
Log enough to improve. Capture input, output, model version, the user’s action on the output (accepted, edited, rejected), and error events. Without logging, the workflow cannot be improved in week three. With it, the customer’s team has the substrate to keep iterating after handoff.
The customer sees the design and confirms direction before any production code is written.
Days 4 and 5: Prototype
Days 4 and 5 produce the first end-to-end version of the workflow with real data. Not slides, not mock outputs. The actual workflow running, even if it is rough.
The contract says that if there is no working prototype by Day 5, the customer can stop the sprint and recover 50% of the fee. No subjective judgment, no fine print.
Customers focus on Day 5 when they read that clause. I focus on Day 4.
By the end of Day 4 you usually know, before the customer does, whether the prototype is going to run on Day 5 as scoped. Either the API has an undocumented constraint, or the prompt produces 70% accuracy when the rubric requires 90%, or the data is messier than the sample suggested.
You now have about four hours and two options.
Option A: ship a thin, hallucination-prone prototype on Day 5. Clear the gate. Take the cash. Patch the foundation over the rest of the sprint.
Option B: write to the customer that afternoon. Propose a narrower scope that can actually ship, or trigger the refund cleanly.
Option A is the trap. The 50% refund costs $2.5K to $5K. The reputational damage of “the prototype hallucinated and we paid for it” costs more than that for years, because the story gets told to other founders.
The right Day 4 move:
Stop the original implementation path immediately.
Identify the narrowest version of the workflow that can actually run on Day 5.
Email the customer that afternoon with the proposed narrower scope, the reason, and a frank statement that the original is not deliverable in the remaining time.
Get written confirmation before continuing.
Three acceptable outcomes from that email: narrower scope agreed, pause and reschedule, or clean refund. One unacceptable outcome: quietly shipping worse than promised.
This is the most important operational lesson in the model, and it is the one most operators get wrong. Surface problems early, in writing. A problem surfaced on Day 4 is solvable. The same problem surfaced on Day 9 is a crisis. The same problem buried in a thin Day 5 demo is a brand-damage event that pays out for years.
Days 6 through 9: Implementation
Heads-down engineering. Edge cases, fallback behavior, logging, error states, prompt tuning, output quality. Customer touchpoints are lower. A short async update every couple of days: what shipped, what’s next, blockers, decisions needed.
This is also when the eval loop runs. The cycle:
Collect 10 to 20 sample inputs from the customer’s real data, or representative synthetic inputs.
Generate outputs for each.
Score each output against the rubric, dimension by dimension.
Identify failure patterns. What kinds of inputs produce low scores?
Improve the prompt, the data shape, or the architecture based on what the failure patterns suggest.
Re-test. Compare scores. Document what changed.
The loop runs twice during the sprint. Once on Day 4 or 5 to validate the prototype, once on Day 8 to refine the production version. It is also part of what gets handed off, so the customer’s team can run it themselves.
Edge cases are where the sprint is won or lost, not happy paths. Cases to test: missing data, ambiguous input, very long input, contradictory input, unusual language or slang, low-quality input documents. Anything I don’t test before Day 10, the customer’s team will discover for me in week three.
Day 10: Handoff
Seven deliverables go to the customer:
Final workflow code, merged or PR-ready in the customer’s repo.
Brief implementation documentation in the repo.
A 10 to 20 minute recorded walkthrough.
The output quality rubric, with examples at each score level.
The workflow playbook: purpose, inputs, expected outputs, review steps, edge cases, maintenance guidance.
A 30-day post-sprint roadmap with specific suggested next steps.
A 60 to 90 minute team handoff and enablement session.
Known limitations get documented explicitly. Every workflow has cases it handles badly. Hiding them is how trust collapses in week three. The handoff document spells them out: “this workflow assumes inputs in English,” “this workflow has not been tested on inputs longer than 2,000 words,” “this workflow scores below 4 on tone when the input is informal slang.” Customers respect honest documentation. They lose trust fast when they discover limitations themselves that I already knew about.
Within two weeks of Day 10, I ask for permission to publish a short case study. Some say yes. Some say yes-with-anonymization. Some say no. All three are fine.
TLDR
The 10-day Sprint is a known-shape container. Contract template, Sprint Brief, deliverables package, Day-5 refund clause. I don’t improvise any of it.
Day 1 is the most important day. If the Sprint Brief is not tight and signed off by the decision-maker before Day 2 starts, the sprint is already in trouble.
The visible gate is Day 5. The real decision moment is Day 4. The Day-4 email is what separates operators who survive in this work from operators who don’t.
Days 6 through 9 are heads-down implementation plus two eval loops. Test on edge cases, not happy paths.
The handoff package (code, rubric, playbook, walkthrough, roadmap, enablement session) is what distinguishes Forward Deployed work from freelance work.
If you are running internal AI projects on rolling timelines and cannot figure out why they don’t ship, the diagnosis is almost always that there is no Day 1 equivalent. No signed-off brief, no concrete acceptance criteria, no Day-5 gate, no real Day 10. Without those, the project stretches forever. The 10-day shape exists to make all of that non-negotiable.


