Plan Before You Prompt
I recently gave a talk at work on how I build software with agents. The 30-minute talk could be summarized as "plan before you code".
Very few developers do this consistently, yet it takes almost no effort and immediately improves the quality of agent-driven development.
To understand what a plan is and why it's needed, let's do a quick experiment. I cloned the source code for the open-source pair programming tool Parrit, opened the project, and fired up Codex. I thought up the vaguest prompt possible and sent it off. Let's write some code:

The agent spun for 5 minutes and produced an 847-line diffset. This is impressive and terrifying at the same time. The latest frontier models are really, really good. So good they can turn the world's vaguest prompt into something plausible in minutes.
This is an extreme example. However, it's close to many developers' first experience with coding agents. We open up the editor, punch in our prompt, and our amazingly talented yet sycophantic helper fills in a lot of the blanks. We're left with a mountain of code trying to piece things together backwards:
- Is this a good design or a bad design? ...What is the design, even?
- Were all of my requirements handled?
- Were any of my requirements hallucinated?
- Is the code clean? Does it follow project standards?
- Is there any existing code that could have been reused?
- Are the tests clean? Were tests even written?
Note that all of these concerns are at different levels of abstraction. We're trying to piece together the overall design, while cleaning up implementation code, while thinking about code reuse, while thinking about testing. It's way too much to think about at once, and the result is a jumbled-up mess and frustration.
Any change to the code requires another prompt and another large diffset which restarts this cycle. We burn tokens on reimplementation and frustration grows.
When a new gizmo needs to be added to our stats module a week later, the design isn't firm in our heads, so we're left prompting yet again and digging ourselves into a deeper hole. We're no longer in control of the system; the system is in control of us, and all we can do is re-prompt and hope for the best.
This entire process is slow, tedious, and error-prone. We start to think, "this would have been quicker if I just did this by hand..."
This experience leads many developers to conclude: agent development sucks! However, the problem isn't the agent. It's the way we're working with it. We unknowingly delegate every decision, when what we really need is a lightweight way to work with the agent. Enter planning.
Enter Planning
To introduce the idea, let's run the same experiment -- now with plan mode.
Let's fire up Codex again. I hit Shift-Tab to switch to plan mode. Codex reports that we're switching to gpt-5.5 high effort. I'll enter the same intentionally vague prompt to "add stats to the API endpoint".

Codex spends around 30 seconds researching the project and then enters the first part of the workflow -- questions.

The question workflow feels like the choose-your-own-adventure books I loved as a kid. Except, instead of jumping to page 50 to avoid the spaghetti monster, we're clearing up ambiguity and clarifying requirements.
The question workflow alone is incredible. It's always finding things I hadn't considered. More importantly, it helps keep us, the humans, involved and in the driver's seat on decisions being made.
Tip: try a skill like grill-me to maximize the number of questions asked and introduce some pushback from the LLM. It's very helpful for complex features: https://github.com/mattpocock/skills/tree/main/skills/productivity/grill-me
After answering questions, we'll be presented with a short markdown file describing what will be implemented:
# Add Pair Stats to Pairing History API Summary
We will implement a new statistics object in ...
- Add opt-in stats to GET /api/pairing ...
# New API Shape
{
...
}
# Implementation Changes
- Add DTOs for ...
- Add a service method ...
# Test Plan
- Controller tests around the new ...
- Service/unit tests around the new ...
- Run project lint/all tests
# Assumptions
- Stats are computed from ...
- No database migration is needed
This part of the plan is critical. We see an overview of the changes to be made, important high-level implementation details, how the feature will be tested, and what assumptions are made.
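As a rough illustration of how little structure a plan needs, here is a minimal Python sketch that checks whether a plan file contains the sections I rely on. The section names come from the example plan above; treating them as "required", and the helper name `missing_sections`, are my own assumptions for illustration. Codex does not enforce or provide anything like this.

```python
import re

# Section names borrowed from the example plan; calling them "required"
# is an assumption of this sketch, not a rule of any agent tool.
REQUIRED_SECTIONS = ("Implementation Changes", "Test Plan", "Assumptions")


def missing_sections(plan_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching heading."""
    # Collect the text of every markdown heading line ("# ...", "## ...").
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+[ \t]+(.+)$", plan_text, flags=re.MULTILINE)
    ]
    # A section counts as present if its name appears inside any heading.
    return [
        name
        for name in required
        if not any(name.lower() in heading for heading in headings)
    ]


if __name__ == "__main__":
    draft = "# Summary\nAdd stats.\n# Test Plan\n- Controller tests\n"
    print(missing_sections(draft))
```

If the list comes back non-empty, that's a cue to ask the agent for the missing piece before any code gets written.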
We may (and should!) have feedback on proposed changes. We can then have a back-and-forth with the LLM on this plan. The key here is that this discussion is lightweight and cheap: we're thinking at a high level of abstraction and changing a lightweight markdown file. Contrast this with telling the agent to change something after the code is written: we need to rewrite the code, rewrite the tests, rerun the tests and lint, etc.
Once we're done iterating on the plan, we tell the LLM that we're ready to go:

This will switch the agent to a medium effort level and begin implementation work. Implementation work goes quickly because we have our plan in place. Afterward, we're presented with a summary of the changes. We're then ready to review the code.
Review goes smoothly. When we read the code there are no big surprises. Fewer requirements are hallucinated.
The plan agent and implementation agent are not trying to handle too much at once. We find that the code is better and existing project abstractions are reused more often.
We are not trying to handle too much at once. We're less burnt out. We produce better code and we're able to ship more quickly.
Best of all: when the request comes in a week later to change our new feature in some way, we already have an idea for how to do so. The existing code structure sticks in our mind better. We've shifted from trying to wrangle agent-generated code to collaborating with the agent.
What I'm advocating for is a low-effort, lightweight plan phase. This takes between 1 and 10 minutes at the beginning of a new feature, depending on complexity. The time spent pays for itself.
It's worth calling out that there are heavyweight approaches like spec-driven development that tackle large chunks of development work using a series of spec files.
My preference is to tackle small chunks of development work using a single lightweight spec file. One feature is split up into multiple plan-implement cycles. It's the classic code review problem where a 100-line diffset is critically reviewed while a 1000-line diffset causes the reviewer's eyes to glaze over.
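To make the size argument concrete, here is a small Python sketch that counts the changed lines in a unified diff and flags anything over a limit. The function names and the default limit of 100 (borrowed from the review example above) are assumptions for illustration, not a measured threshold.

```python
def changed_line_count(unified_diff):
    """Count added and removed lines in a unified diff."""
    count = 0
    for line in unified_diff.splitlines():
        # Skip file header lines such as "--- a/file" and "+++ b/file".
        # Simplification: a removed line whose content starts with "-- "
        # would also be skipped here.
        if line.startswith(("+++ ", "--- ")):
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


def is_reviewable(unified_diff, limit=100):
    """Return True when the diff stays within the (assumed) line limit."""
    return changed_line_count(unified_diff) <= limit


if __name__ == "__main__":
    sample = "--- a/x.py\n+++ b/x.py\n@@ -1,2 +1,2 @@\n-old\n+new\n context\n"
    print(changed_line_count(sample), is_reviewable(sample))
```

When a diff blows past the limit, I take it as a sign that the plan covered too much and should have been split into another plan-implement cycle.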
A common criticism of using AI to generate code is the claim that it can't build "production-grade systems". I think it's clear now that AI can build production-grade systems. But there's a new important component of "production-grade" that we've always taken for granted: a human who knows what the heck is going on and can drive the ship.
Whether we realize it or not, agents are making design and requirement decisions all the time, and we need a way to surface those decisions rather than backing into them after the fact in code.
Planning is the single highest leverage tool in my toolbox to combat this problem. I recommend you give it a try.
I'm curious what you think! Please comment on your own strategies for managing quality in AI-generated code.