The 5 Levels of Agentic Engineering
I caught myself last week telling a founder he was at Level 2 of agentic engineering. He looked at me like I’d made up the levels on the spot. I had, kind of. But the more I sat with it, the more the levels felt real, because the gap between Level 2 and Level 4 is the difference between “using AI” and running a different kind of business.
A few weeks ago I wrote The AI Orchestrator about how the senior engineer role is becoming something new. Matt Watson framed it as four generations. Karpathy talks about the move from vibe coding to agentic engineering. Both are right. Neither captures what it feels like to actually live at the top of the curve.
So here’s my version. Five levels, with a Level 0 for anyone who hasn’t started yet. Where are you?
The 5 levels at a glance
| Level | What it looks like | Primary tool | Bottleneck |
|---|---|---|---|
| 0. Observer (baseline) | Reading, not shipping | ChatGPT tab | Not started |
| 1. Prompter | Paste-and-pray snippets | ChatGPT, Claude.ai | Typing speed |
| 2. Copilot User | AI completes lines | Cursor, Copilot, Windsurf | Editor-bound thinking |
| 3. Director | Describe problems, not solutions | Claude Code + CLAUDE.md | Spec clarity |
| 4. Orchestrator | Multiple parallel agents | Claude Code + MCP + subagents | Judgment at scale |
| 5. Architect (emerging) | Agent-to-agent chains | Hooks, skills, RemoteTrigger | Verification infra |
That’s the map. Agentic engineering, in one line, is directing AI agents to write, review, and ship code instead of writing it yourself. The job moves from implementation to specification, dispatch, and judgment. The levels are how far into that move you actually are.
Level 0: Observer
Level 0 is when you’ve read about AI but haven’t shipped anything with it. You have a ChatGPT tab open. Maybe you wrote a prompt to summarize a podcast last month. The model is a curiosity, not a coworker.
The honest version: you’re waiting. Waiting for the dust to settle, for the right course, for someone to tell you it’s safe to start. It’s never going to be safe to start. The people three levels up started in that tab too.
Level 1: Prompter
Level 1 is when you ask ChatGPT or Claude for snippets, paste them into your project, fix the bits that broke, and ship. The model is a faster Stack Overflow. You’re still doing 95% of the work, you just type less.
This is where most of the founders I talk to actually live, including the ones who think they’re further along. The tell is simple. You can’t show me a chat history of the ten prompts that shipped something this week, because nothing shipped through chat. You shipped, the chat helped.
There’s nothing wrong with this level. It’s just not what people mean when they say AI is changing everything.
Level 2: Copilot User
Level 2 is when AI lives inside your editor and autocompletes line by line. Cursor, GitHub Copilot, Windsurf, JetBrains AI. The model has context on your repo and suggests what comes next.
Some people call this vibe coding. Prompting your way through a feature one chat at a time, AI in the editor, you in the chair. It’s not a wrong name. It’s just a description of where the work still lives, which is mostly in your hands.
I lived at Level 2 for about six months in 2024. Cursor on a Rails codebase. It felt like a superpower until I tried Claude Code and realized I’d been doing the work of a typing assistant for the AI. The structure of the work hadn’t changed. I was still the one moving cursor to file, file to function, function to fix. The AI sped up the keystrokes.
The trap at Level 2 is mistaking velocity for leverage. You can build a real-looking landing page in an afternoon. You can also stay here for years, because the next level requires letting go of the editor as the center of gravity. Most of the engineers I watch get stuck here aren’t lacking tools. They’re measuring themselves against lines of code instead of clarity of specs, and the editor still feels like home.
Level 3: Director
Level 3 is when you stop writing code and start describing problems. Claude Code in a terminal. A clean repo with a CLAUDE.md file telling the agent how this codebase works, what the conventions are, where the gotchas live. You describe what you want, the agent goes off and writes it, you review the diff.
The shift here is mental, not technical. You stopped asking “how do I build this?” and started asking “what am I trying to accomplish?” That sounds small. It’s not. It’s the moment the implementation stops being your job and judgment becomes your job.
A few weeks ago I described a casting kanban board for a client project. Drag and drop, ghost elements, accessibility hooks, anchor character mechanics, a serializer for state changes. I didn’t write the implementation. The agent did. I caught two judgment failures in review and pushed the third version. Three hours, end to end. The old version of me would have spent two days on the drag-and-drop alone.
This is where the role changes. Watson calls it Gen 3. Karpathy calls it agentic engineering. I called it the orchestrator. Same thing.
Level 4: Orchestrator
Level 4 is when you run multiple agents in parallel and your job is dispatch, review, course-correct. Right now I have one agent on a client feature branch, one on a migration, one drafting copy for the studio site. I’m the spec writer, the reviewer, and the bottleneck.
The tooling at this level matters. Claude Code as the runtime. MCP servers connecting it to Linear, GitHub, Postgres, my Obsidian vault, lst.so. Subagents for code review and security scans. Hooks that fire on file changes. Skills that load context the moment a keyword shows up in conversation. The whole thing reads less like an editor and more like a small operations room.
The bottleneck isn’t capability anymore. It’s the quality of the specs going in and the sharpness of the judgment coming out. Product taste becomes the moat. The middle of the work, the implementation, is just a thing that happens while you’re doing something else.
This is also the level that has very little to do with seniority in the old sense. You don’t need to be a senior engineer to work here, you need senior judgment. Catching the wrong abstraction in a review, knowing when an answer is too clean, spotting the bug an agent confidently shipped. Memorizing syntax stopped mattering. The taste did not.
Concretely, this is how I shipped four products solo last year, tini.bio, lst.so, gratu, and nod.so, while running two full-time consulting engagements. No team. The Level 4 economics are not subtle.
This is the level I’m at most days. It still feels new. It still breaks in surprising ways. It does not feel temporary.
Level 5: Architect (emerging)
Level 5 is when agents hand off to other agents and most of the build runs without you in the loop. One agent writes, another reviews, a third runs tests, a fourth deploys. You set constraints at the front and validate at the end.
Almost nobody is fully here yet. I get glimpses. A skill that fires before commit and runs an audit subagent. A scheduled cron that posts to Slack when a deploy goes red and a fix agent that takes a swing before I see it. RemoteTrigger jobs that wake up a Claude session on a webhook. The primitives are landing. The agents still can’t fully verify each other. The infrastructure isn’t ready.
When this level lands properly, the question stops being “how fast can I build?” and starts being “how many things can I have running at once?” That’s a different business entirely.
If you’re a founder and you don’t know which level your team is at, that’s the actual problem. The gap between Level 2 and Level 4 is the gap between a team that ships and one that’s still hiring.
So which level are you at? Be honest with yourself.
If you’re at 0 or 1, the move is to start. Pick a small project. Use Claude Code in a terminal, not just chat in a browser. Watch what happens when you let the agent take a swing and then a second swing.
If you’re at 2, the trap is comfort. Cursor is great. It’s also a ceiling if you stay there. Spend one weekend with a terminal-based agent and a CLAUDE.md file. Feel the difference.
If you’re at 3 or 4, the work is sharpening the spec and sharpening the review. The agents are only as good as the prompts going in and the judgment coming out.
The shift isn’t a job title. It isn’t a course. It’s the conceptual move from doing the work to running the work. The tools changed. The mindset has to catch up.
// Newsletter
Get my ideas every Thursday
New posts, insights, and lessons on building products with AI. One email per week.