
Tony Stark & Pepper Potts — My Personal AI Agents

This post is Part 1 of 3 in a series about the two AI agents I use every day.

Over the past few months, I’ve been quietly building two personal AI agents on Claude Code that I now talk to almost every day: one for productivity work and one for coding. I call them Pepper Potts and Tony Stark. Around that time, I came across a tweet on X that said something like “the best agents are the ones you create yourself”. It clicked because it matched my experience exactly. In this post, I’ll introduce the two agents; the following posts cover how I built them.

This is a personal-experience post, not a benchmark study. The numbers I quote later are honest estimates from my own day-to-day, supported by published research where I have it. Take them as a direction, not a measurement.

A Year Later — From Chatbots to Agents

In February last year, I wrote about The 4 Ways of AI Adoption. The short version was that organisations get the most out of AI by combining four paths: subscribing to copilots, learning prompt engineering, deploying internal chatbots on their own data, and building AI into their products. The framework still holds; what has evolved is the detail inside it.

What has shifted in the past year is what “copilot” actually means. A year ago, agentic AI was still early; most copilot adoption was pre-agentic, with studies reporting gains in the 10-20% range — modest, but enough that most organisations still found the subscription worth it. Around Q4 of 2025, momentum towards agentic AI picked up, and today the word “copilot” is almost synonymous with “agent”. The marketing has blurred the lines a bit, but it’s still worth being precise about the distinction:

  • A chatbot answers questions in a single turn — you ask, it answers, and the context resets fairly quickly.
  • An agent takes a task, gathers context using the tools it has available (file reading, web search, MCP servers, and so on), drafts a plan, executes the plan — sometimes spinning up sub-agents to handle sub-tasks — iterates over the results, and reports back when done. You become the reviewer rather than the typist.
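The agent loop described above can be sketched in a few lines of Python. This is a toy illustration, not how Claude Code or any other product works internally: the `Agent` class, its method names, and the `is_done` check are all hypothetical, and the "planning" is a stub where a real agent would call a model.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type for illustration: a tool takes the task text and returns findings.
Tool = Callable[[str], str]


@dataclass
class Agent:
    tools: dict[str, Tool]
    max_iterations: int = 3
    log: list[str] = field(default_factory=list)

    def gather_context(self, task: str) -> list[str]:
        # Ask every available tool (file reading, web search, ...) about the task.
        return [f"{name}: {tool(task)}" for name, tool in self.tools.items()]

    def plan(self, task: str, context: list[str]) -> list[str]:
        # Stub: a real agent would ask a model to draft the plan from the context.
        return [f"use {item}" for item in context] + [f"finish {task}"]

    def execute(self, step: str) -> bool:
        # Stub: record the step instead of editing files or running commands.
        self.log.append(step)
        return True

    def run(self, task: str, is_done: Callable[[list[str]], bool]) -> str:
        # Gather context, plan, execute, then check the result and iterate if needed.
        for iteration in range(1, self.max_iterations + 1):
            context = self.gather_context(task)
            for step in self.plan(task, context):
                self.execute(step)
            if is_done(self.log):
                return f"done after {iteration} iteration(s), {len(self.log)} step(s)"
        return "gave up: iteration limit reached"


if __name__ == "__main__":
    agent = Agent(tools={"files": lambda task: "found README"})
    print(agent.run("fix bug", is_done=lambda log: len(log) >= 2))
```

The part that matters is the shape of `run`: the human supplies the task and the definition of done, and the loop does the typing in between.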

Agents are bounded by how clearly you can describe the task and how much relevant context they have to work with. That’s a much higher ceiling than autocomplete, which is why it’s not surprising to hear people claim 2x, 3x, or even more impact from copilots today.

In my experience though, even the polished off-the-shelf agents only take me so far. They’re capable, but every conversation with them starts cold — they don’t know my projects, my conventions, or the small details about how I actually work that I’ve picked up over the years. That’s the main reason I started building Pepper and Tony — agents that I shape myself, that gradually pick up enough of my context over time so that I don’t have to re-explain things every session.

A few references that support the direction of that shift:

My personal estimate is somewhere around 3x in both day-to-day work and coding, though that’s not a measured number — it’s just what I feel from watching my own throughput. The third-party data above supports the direction, not my exact multiplier.

Tony and Pepper — A Marvel Universe Analogy

In the Marvel universe, Tony Stark is the one in the lab building things, while Pepper Potts is the one keeping everything else moving — the calendar, the company, the chaos. Without Pepper, Tony’s brilliance never ships. Without Tony, Pepper has nothing to ship.

My two agents follow the same split:

  • Tony focuses on the build — code, plans, commits.
  • Pepper handles everything around the build — notes, meetings, people, research, finance.

Meet Tony Stark — My Coding Agent

Tony Stark avatar

Most enterprise applications we work with live in more than one repo. Developer teams generally prefer each microservice to have its own repo. Even classic monoliths often have multiple repos for different layers of the stack — typically a frontend, a backend, and a few supporting ones for shared libraries and infrastructure.

This “mature” way of splitting code into multiple repos exists primarily because of how human teams work. Compartmentalization of code for easier maintainability is a human way of managing complexity. But for an agent, that compartmentalization is a barrier. An agent can only work with the context it has access to, and if the relevant code is spread across multiple repos, the agent can’t see it all at once. That means more back-and-forth, more re-explaining, and more friction in getting things done.

So I built a new pattern, which I call the Repo of Repos. Tony is the outer “agent” repo: he pulls in all the related repos as workspace folders, while commits still flow back to each underlying repo’s own origin. Tony sees one workspace; the codebase still ships the way it always has. Because of the way Claude Code and GitHub Copilot in agent mode work, Tony can also reach into each sub-repo directly and spin up sub-agents to handle sub-tasks when he needs to.
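To make the pattern a bit more concrete, here is one way the outer repo could be wired up. It is a sketch under several assumptions: the repo names and URLs are placeholders, the `repos/` folder layout is my illustration, the workspace is expressed as a VS Code multi-root workspace (a `folders` list of `name` and `path` entries), and the sub-repos are kept out of the outer repo's history with a `.gitignore` entry rather than, say, git submodules.

```python
import json

# Placeholder sub-repos for illustration; names and URLs are hypothetical.
SUB_REPOS = {
    "frontend": "git@example.com:acme/frontend.git",
    "backend": "git@example.com:acme/backend.git",
    "shared-libs": "git@example.com:acme/shared-libs.git",
}


def clone_commands(repos: dict[str, str], root: str = "repos") -> list[str]:
    """Return the git commands that would populate the outer repo (not executed here)."""
    return [f"git clone {url} {root}/{name}" for name, url in repos.items()]


def workspace_config(repos: dict[str, str], root: str = "repos") -> dict:
    """Build a multi-root workspace: the outer agent repo first, then each sub-repo."""
    folders = [{"name": "agent", "path": "."}]
    folders += [{"name": name, "path": f"{root}/{name}"} for name in repos]
    return {"folders": folders}


def gitignore_lines(root: str = "repos") -> list[str]:
    # Assumption: ignoring the clones keeps each sub-repo committing to its own origin.
    return [f"{root}/"]


if __name__ == "__main__":
    print("\n".join(clone_commands(SUB_REPOS)))
    print(json.dumps(workspace_config(SUB_REPOS), indent=2))
    print("\n".join(gitignore_lines()))
```

Because each clone under `repos/` keeps its own `.git` directory, a commit made inside `repos/backend` goes to the backend's origin, while the outer repo only tracks the agent's own instructions and configuration.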

The two tools I personally use are Claude Code and GitHub Copilot in agent mode, which is why this series mentions them most often. They aren’t the only options though — Cursor, Windsurf, and Codex CLI all have similar agent-mode capabilities and can do the same kinds of things. I just don’t currently have subscriptions for those, so I can’t speak to them from experience.

Meet Pepper Potts — My Productivity Assistant

Pepper Potts avatar

I’ve been working with Pepper for about two months. She started as an experiment on top of my Second Brain in Obsidian, which I’ve been keeping since 2022 — every note I take is a markdown file in one big folder. The question I was asking at the start was: if a coding CLI agent works on a folder of code, why can’t it also work on a folder of notes? A few weeks of experimentation later, Pepper is the assistant I now check in with multiple times a day.

Pepper, running on Claude Code on top of my Obsidian vault, helps me with things like:

  1. Daily startup briefing — a TODO summary and a “what changed yesterday” note when I open a session.
  2. Meeting notes — capturing and routing Minutes-of-the-Meeting (MOTM), including live dictation while I’m in a meeting.
  3. People folder — small details about the folks I meet, so I can remember them next time.
  4. Research — cloud, finance, vendor products — written up and stored back in the vault.
  5. Skills toolbox — generating images, drafting Word documents, transcribing and summarising YouTube videos, generating draw.io diagrams, and so on.
  6. Self-extension — when she spots a recurring need, she’s allowed to add new skills to her own toolbox.

What’s Next

In Part 2, I go deeper into Tony Stark — My Coding Agent.

This post is licensed under CC BY 4.0 by the author.