Tony Stark & Pepper Potts — My Personal AI Agents

Posted May 1, 2026

By Rafferty Uy · 6 min read

This post is Part 1 of 3 in a series about the two AI agents I use every day.
Part 1 of 3: Introduction (this post)
Part 2 of 3: Repo-of-Repos: Tony’s Multi-Repo Workspace for AI Coding Agents
Part 3 of 3: Personal AI Assistant: Building Pepper on My Second Brain

Over the past few months, I’ve been quietly building two personal AI agents on Claude Code that I now talk to almost every day, one for everything else and one for coding. I call them Pepper Potts and Tony Stark. Around the time I started, I came across a tweet on X that said something like “the best agents are the ones you create yourself”. It clicked because it matched my experience exactly. In this post, I’ll introduce these two agents, and in the posts that follow, I’ll share how I built them.

This is a personal-experience post, not a benchmark study. The numbers I quote later are honest estimates from my own day-to-day, supported by published research where I have it. Take them as a direction, not a measurement.

A Year Later — From Chatbots to Agents

In February last year, I wrote about The 4 Ways of AI Adoption. The short version was that organisations get the most out of AI by combining four paths: subscribing to copilots, learning prompt engineering, deploying internal chatbots on their own data, and building AI into their products. The framework still holds, but one of its paths has evolved considerably.

What has shifted in the past year is what “copilot” actually means. A year ago, agentic AI was still early; most copilot adoption was pre-agentic, with studies reporting gains in the 10-20% range — modest, but enough that most organisations still found the subscription worth it. Around Q4 of 2025, momentum towards agentic AI picked up, and today the word “copilot” is almost synonymous with “agent”. The marketing has blurred the lines a bit, but it’s still worth being precise about the distinction:

A chatbot answers questions in a single turn — you ask, it answers, and the context resets fairly quickly.
An agent takes a task, gathers context using the tools it has available (file reading, web search, MCP servers, and so on), drafts a plan, executes the plan — sometimes spinning up sub-agents to handle sub-tasks — iterates over the results, and reports back when done. You become the reviewer rather than the typist.
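To make the distinction concrete, here is a toy sketch of the agent loop described above: gather context, plan, execute, review, iterate, and report. It is not how Claude Code or any real agent is implemented. `run_agent`, the `tools` dictionary, and the `plan`/`execute`/`review` callables are hypothetical stand-ins for what a model and its tools would actually do.

```python
from __future__ import annotations

from typing import Callable


def run_agent(
    task: str,
    tools: dict[str, Callable[[str], str]],
    plan: Callable[[str, list[str]], list[str]],
    execute: Callable[[str], str],
    review: Callable[[list[str]], bool],
    max_iterations: int = 3,
) -> list[str]:
    """Toy agent loop (hypothetical): gather, plan, execute, review, repeat, report."""
    # Gather context once from every available tool (file reader, web search, MCP server...).
    context = [f"{name}: {tool(task)}" for name, tool in tools.items()]
    results: list[str] = []
    for _ in range(max_iterations):
        steps = plan(task, context)                  # draft a plan from task + context
        results = [execute(step) for step in steps]  # carry out each step
        if review(results):                          # stop once the results look good
            break
        context.extend(results)                      # otherwise feed results back in
    return results                                   # "report back" to the human reviewer
```

In a real agent, `plan` and `review` would be model calls and `execute` might hand a step to a sub-agent. The point of the shape is that the human only sees the final `results`, which is what makes you the reviewer rather than the typist.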

Agents are bounded by how clearly you can describe the task and how much relevant context they have to work with. That’s a much higher ceiling than autocomplete’s, which is why it’s not surprising to hear people claim 2x, 3x, or even greater productivity gains from copilots today.

In my experience, though, even polished off-the-shelf agents only take me so far. They’re capable, but every conversation with them starts cold: they don’t know my projects, my conventions, or the small details of how I actually work that I’ve picked up over the years. That’s the main reason I started building Pepper and Tony: agents I shape myself, which gradually absorb enough of my context that I don’t have to re-explain things every session.

A few references that support the direction of that shift:

Anthropic’s 2026 Agentic Coding Trends Report describes engineers spending less time per task but producing much more total output as they shift to agentic workflows.
OneReach’s 2026 Agentic AI Stats report multipliers of 2.2x to 2.7x across roles like sales, IT helpdesk, and finance.
DigitalApplied’s 2026 productivity data reports a median of around 6.4 hours saved per week, with senior practitioners closer to 10-12 hours.

My personal estimate is somewhere around 3x, in both day-to-day work and coding, though that’s not a measured number; it’s simply my impression from watching my own throughput. The third-party data above supports the direction, not my exact multiplier.

Tony and Pepper — A Marvel Universe Analogy

In the Marvel universe, Tony Stark is the one in the workshop building things, while Pepper Potts is the one keeping everything else moving — the calendar, the company, the chaos. Without Pepper, Tony’s brilliance never ships. Without Tony, Pepper has nothing to ship.

My two agents follow the same split:

Tony focuses on the build — code, plans, commits.
Pepper handles everything around the build — notes, meetings, people, research, finance.

Meet Tony Stark — My Coding Agent

Most enterprise applications we work with live in more than one repo. Developer teams generally prefer each microservice to have its own repo. Even classic monoliths often have multiple repos for different layers of the stack — typically a frontend, a backend, and a few supporting ones for shared libraries and infrastructure.

This “mature” way of splitting code into multiple repos exists primarily because of how human teams work. Compartmentalization of code for easier maintainability is a human way of managing complexity. But for an agent, that compartmentalization is a barrier. An agent can only work with the context it has access to, and if the relevant code is spread across multiple repos, the agent can’t see it all at once. That means more back-and-forth, more re-explaining, and more friction in getting things done.

So I came up with a pattern I call the Repo of Repos. Tony is the outer “agent” repo: he pulls in all the related repos as workspace folders, while commits still flow back to each underlying repo’s own origin. Tony sees one workspace; the codebase still ships the way it always has. And because of how Claude Code and GitHub Copilot in agent mode work, Tony can also reach into each sub-repo directly and spin up sub-agents to handle sub-tasks when he needs to.
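Part 2 covers the actual setup. Purely as a rough illustration, here is a minimal Python sketch of one way such a workspace could be wired, assuming each sub-repo is cloned into its own folder and ignored by the outer repo’s `.gitignore` (an assumption on my part; git submodules or an editor’s multi-root workspace would be alternatives). The `SUB_REPOS` manifest, the example URLs, and the helper functions are all hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical manifest: workspace folder -> that sub-repo's own origin remote.
SUB_REPOS = {
    "frontend": "git@github.com:example/frontend.git",
    "backend": "git@github.com:example/backend.git",
    "shared-libs": "git@github.com:example/shared-libs.git",
}


def bootstrap_commands(sub_repos: dict[str, str]) -> list[str]:
    """Shell commands that clone each sub-repo into its own folder inside the outer repo."""
    return [f"git clone {url} {folder}" for folder, url in sorted(sub_repos.items())]


def outer_gitignore(sub_repos: dict[str, str]) -> str:
    """Ignore rules so the outer (agent) repo never tracks sub-repo contents."""
    return "\n".join(f"/{folder}/" for folder in sorted(sub_repos)) + "\n"


def owning_repo(path: str, sub_repos: dict[str, str]) -> str | None:
    """Which sub-repo a changed file belongs to, i.e. where its commit should go."""
    parts = PurePosixPath(path).parts
    if parts and parts[0] in sub_repos:
        return parts[0]
    return None  # file lives in the outer repo itself (e.g. the agent's own instructions)
```

Because each folder is a full clone with its own `.git` directory, running `git commit` and `git push` inside `backend/` goes to the backend’s origin, while the outer repo only tracks the shared agent files.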

The two tools I personally use are Claude Code and GitHub Copilot in agent mode, which is why this series mentions them most often. They aren’t the only options, though: Cursor, Windsurf, and Codex CLI all have similar agent-mode capabilities and can do the same kinds of things. I just don’t currently subscribe to those, so I can’t speak to them from experience.

Meet Pepper Potts — My Personal Assistant

I’ve been working with Pepper for about two months. She started as an experiment on top of my Second Brain in Obsidian, which I’ve been keeping since 2022; every note I take is a Markdown file in one big folder. The question I was asking at the start was: if a coding CLI agent works on a folder of code, why can’t it also work on a folder of notes? A few weeks of experimentation later, Pepper is the assistant I now check in with multiple times a day.

Pepper, running on Claude Code on top of my Obsidian vault, helps me with things like:

Financial market updates — a daily briefing on the markets I follow, with a portfolio review factoring in my financial strategy, investment style, and risk appetite.
Research — on tech, finance, even Biblical principles, with sub-agents fanning out across many search strings in parallel and digesting everything into a single note.
Work notes — helping me organize my thoughts when I’m working through something complex, then cleaning them up into something I can review later.
Memory notes — Minutes-of-the-Meeting (MOTM) notes, People notes, and anything else I’d like to remember.
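The research item is the most mechanical of these, so here is a hedged sketch of the fan-out-and-digest idea using Python’s `asyncio`. It is not how Claude Code sub-agents are actually invoked: `research`, `save_note`, and the `sub_agent` callable are hypothetical, and `fake_sub_agent` merely stands in for a model doing a web search and summarising it.

```python
from __future__ import annotations

import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical sub-agent: takes one search string, returns a short finding.
SubAgent = Callable[[str], Awaitable[str]]


async def research(topic: str, queries: list[str], sub_agent: SubAgent) -> str:
    """Run one sub-agent per query in parallel, then digest the findings into one note."""
    findings = await asyncio.gather(*(sub_agent(q) for q in queries))  # keeps query order
    lines = [f"# Research: {topic}", ""]
    lines += [f"- {q}: {f}" for q, f in zip(queries, findings)]
    return "\n".join(lines) + "\n"


def save_note(vault: Path, title: str, body: str) -> Path:
    """Write the digest into the vault as a plain Markdown file (hypothetical helper)."""
    path = vault / f"{title}.md"
    path.write_text(body, encoding="utf-8")
    return path


async def fake_sub_agent(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a web search plus summary
    return f"{len(query.split())} words searched"
```

Because `asyncio.gather` returns results in the same order as its inputs, each finding lines up with its query even though the sub-agents finish in whatever order they like. Since an Obsidian vault is just a folder of Markdown files, saving the digest is nothing more than writing one more `.md` file.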

What’s Next

In Part 2, I go deeper into Tony Stark — My Coding Agent.

RazType

This post is licensed under CC BY 4.0 by the author.