Runner

Bootstrapping a coding agent

It started with a Python script, Google AI Studio, Claude Code, and a lot of copying and pasting.

This was way back in March 2025.

I would run a script I wrote to convert my whole codebase into a single string. The repo I was working on at the time was about 300k tokens. I would paste that into AI Studio, select Gemini 2.5 Pro, and then start asking it how to fix things or how to build a new feature. Once it had a solution, I would instruct it to give me a detailed spec I could pass off to another developer. I would then copy and paste that spec into Claude Code for implementation.
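The flattening script itself was the simplest piece of the workflow. Here is a minimal sketch of that kind of script; the extension filter and ignore list are illustrative assumptions, not the original code:

```python
# Minimal sketch of a repo-flattening script; the extension filter and ignore
# list here are illustrative assumptions, not the original code.
from pathlib import Path

INCLUDE = {".py", ".ts", ".tsx", ".md"}            # file types worth sending
IGNORE = {".git", "node_modules", "__pycache__"}   # directories to skip

def flatten_repo(root: str = ".") -> str:
    """Concatenate every source file into one string for pasting into AI Studio."""
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in INCLUDE and not (set(path.parts) & IGNORE):
            chunks.append(f"===== {path} =====\n{path.read_text(errors='ignore')}")
    return "\n\n".join(chunks)

if __name__ == "__main__":
    print(flatten_repo("."))
```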

This became my go-to workflow for anything challenging. Gemini 2.5 Pro with full codebase context could spot bugs that Cursor or Claude Code on their own could not.

As useful as this was, the constant copying and pasting made it an annoying workflow to rely on day to day. Naturally, I decided to build a tool to automate it. I settled on a CLI, because that was the easiest thing to build. It took less than a thousand lines of code and worked like a charm. I could have a back-and-forth conversation with Gemini 2.5 Pro, with an always up-to-date copy of my codebase in context, and it could directly create and edit markdown files in my repo. When I was ready to implement a feature, I would just pop over to Claude Code and tell it to look at the spec and implement it. No more copying and pasting.
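To give a sense of the shape of that CLI, here is a rough sketch rather than the real implementation. It assumes the google-generativeai SDK; the model name, the /spec command, and the SPEC.md convention are all illustrative:

```python
# Rough sketch of a chat loop with the full repo in context; not the real CLI.
import google.generativeai as genai
from pathlib import Path

genai.configure(api_key="YOUR_API_KEY")            # assumes the google-generativeai SDK
model = genai.GenerativeModel("gemini-2.5-pro")    # model name is illustrative

def flatten_repo(root: str = ".") -> str:
    # same idea as the earlier sketch: the whole repo as one string
    return "\n\n".join(
        f"===== {p} =====\n{p.read_text(errors='ignore')}"
        for p in sorted(Path(root).rglob("*.py"))
    )

def main() -> None:
    chat = model.start_chat(history=[
        {"role": "user", "parts": [f"Here is my codebase:\n{flatten_repo()}"]},
    ])
    while True:
        prompt = input("> ")
        if prompt.startswith("/spec"):
            # Ask for a spec and write it straight into the repo, so the
            # implementation agent can be pointed at the file directly.
            reply = chat.send_message("Write a detailed implementation spec in markdown.")
            Path("SPEC.md").write_text(reply.text)
            print("Wrote SPEC.md")
        else:
            print(chat.send_message(prompt).text)

if __name__ == "__main__":
    main()
```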

I used this CLI for about a month before the urge to do more with it was too much to resist.

What’s wrong with a CLI?

CLIs are all the rage these days, almost entirely due to the success of Claude Code. I personally love Claude Code. So why didn’t I just stick with the CLI and add features to it?

Well, here is an excerpt from some notes I took around that time:

“I have a vision for a new sort of UI for running coding agents. It’s a dashboard that’s optimized for software design and code review, rather than actually writing code. As coding agents become more and more capable — and autonomous — the challenge becomes monitoring them and guiding them. Having a human approve every individual diff synchronously is unrealistic. But the human can’t be completely out-of-the-loop either. The human still needs to interact with the system and provide input, just at one level higher than the raw code — at the architecture, design, and planning level (prior to coding) and then again at the review level.”

So I guess the short answer is “I felt like it.” Sometimes you just have to build things because you feel like building them.

The longer answer is that the CLI form factor is limiting. If your interaction with the agent can happen purely through chat, a CLI is great. But chat as your only point of interaction is a poor fit for a coding agent.

Here are just a few of the many features I wanted to build (and have now built) that couldn’t be done with a CLI:

  • Manual editing of specs
  • Clickable code citations
  • Consolidated diff viewer
  • Repo context configuration

So it was clear the next step was to build a GUI.

How I used Runner to build Runner

Over the course of the ~500 hours I’ve put into developing this, Runner has completed 369 tasks for me, generating roughly 75% of the 30k lines of code in the codebase. Another 20% came from Claude Code, and I wrote maybe 5% by hand in Cursor (mostly the core agent loop, where I had strong opinions about the implementation). Total LLM cost: $900. Not bad.

The codebase growth pattern was interesting. It exploded to roughly its current size in the first two weeks as I built out all the core pages and features—React components, FastAPI endpoints, WebSocket handlers, the works. Then it stopped growing. Two months later, after adding tons of new features, the codebase is essentially the same size.

Why? Because I let Runner run wild during those first two weeks. I made a deliberate choice to let the agent write as much code as possible, as fast as possible. Get to a working prototype first, worry about elegance later. As a result, it overbuilt everything. Multiple implementations of the same feature. Unused API endpoints “just in case.” Components that were created but never imported. Dead event handlers from refactored flows.

Coding agents rarely take the initiative to remove code. They just add, add, add.

So I instituted weekly codebase reviews and refactorings. I'd spend a few hours manually auditing the entire codebase for dead code, unused imports, redundant implementations, and just plain bad code. Then I would spend the next few hours working through all my notes with the agent. These refactoring sessions became oddly satisfying—watching the agent ruthlessly delete its own previous work. Each week, new features were offset by removed cruft.

Some process decisions that helped:

  • Design key components in Figma first. The agent is great at implementing designs, less great at making aesthetic choices. A few hours in Figma saved days of UI iteration. Maybe someday LLMs will develop aesthetic taste, but they sure don’t have it yet.
  • Create a simple design system. Relatedly, it helped a ton to create a design system that defined a unified color palette and fonts. Without this, you end up with styling that doesn’t look cohesive, as well as a ton of redundant code.
  • Write the really core code yourself. There are parts of your system you need to deeply understand and control. For me, that was the agent loop and parts of the context management system.
  • Only use throwaway tests initially. Controversial, I know. Tests are useful when coding with AI because they let the agent verify and iterate, so I would often still have the agent write the tests and verify they pass before committing code. But then I’d delete them. When the agent is rewriting entire modules daily, maintaining existing tests becomes a major bottleneck. Better to wait until you have a more stable codebase.

What surprised me

When using Runner to work on the Runner codebase, the agent is clearly aware that it’s working on its own code. Multiple times, when the agent hit a tool call bug while implementing a feature, it would identify the bug in its own code and create a task to fix it, without me asking. Just “Hey, I found a bug in my file editing tool, here's a task to fix it. Please implement the fix and then I’ll get back to what I was working on.”

Once, the planning agent got frustrated that it couldn't directly edit files to fix a simple typo it spotted. It knew from the code that the edit_file tool existed, but it was commented out in the config for the planning agent. So it created a task to give itself access to it. I didn't implement that one, but I appreciated the initiative.

One feature that's brought me an unexpectedly large amount of joy is automatic commit messages. I've always hated writing them, and now I don't have to. Runner uses the task title as the commit message by default. It's a small thing, but when you're making dozens of commits a day, it adds up.
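The mechanics are about as simple as they sound. Here is a hedged sketch of that behavior, assuming a plain call to git; the function name and argument are hypothetical:

```python
# Hedged sketch of the auto-commit behavior, assuming a plain call to git;
# the commit_task name and task_title argument are hypothetical.
import subprocess

def commit_task(task_title: str, repo_dir: str = ".") -> None:
    """Stage the task's changes and commit with the task title as the message."""
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", task_title], cwd=repo_dir, check=True)

# e.g. commit_task("Add consolidated diff viewer")
```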

Will using Runner (or any coding agent) make you 10x more productive? Unlikely. My best guess is that it would have taken around twice as long for me to build this without AI. It’s a big enough productivity improvement that I can’t imagine building something without it, but I don’t see it fundamentally changing the industry anytime soon. Software development jobs should be safe for the foreseeable future (as long as you’re willing to embrace these new tools).

One thing that did surprise me a little was that using AI actually made coding more enjoyable. Not having to write the code myself meant I could spend more time thinking about what I wanted to build and (at a high level) how I wanted to build it. I didn’t have to waste time on the tedious low-level implementation details that I didn’t care about.