Long Horizon

May 2026

AI used to be one race. Now it's two. One's for smarter models. The other, it turns out, is for something we're only starting to figure out: long-horizon agents. Things that can run for days without derailing, finding their way to a goal mostly on their own.

They run on three things:

The hard part isn't intelligence. It's reliability, because errors compound across steps. At 99% accuracy per step, a 200-step run finishes cleanly less than 14% of the time (0.99^200 ≈ 13.4%).
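Here's that arithmetic as a quick sketch. It assumes every step succeeds or fails independently, which is a simplification (real agent errors tend to correlate). The function names are made up for illustration.

```python
def end_to_end_success(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps are independent."""
    return per_step_accuracy ** steps


def required_per_step_accuracy(target: float, steps: int) -> float:
    """Per-step accuracy needed to reach a target end-to-end success rate."""
    return target ** (1 / steps)


if __name__ == "__main__":
    # 200 steps at 99% per step
    print(f"{end_to_end_success(0.99, 200):.1%}")          # 13.4%
    # What each step needs for a 90% clean run over 200 steps
    print(f"{required_per_step_accuracy(0.90, 200):.2%}")  # 99.95%
```

Read the other way around: to finish 200 steps cleanly 90% of the time, each step has to land at roughly 99.95%, not 99%.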

A little bit of context.

The harness is the product

The harness is the wiring between the model and the world. It's what turns the model into an agent and keeps it on track: it decides what the model sees and what it can do.
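A minimal sketch of what that wiring can look like, under heavy assumptions: `stub_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in rather than a real API call. The point is only that the loop, not the model, owns the context, the tool list, and the stop condition.

```python
from typing import Callable

# Hypothetical tool registry: the harness decides what the model can do.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_note": lambda arg: f"note contents for {arg}",
}


def stub_model(context: list[str]) -> dict:
    """Stand-in for a real model call; returns one action per turn."""
    if not any(line.startswith("tool_result:") for line in context):
        return {"tool": "read_note", "arg": "todo"}
    return {"done": "summarized the note"}


def run_agent(goal: str, model=stub_model, max_steps: int = 10) -> str:
    context = [f"goal: {goal}"]      # the harness decides what the model sees
    for _ in range(max_steps):       # and ends a run that never finishes
        action = model(context)
        if "done" in action:
            return action["done"]
        tool = TOOLS.get(action["tool"])
        if tool is None:             # refuse anything outside the registry
            context.append(f"error: unknown tool {action['tool']}")
            continue
        context.append(f"tool_result: {tool(action['arg'])}")
    return "stopped: step budget exhausted"


if __name__ == "__main__":
    print(run_agent("summarize my todo note"))
```

Everything a real harness adds (retries, checkpoints, permissions) hangs off this same loop.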

Context engineering as a practice.

Designing what goes into the model's context window: prompts, retrieval, memory, and tools. The goal is feeding the model what it needs to work reliably.
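One way to picture it, as a rough sketch: treat the context window as a budget and fill it in priority order. The function name, the priority order, and the use of word count as a stand-in for tokens are all assumptions for illustration, not how any particular tool does it.

```python
def build_context(
    system_prompt: str,
    retrieved: list[str],
    memory: list[str],
    tool_specs: list[str],
    budget_words: int = 200,
) -> str:
    """Assemble context in priority order, skipping items that don't fit.

    Word count is a crude proxy for tokens (an assumption for this sketch).
    """
    sections = [
        ("system", [system_prompt]),
        ("tools", tool_specs),
        ("memory", memory),
        ("retrieved", retrieved),
    ]
    parts: list[str] = []
    used = 0
    for label, items in sections:
        for item in items:
            cost = len(item.split())
            if used + cost > budget_words:
                continue  # over budget: leave this item out
            parts.append(f"[{label}] {item}")
            used += cost
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_context(
        "You are a careful agent.",
        retrieved=["word " * 50],
        memory=["User prefers short answers."],
        tool_specs=["search(query)"],
        budget_words=20,
    ))
```

With a 20-word budget, the prompt, the tool spec, and the memory line make it in, and the 50-word retrieved document gets dropped. Choosing that order is the design decision.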

Where designers come in

I spend my days in Claude Code, token-maxxing, but I'm not an engineer. So what's my place in this, as an AI-native product designer?

The core tension: by definition, an agent that works autonomously for a long time is doing things you're not watching. And that opens up a pile of problems that are fundamentally design problems, not model problems.

The supervision layer.

How might we...

How might we surface key agent states like these, without halting over trivial things, spamming the user, or going silent?

/ I'm stuck
/ I made an assumption
/ I pursued multiple assumptions in parallel and committed work to each
/ I'm about to do something irreversible that isn't on your whitelist
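To make the question concrete, the four states above could be modeled as data, each routed to a channel. This is a sketch, not an answer: the names `Signal`, `Channel`, and `ROUTING` are hypothetical, and the routing table is a placeholder guess. Where each state should go is exactly the open design call.

```python
from enum import Enum, auto


class Signal(Enum):
    STUCK = auto()                 # the agent can't make progress
    ASSUMPTION = auto()            # the agent filled a gap with a guess
    PARALLEL_ASSUMPTIONS = auto()  # several guesses, work committed to each
    IRREVERSIBLE = auto()          # irreversible action outside the whitelist


class Channel(Enum):
    INTERRUPT = auto()  # stop and ask the user now
    DIGEST = auto()     # batch into a periodic summary
    LOG = auto()        # record quietly, reviewable later


# Placeholder routing, purely an assumption for illustration.
ROUTING: dict[Signal, Channel] = {
    Signal.IRREVERSIBLE: Channel.INTERRUPT,
    Signal.STUCK: Channel.INTERRUPT,
    Signal.PARALLEL_ASSUMPTIONS: Channel.DIGEST,
    Signal.ASSUMPTION: Channel.LOG,
}


def route(signal: Signal) -> Channel:
    """Look up where a given agent state should surface."""
    return ROUTING[signal]


if __name__ == "__main__":
    for signal in Signal:
        print(f"{signal.name} -> {route(signal).name}")
```

Three channels map onto the three failure modes in the question: too many interrupts is halting over trivial things, too many digests is spam, and too much logging is going silent.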

The intervention layer.

How might we...

The reward layer.

How might we...


All of these questions are still unsolved. They're UX problems and taste calls as much as engineering ones.

Final edits done with like-you-talk, my own Claude skill based on Paul Graham's Write Like You Talk (2015) and Write Simply (2021).