Andrej Karpathy — one of the people who helped build the AI we all use today — published something quietly this week. It's called autoresearch, and the idea is simple enough to sketch on a napkin.
You give an AI agent a task. The agent tries something, checks whether it worked, keeps the good changes, throws out the bad ones, and tries again. Repeat. All night. By morning it has run about 100 rounds.
The part that stuck with us isn't the AI research angle — it's the pattern. The whole thing runs off a plain text file called program.md. That file is the agent's instruction manual. You write in plain language what you want it to do and how to judge success. The agent reads it, acts, checks the score, and loops.
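To make the pattern concrete, here is a toy sketch of that loop in Python. This is not Karpathy's code, and the names are hypothetical: mutate stands in for "try a change," score stands in for "check whether it worked," and the number-guessing task is just a placeholder for whatever your real process is.

```python
import random

def run_loop(candidate, mutate, score, rounds=100):
    """Try a change each round; keep it only if the score improves."""
    best, best_score = candidate, score(candidate)
    for _ in range(rounds):
        trial = mutate(best)          # try something
        trial_score = score(trial)    # check whether it worked
        if trial_score > best_score:  # keep the good change...
            best, best_score = trial, trial_score
        # ...otherwise throw it out and try again
    return best, best_score

# Toy stand-in task: nudge a number toward a target of 42.
# A real agent would edit text or code here instead of an integer.
random.seed(0)
result, result_score = run_loop(
    candidate=0,
    mutate=lambda x: x + random.choice([-1, 1]),
    score=lambda x: -abs(42 - x),
    rounds=100,
)
```

The shape is the whole point: a written goal, a way to measure success, and a loop that only keeps improvements. Swap the toy task for your own and the skeleton stays the same.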
Most of us don't train AI models. But we do have repetitive processes where the goal is clear and the rules can be written down. Testing email subject lines. Trying different ways to write a product description. Reviewing supplier quotes against a checklist.
The mental model here, give an agent a written playbook and a way to measure success, then let it run, is one of the most reusable ideas we've seen in a while.
AI agent — An AI that doesn't just answer one question but takes a series of actions on its own to reach a goal, like a very focused assistant that never gets tired.
Validation metric — A score that tells you whether something worked. Like tracking which pizza on your menu gets reordered most — the number helps you decide what to keep.
Agentic loop — When an AI does something, checks the result, adjusts, and does it again. The loop keeps going until the job is done or you stop it.
Something worth sitting with: if you could write down the rules for a repetitive decision in your business — clearly enough that a smart new hire could follow them — an agent could probably run that loop for you.