
🦊 Karpathy's LLM Research + Ollama Fine-tuning

Deep dive on autoregressive models and local LLM optimization

Background

What is "Autoresearch"?

You likely mean Andrej Karpathy's autoregressive language-model projects, a collection of educational tools for understanding how LLMs work under the hood:

📚 makemore

Autoregressive character-level model. Feed it a list of names (or anything), it learns patterns and generates more like them.

GitHub →

🔧 nanoGPT

Simplest/fastest repo for training/finetuning GPT-2 sized models. ~300 lines of training code.

GitHub →

⚡ microGPT

200 lines of pure Python (no dependencies) that trains AND runs inference on a GPT. The ultimate minimal example.

Try it →

Deep Dive

Karpathy's Projects Explained

1. makemore

Takes a text file where each line is one training example (names, company names, words), and generates more like them. Character-level autoregressive.

# Train on your data
python makemore.py -i names.txt -o output

# It supports multiple architectures:
# - Bigram (simple lookup)
# - MLP (neural network)
# - RNN/LSTM/GRU
# - Transformer (GPT-like)

2. nanoGPT

The most practical of the three — lets you finetune actual GPT-2 models on your data. This is what's relevant for your use case.

# Finetune on your data
python train.py config/finetune_shakespeare.py

# Or train from scratch
python train.py config/train_shakespeare_char.py

Key insight: nanoGPT can load pretrained GPT-2 weights from OpenAI and finetune on your data. This is the real path to customizing LLMs.
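A finetuning config in nanoGPT is just a Python file of variable overrides. The sketch below is modeled on config/finetune_shakespeare.py; the variable names follow nanoGPT's conventions, but the dataset name and values here are illustrative assumptions, not recommendations:

```python
# Hypothetical nanoGPT finetuning config (values are illustrative).
out_dir = 'out-my-data'
init_from = 'gpt2'        # load pretrained GPT-2 weights instead of random init
dataset = 'my_data'       # expects data/my_data/{train,val}.bin to exist
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000
learning_rate = 3e-5      # small LR: we are finetuning, not training from scratch
decay_lr = False
```

The key line is init_from: anything other than 'scratch' tells nanoGPT to start from pretrained weights, which is what makes finetuning on a small dataset viable.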

3. microGPT

Educational — 200 lines that contain everything: dataset, tokenizer, autograd engine, transformer, optimizer, training loop, inference. Runs on CPU.

# Super simple example - training on names
python microgpt.py

# Output - generates new names:
# kamon, ann, karai, jaire, vialan, karia...
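Of those pieces, the autograd engine is the conceptually hardest. In the spirit of Karpathy's micrograd, a scalar version fits in a few dozen lines; this is an illustrative sketch, not microGPT's actual code:

```python
class Value:
    """A scalar that tracks its own gradient (micrograd-style sketch)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():           # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():           # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(3.0)
y = x * x + x        # y = x^2 + x, so dy/dx = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)        # 7.0
```

Stack a transformer's forward pass out of operations like these and .backward() gives you all the gradients the training loop needs.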

Your Setup

How Ollama Fits In

You use Ollama to run open models locally. Here's how it works:

What                  Does
ollama run llama3     Download and chat with a model
REST API              Integrate into apps (localhost:11434)
Modelfile             Customize prompts and parameters
ollama create         Import custom model weights (via a Modelfile)

Ollama Fine-tuning Options

Current State: As of early 2026, Ollama's native fine-tuning is limited. You can:
  • Use Modelfile to customize system prompts
  • Import models you've trained elsewhere (like from nanoGPT)
  • Run fine-tuned models via llama.cpp format
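A Modelfile customization needs no training at all. A minimal sketch, where the base model name and system prompt are placeholders for your own:

```
FROM llama3
PARAMETER temperature 0.3
SYSTEM """You are a concise coding assistant. Prefer code over prose."""
```

Build and run it with `ollama create my-assistant -f Modelfile`, then `ollama run my-assistant`.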

Integration

How They Could Work Together

Option 1: Train with nanoGPT → Run in Ollama

# 1. Train/fine-tune with nanoGPT
python train.py config/finetune_my_data.py

# 2. Convert the checkpoint to GGUF (llama.cpp's format)
# (typically: export to Hugging Face GPT-2 format, then run
#  llama.cpp's convert_hf_to_gguf.py)

# 3. Import into Ollama via a Modelfile that points at the weights:
#    echo "FROM ./model.gguf" > Modelfile
ollama create my-model -f Modelfile

Option 2: Use Ollama as the Inference Engine

# Ollama already has an API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Hello"}]
}'
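The same call from Python, using only the standard library. A sketch that assumes an Ollama server running on the default port; the model name is a placeholder:

```python
import json
from urllib import request

def chat(prompt, model="llama3", host="http://localhost:11434"):
    """Send one chat turn to a local Ollama server and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    req = request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]

# print(chat("Hello"))  # requires a running Ollama server
```

Setting "stream": False is the important detail: by default the API streams one JSON object per token, which is awkward to consume with a plain urlopen.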

Option 3: OpenClaw + Ollama (What You Have)

# Your current setup already uses Ollama
# Just point OpenClaw at your Ollama endpoint:
# model: ollama/minimax-m2.5:cloud

# To improve: try different models, customize with Modelfile
# Or train external and import

The Big Question

Is Fine-tuning Worth It for You?

✅ When It's Worth It

  • You have lots of domain-specific data (legal, medical, technical)
  • You need specific output formats the model won't learn via prompting
  • You have GPU access (fine-tuning requires significant compute)
  • You want the model to learn new knowledge (not just style)

❌ When It's Probably Not

  • General purpose use (prompting usually works fine)
  • No GPU / limited compute (fine-tuning is slow/expensive)
  • Small dataset (less data = worse results)
  • You just want better answers (try better prompting first)

For Your Use Case (OpenClaw + Ollama)

Goal                 Recommended Approach
Better coding help   Use Codex/Copilot in IDE + good prompts in OpenClaw
Personal assistant   Custom Modelfile + system prompts (simpler than fine-tuning)
Domain expertise     Fine-tune if you have GPU + lots of data; else use RAG
Faster inference     Use quantized models (Q4_K_M, etc.)

💡 Alternative to Fine-tuning: RAG

Instead of changing the model, change what it sees. Use Retrieval-Augmented Generation: give it your documents at query time. This is often better than fine-tuning for knowledge retrieval.
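The retrieval step can be sketched with naive word-overlap scoring. Real systems use embeddings and a vector index; the documents and query here are placeholders:

```python
def retrieve(query, docs, k=1):
    """Rank documents by how many query words they share (toy scoring)."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "Ollama serves models over a local REST API on port 11434.",
    "makemore is a character-level autoregressive model.",
    "nanoGPT finetunes GPT-2 checkpoints.",
]
query = "how do I call the Ollama REST API"
context = retrieve(query, docs)[0]
prompt = f"Context: {context}\n\nQuestion: {query}"
```

The model itself never changes; you just prepend the best-matching document to the prompt, so updating your knowledge base is a file edit, not a training run.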

🎯 Bottom Line for You

Given your setup (OpenClaw + Ollama):

  1. Try better prompts first — Often beats fine-tuning
  2. Use Modelfile — Customize behavior without retraining
  3. Try RAG — Feed your documents at query time
  4. Fine-tune as last resort — Only if you have GPU + specific domain data

Karpathy's projects are educational gold — understanding how LLMs work helps you write better prompts and debug issues. But for your use case, you likely don't need to train your own model.