
🦊 Karpathy's Autoresearch

AI agents that autonomously improve LLM training — self-improving AI

NEW — March 2026: Self-Improving AI

What is Autoresearch?

🎯 The Core Idea: Give an AI agent a real LLM training setup, let it experiment autonomously overnight. It modifies code → trains for 5 min → checks if results improved → keeps or discards → repeats. Wake up to a better model.

Karpathy's autoresearch is an autonomous AI research framework. You point an AI agent (Claude, Codex, etc.) at the repo, and it runs an experiment loop on its own: edit the training code, train briefly, measure the result, keep what helps, revert what doesn't.

This is AI self-improvement in a very literal sense — the agent researches better ways to train models like itself.

How It Works

The Three Files

| File | What It Is | Who Modifies It |
|------|------------|-----------------|
| `prepare.py` | Data prep, tokenizer training (fixed) | Nobody |
| `train.py` | Model, optimizer, training loop | AI agent |
| `program.md` | Agent instructions (the "skill") | You |

The Loop

1. Agent reads program.md for instructions
2. Agent edits train.py (hyperparams, architecture, etc.)
3. Runs training for exactly 5 minutes
4. Checks val_bpb (validation bits per byte)
5. If improved → keep changes
   If worse  → revert
6. Repeat ~12 times/hour
7. Wake up to optimized model

Metric: val_bpb (validation bits per byte) — lower is better, and it is vocab-size-independent, so architectural changes are compared fairly.
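The keep-or-revert loop above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the repo: `propose_edit` and `train_and_eval` are hypothetical stand-ins for the agent editing the training script and running a ~5-minute training job, and `val_bpb` shows how a mean cross-entropy (in nats per token) converts to bits per byte.

```python
import math
import shutil

def val_bpb(mean_ce_loss_nats: float, bytes_per_token: float) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte."""
    return mean_ce_loss_nats / (math.log(2) * bytes_per_token)

def run_experiment(path, best_bpb, propose_edit, train_and_eval):
    """One keep-or-revert iteration over the training script at `path`.

    propose_edit() mutates the script; train_and_eval() runs a short
    (~5 minute) training job and returns the new validation bpb.
    """
    backup = path + ".bak"
    shutil.copy(path, backup)        # snapshot before the agent edits
    propose_edit()
    new_bpb = train_and_eval()
    if new_bpb < best_bpb:           # lower bits-per-byte is better
        return new_bpb               # improvement: keep the change
    shutil.move(backup, path)        # regression: revert to the snapshot
    return best_bpb
```

Repeated roughly twelve times an hour, this is the whole overnight loop; in the real setup, the agent's edits and the training run both happen against train.py.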
Requirements

Can You Run It?

Requirements

- An NVIDIA GPU (no native macOS support; see the Mac forks under Alternatives)
- uv for Python dependency management
- An AI coding agent (Claude, Codex, etc.) to drive the loop

Setup Commands

```shell
# 1. Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone the repo
git clone https://github.com/karpathy/autoresearch
cd autoresearch

# 3. Install dependencies
uv sync

# 4. Download data & train tokenizer (one-time, ~2 min)
uv run prepare.py

# 5. Test a single training run (~5 min)
uv run train.py
```

Running Autonomously

```shell
# Point your AI agent at the repo, run it with permission prompts disabled, then prompt:
"Hi have a look at program.md and let's kick off a new experiment!
Let's do the setup first."
```

⚠️ Mac Users: no NVIDIA GPU means no native support, but community forks exist (see Alternatives below).
Integration

How Would This Work With Your OpenClaw?

Option 1: Separate Compute

Autoresearch needs its own NVIDIA GPU machine. You'd run it on a GPU cloud instance (Lambda Labs, RunPod, etc.) and import the resulting model to Ollama.

Option 2: Future Integration

Not currently integrated — autoresearch is standalone training code. Could theoretically be triggered via OpenClaw, but they're separate systems.

Option 3: The "Research Org"

Customize program.md to define your own research goals. The agent follows your blueprint for what "better" means.

The Path Forward

If you had an NVIDIA GPU:

1. Run autoresearch overnight → get an optimized nanochat model
2. Convert the checkpoint to GGUF format (llama.cpp tools)
3. Import into Ollama: `ollama create your-trained-model -f Modelfile` (Ollama has no `import` command; the Modelfile's `FROM` line points at model.gguf)
4. Use in OpenClaw: `model: ollama/your-trained-model`
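The convert-and-import steps can be scripted. A hedged sketch: the converter script name (`convert_hf_to_gguf.py`) and its flags reflect the current llama.cpp layout and may differ in your checkout; `ollama create -f Modelfile` is Ollama's standard import path.

```python
from pathlib import Path

def gguf_convert_cmd(checkpoint_dir: str, out_file: str) -> list:
    """Build the llama.cpp HF-to-GGUF conversion command (script name
    and flags may vary across llama.cpp versions; check your checkout)."""
    return ["python", "convert_hf_to_gguf.py", checkpoint_dir,
            "--outfile", out_file]

def write_modelfile(gguf_path: str, dest: str = "Modelfile") -> str:
    """Ollama imports local GGUF weights via a Modelfile's FROM line."""
    text = f"FROM {gguf_path}\n"
    Path(dest).write_text(text)
    return text

def ollama_create_cmd(name: str, modelfile: str = "Modelfile") -> list:
    """`ollama create` registers the model under `name`."""
    return ["ollama", "create", name, "-f", modelfile]
```

Running the two commands in sequence (convert, then create) gives you a model that `ollama run your-trained-model` can serve locally.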
Alternatives

What If You Don't Have an NVIDIA GPU?

1. Use Cloud GPU

Lambda Labs, RunPod, Paperspace — rent GPU time for ~$0.50-2/hr

2. Mac Forks

Try autoresearch-mlx for Apple Silicon

3. Manual Tuning

Apply the principles manually — try different hyperparameters in nanoGPT

4. Just Use Ollama

Run pre-trained models, customize with Modelfile — no GPU needed for inference
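A Modelfile lets you customize behavior with no training at all. A minimal sketch that composes one in Python: `FROM`, `SYSTEM`, and `PARAMETER` are standard Modelfile directives, and the base model name here is just an example.

```python
def make_modelfile(base: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Compose a minimal Ollama Modelfile: FROM a pre-trained model,
    plus SYSTEM and PARAMETER directives."""
    return (
        f"FROM {base}\n"
        f'SYSTEM """{system_prompt}"""\n'
        f"PARAMETER temperature {temperature}\n"
    )

# Save the result as Modelfile, then register it:
#   ollama create my-assistant -f Modelfile
```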


🎯 Bottom Line for You

If you had an NVIDIA GPU:

  1. Run autoresearch on a cloud GPU (~$0.50-2/hr)
  2. Wake up to an optimized nanochat model
  3. Import into Ollama
  4. Use in OpenClaw

Without NVIDIA GPU:

  • Use Mac forks (slower, less capable)
  • Or stick with Ollama + Modelfile for now
  • Or rent cloud GPU when you want to experiment