AI agents that autonomously improve LLM training — self-improving AI
Karpathy's autoresearch is an autonomous AI research framework. You point an AI agent (Claude, Codex, etc.) at the repo, and it iteratively experiments on its own training code. This is literally AI self-improvement: the agent researches better ways to train itself.
| File | What It Is | Who Modifies |
|---|---|---|
| `prepare.py` | Data prep, tokenizer training (fixed) | Nobody |
| `train.py` | Model, optimizer, training loop | AI agent |
| `program.md` | Agent instructions (the "skill") | You |
1. Agent reads `program.md` for instructions
2. Agent edits `train.py` (hyperparameters, architecture, etc.)
3. Runs training for exactly 5 minutes
4. Checks `val_bpb` (validation bits per byte)
5. If improved → keep changes; if worse → revert
6. Repeat ~12 times/hour
7. Wake up to an optimized model
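The keep-if-better / revert-if-worse loop above can be sketched in a few lines of Python. This is a toy simulation, not the repo's code: `mock_train` is a hypothetical stand-in for a real 5-minute `train.py` run, and the "edit" is just a learning-rate perturbation.

```python
import random

def mock_train(params):
    # Stand-in for a 5-minute train.py run: returns val_bpb
    # (validation bits per byte, lower is better). Purely
    # illustrative; the real number comes from actual training.
    return 1.0 + abs(params["lr"] - 3e-4) * 100

def research_loop(n_iters=12, seed=0):
    # Sketch of the agent's keep-if-better / revert-if-worse loop.
    rng = random.Random(seed)
    best_params = {"lr": 1e-3}          # starting hyperparameters
    best_bpb = mock_train(best_params)  # baseline metric
    for _ in range(n_iters):
        # The agent proposes an edit (here: perturb the learning rate).
        candidate = {"lr": best_params["lr"] * rng.uniform(0.5, 1.5)}
        bpb = mock_train(candidate)
        if bpb < best_bpb:
            best_params, best_bpb = candidate, bpb  # improved -> keep
        # else: worse -> revert (discard the candidate edit)
    return best_params, best_bpb
```

By construction the returned `val_bpb` never exceeds the baseline, which is the whole point of the revert step: a bad experiment costs only one 5-minute run.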
```shell
# 1. Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone the repo
git clone https://github.com/karpathy/autoresearch
cd autoresearch

# 3. Install dependencies
uv sync

# 4. Download data & train tokenizer (one-time, ~2 min)
uv run prepare.py

# 5. Test a single training run (~5 min)
uv run train.py
```
Then point your AI agent at the repo, disable permission prompts, and send:

> "Hi have a look at program.md and let's kick off a new experiment! Let's do the setup first."
Autoresearch needs its own NVIDIA GPU machine. You'd run it on a GPU cloud instance (Lambda Labs, RunPod, etc.) and import the resulting model into Ollama.
Not currently integrated — autoresearch is standalone training code. Could theoretically be triggered via OpenClaw, but they're separate systems.
Customize program.md to define your own research goals. The agent follows your blueprint for what "better" means.
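For example, a custom `program.md` section might spell out the objective and the rules the agent must follow. The wording below is hypothetical, not taken from the repo:

```markdown
## Goal
Minimize val_bpb within a fixed 5-minute training budget per run.

## Rules
- Only edit train.py; never touch prepare.py.
- Run one experiment at a time; revert any change that worsens val_bpb.
- Record every experiment and its result in a notes file.
```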
If you had an NVIDIA GPU:

1. Run autoresearch overnight → get an optimized nanochat model
2. Convert to GGUF format (llama.cpp tools)
3. Import into Ollama via a Modelfile (`FROM ./model.gguf`, then `ollama create your-trained-model -f Modelfile`)
4. Use in OpenClaw: `model: ollama/your-trained-model`
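A minimal Modelfile for the import step might look like this (the file path and model name are placeholders, not from the repo):

```
# Modelfile
FROM ./model.gguf
```

Then build and run it locally: `ollama create your-trained-model -f Modelfile`, followed by `ollama run your-trained-model`.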
Without an NVIDIA GPU:

- Rent GPU time from Lambda Labs, RunPod, or Paperspace (~$0.50-2/hr)
- Try autoresearch-mlx for Apple Silicon
- Apply the principles manually: try different hyperparameters in nanoGPT
- Run pre-trained models in Ollama, customizing with a Modelfile (no GPU needed for inference)