
🦊 Karpathy's LLM Research + Ollama Fine-tuning

Deep dive on autoregressive models and local LLM optimization

Background

What is "Autoresearch"?

You likely mean Andrej Karpathy's autoregressive language-model projects, a collection of educational tools for understanding how LLMs work under the hood:

📚 makemore

Autoregressive character-level model. Feed it a list of names (or anything), it learns patterns and generates more like them.

GitHub →

🔧 nanoGPT

Simplest/fastest repo for training/finetuning GPT-2 sized models. ~300 lines of training code.

GitHub →

⚡ microGPT

200 lines of pure Python (no dependencies) that trains AND runs inference on a GPT. The ultimate minimal example.

Try it →

Deep Dive

Karpathy's Projects Explained

1. makemore

Takes a text file where each line is one training example (names, company names, words), and generates more like them. Character-level autoregressive.

# Train on your data
python makemore.py -i names.txt -o output

# It supports multiple architectures:
# - Bigram (simple lookup)
# - MLP (neural network)
# - RNN/LSTM/GRU
# - Transformer (GPT-like)

2. nanoGPT

The most practical of the three — lets you finetune actual GPT-2 models on your data. This is what's relevant for your use case.

# Finetune on your data
python train.py config/finetune_shakespeare.py

# Or train from scratch
python train.py config/train_shakespeare_char.py

Key insight: nanoGPT can load pretrained GPT-2 weights from OpenAI and finetune on your data. This is the real path to customizing LLMs.
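A finetuning config in nanoGPT is just a Python file of variable overrides. The sketch below is modeled on config/finetune_shakespeare.py; the variable names follow nanoGPT's conventions, but the dataset name and values here are illustrative assumptions, not recommendations:

```python
# Hypothetical nanoGPT finetuning config (values are illustrative).
out_dir = 'out-my-data'
init_from = 'gpt2'        # load pretrained GPT-2 weights instead of random init
dataset = 'my_data'       # expects data/my_data/{train,val}.bin to exist
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000
learning_rate = 3e-5      # small LR: we are finetuning, not training from scratch
decay_lr = False
```

The key line is init_from: anything other than 'scratch' tells nanoGPT to start from pretrained weights, which is what makes finetuning on a small dataset viable.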

3. microGPT

Educational — 200 lines that contain everything: dataset, tokenizer, autograd engine, transformer, optimizer, training loop, inference. Runs on CPU.

# Super simple example - training on names
python microgpt.py

# Output - generates new names:
# kamon, ann, karai, jaire, vialan, karia...
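Of those pieces, the autograd engine is the conceptually hardest. In the spirit of Karpathy's micrograd, a scalar version fits in a few dozen lines; this is an illustrative sketch, not microGPT's actual code:

```python
class Value:
    """A scalar that tracks its own gradient (micrograd-style sketch)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():           # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():           # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._prev:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(3.0)
y = x * x + x        # y = x^2 + x, so dy/dx = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)        # 7.0
```

Stack a transformer's forward pass out of operations like these and .backward() gives you all the gradients the training loop needs.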

Your Setup

How Ollama Fits In

You use Ollama to run open models locally. Here's how it works:

What                  Does
ollama run llama3     Download and chat with a model
REST API              Integrate into apps (localhost:11434)
Modelfile             Customize prompts and parameters
ollama create         Import custom model weights (via a Modelfile)

Ollama Fine-tuning Options

Current State: As of early 2026, Ollama's native fine-tuning is limited. You can:
  • Use Modelfile to customize system prompts
  • Import models you've trained elsewhere (like from nanoGPT)
  • Run fine-tuned models via llama.cpp format
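A Modelfile customization needs no training at all. A minimal sketch, where the base model name and system prompt are placeholders for your own:

```
FROM llama3
PARAMETER temperature 0.3
SYSTEM """You are a concise coding assistant. Prefer code over prose."""
```

Build and run it with `ollama create my-assistant -f Modelfile`, then `ollama run my-assistant`.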

Integration

How They Could Work Together

Option 1: Train with nanoGPT → Run in Ollama

# 1. Train/fine-tune with nanoGPT
python train.py config/finetune_my_data.py

# 2. Convert the checkpoint to GGUF (llama.cpp's format)
# (typically: export to Hugging Face GPT-2 format, then run
#  llama.cpp's convert_hf_to_gguf.py)

# 3. Import into Ollama via a Modelfile that points at the weights:
#    echo "FROM ./model.gguf" > Modelfile
ollama create my-model -f Modelfile

Option 2: Use Ollama as the Inference Engine

# Ollama already has an API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Hello"}]
}'
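The same call from Python, using only the standard library. A sketch that assumes an Ollama server running on the default port; the model name is a placeholder:

```python
import json
from urllib import request

def chat(prompt, model="llama3", host="http://localhost:11434"):
    """Send one chat turn to a local Ollama server and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    req = request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]

# print(chat("Hello"))  # requires a running Ollama server
```

Setting "stream": False is the important detail: by default the API streams one JSON object per token, which is awkward to consume with a plain urlopen.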

Option 3: OpenClaw + Ollama (What You Have)

# Your current setup already uses Ollama
# Just point OpenClaw at your Ollama endpoint:
# model: ollama/minimax-m2.5:cloud

# To improve: try different models, customize with Modelfile
# Or train external and import

The Big Question

Is Fine-tuning Worth It for You?

✅ When It's Worth It

  • You have lots of domain-specific data (legal, medical, technical)
  • You need specific output formats the model won't learn via prompting
  • You have GPU access (fine-tuning requires significant compute)
  • You want the model to learn new knowledge (not just style)

❌ When It's Probably Not

  • General purpose use (prompting usually works fine)
  • No GPU / limited compute (fine-tuning is slow/expensive)
  • Small dataset (less data = worse results)
  • You just want better answers (try better prompting first)

For Your Use Case (OpenClaw + Ollama)

Goal                 Recommended Approach
Better coding help   Use Codex/Copilot in IDE + good prompts in OpenClaw
Personal assistant   Custom Modelfile + system prompts (simpler than fine-tuning)
Domain expertise     Fine-tune if you have GPU + lots of data; else use RAG
Faster inference     Use quantized models (Q4_K_M, etc.)

💡 Alternative to Fine-tuning: RAG

Instead of changing the model, change what it sees. Use Retrieval-Augmented Generation: give it your documents at query time. This is often better than fine-tuning for knowledge retrieval.
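The retrieval step can be sketched with naive word-overlap scoring. Real systems use embeddings and a vector index; the documents and query here are placeholders:

```python
def retrieve(query, docs, k=1):
    """Rank documents by how many query words they share (toy scoring)."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "Ollama serves models over a local REST API on port 11434.",
    "makemore is a character-level autoregressive model.",
    "nanoGPT finetunes GPT-2 checkpoints.",
]
query = "how do I call the Ollama REST API"
context = retrieve(query, docs)[0]
prompt = f"Context: {context}\n\nQuestion: {query}"
```

The model itself never changes; you just prepend the best-matching document to the prompt, so updating your knowledge base is a file edit, not a training run.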

🎯 Bottom Line for You

Given your setup (OpenClaw + Ollama):

  1. Try better prompts first — Often beats fine-tuning
  2. Use Modelfile — Customize behavior without retraining
  3. Try RAG — Feed your documents at query time
  4. Fine-tune as last resort — Only if you have GPU + specific domain data

Karpathy's projects are educational gold — understanding how LLMs work helps you write better prompts and debug issues. But for your use case, you likely don't need to train your own model.