Blueprint
Product thesis and MVP scope
Let AI enthusiasts train and chat with their own small LLMs without managing infrastructure, at a fraction of the cost of using large API models.
Target user
AI students, indie researchers, and startups who need to train small language models for education, prototyping, or niche applications and want a no‑ops experience.
MVP scope
- User account with OAuth (GitHub/Google) and basic profile.
- Submit training job: configure depth and dataset, view cost estimate, start run.
- Asynchronous training execution on a GPU-backed worker, with real-time progress and logs.
- Inference endpoint (`/api/chat`) for conversing with a trained model via a simple text‑based UI.
- Stripe integration for prepaid credit purchases (no subscription yet).
- Web UI with dashboard (list jobs/models), training config form, and chat window.
- Model checkpoint download link (for users wanting to export their model).
System
Entities, flows, and modules
Entities
User
id, email, oauth_provider, oauth_id, credit_balance_cents, created_at — Auth is missing in original repo; this is the foundational entity for paid service.
TrainingJob
id, user_id, config_json (depth, dataset_url, etc.), status (queued/running/completed/failed), started_at, cost_cents, output_model_path, log_blob — Replaces the local `--depth` CLI; must track per‑user job lifecycle.
Model
id, training_job_id, checkpoint_s3_key, tokenizer_file_s3_key, created_at — Each successful job produces a model available for chat.
ChatSession
id, user_id, model_id, messages_json, created_at — Stores conversation history for the chat UI.
BillingTransaction
id, user_id, stripe_event_id, amount_cents, type (charge, credit_purchase), description, created_at — Usage charges deducted per job; credit purchases via Stripe.
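The entities above could be sketched as plain Python dataclasses. This is only an illustration of the shape: the field names come from the list, while the Python types, the defaults, and the `JobStatus` enum are assumptions and not taken from the reference repo.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    """Lifecycle states listed for TrainingJob.status."""
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class User:
    id: int
    email: str
    oauth_provider: str
    oauth_id: str
    credit_balance_cents: int = 0  # assumed default: new accounts start empty
    created_at: datetime = field(default_factory=_now)


@dataclass
class TrainingJob:
    id: int
    user_id: int
    config_json: dict  # e.g. {"depth": 26, "dataset_url": "..."}
    status: JobStatus = JobStatus.QUEUED
    started_at: Optional[datetime] = None
    cost_cents: int = 0
    output_model_path: Optional[str] = None
    log_blob: str = ""


@dataclass
class Model:
    id: int
    training_job_id: int
    checkpoint_s3_key: str
    tokenizer_file_s3_key: str
    created_at: datetime = field(default_factory=_now)


@dataclass
class ChatSession:
    id: int
    user_id: int
    model_id: int
    messages_json: list = field(default_factory=list)
    created_at: datetime = field(default_factory=_now)


@dataclass
class BillingTransaction:
    id: int
    user_id: int
    stripe_event_id: Optional[str]  # assumed empty for usage charges
    amount_cents: int
    type: str  # "charge" or "credit_purchase"
    description: str = ""
    created_at: datetime = field(default_factory=_now)
```

Storing money as integer cents, as the field names already imply, avoids floating-point rounding in balances.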
Flows
Dashboard
GET / — Show logged‑in user's recent training jobs, active models, and remaining credits.
Training job creation
GET & POST /train — Form to set depth (e.g., 26 for GPT‑2 scale), upload or link a dataset, see cost prediction, and submit job.
Job status / logs
GET /jobs/{job_id} — Live progress, estimated time remaining, and ability to cancel or download model if complete.
Chat with model
GET /chat/{model_id}: Chat UI that sends prompts to `/api/chat` and streams tokens back via Server‑Sent Events (SSE).
Account & billing
GET /account — View transaction history, buy credits via Stripe checkout, manage API keys (if needed).
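For the credit purchase part of the account flow, the `stripe_event_id` field on BillingTransaction suggests de-duplicating webhook deliveries, since the same Stripe event can arrive more than once. A minimal in-memory sketch of that idea follows; `CreditLedger` and `apply_credit_purchase` are hypothetical names, and a real service would enforce uniqueness in PostgreSQL rather than in a Python set.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CreditLedger:
    """Hypothetical in-memory stand-in for a user's balance and event log."""
    credit_balance_cents: int = 0
    seen_event_ids: Set[str] = field(default_factory=set)

    def apply_credit_purchase(self, stripe_event_id: str, amount_cents: int) -> bool:
        """Top up the balance once per event id; return False on a repeat."""
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        if stripe_event_id in self.seen_event_ids:
            return False  # duplicate delivery, balance left untouched
        self.seen_event_ids.add(stripe_event_id)
        self.credit_balance_cents += amount_cents
        return True
```

Delivering the same event twice then credits the user only once, which keeps webhook retries safe.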
Backend modules
TrainingWorker
Spawn a GPU container, run nanochat’s `gpt.py` training loop with user config, upload checkpoints to S3, update job status. — Core value; must be reliable and isolated to prevent resource contention.
InferenceService
Load model from S3, run generation on-demand or keep warm for chat, expose `/api/chat`. — Post‑training interaction is why users stay; latency matters.
UserService
Handle OAuth, store user profiles, manage credit balances, and authorize actions. Without it, no paid accounts exist, and billing is tightly coupled to it.
BillingService
Integrate Stripe, calculate job cost (GPU‑seconds × rate), deduct credits, handle webhooks. — Monetization cannot start without usage tracking and payment collection.
StorageManager
Abstract S3 operations for model checkpoints, tokenizer files, and temporary uploads. — Separates concerns and simplifies moving to different object stores later.
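The BillingService formula (GPU‑seconds × rate) can be sketched with integer arithmetic. The function name, the per-hour rate unit, and the choice to round up to a whole cent are assumptions made for illustration.

```python
def job_cost_cents(gpu_seconds: int, rate_cents_per_gpu_hour: int) -> int:
    """Cost = GPU-seconds x rate, rounded up to the next whole cent."""
    if gpu_seconds < 0 or rate_cents_per_gpu_hour < 0:
        raise ValueError("inputs must be non-negative")
    numerator = gpu_seconds * rate_cents_per_gpu_hour
    # Integer ceiling division keeps the result exact (no float rounding).
    return -(-numerator // 3600)
```

For example, with an illustrative rate of 250 cents per GPU-hour, a one-hour job costs 250 cents and a one-second job is rounded up to 1 cent.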
Phases
Implementation phases
Phase 1 – Core training & local packaging
Make nanochat’s training deterministic and runnable as a Docker container, consuming a config file and outputting a model checkpoint.
Deliverables
- `train_runner.py` wrapping nanochat’s `gpt.py` and `train.py`, driven by JSON config.
- Dockerfile with PyTorch CUDA base, nvcc, and nanochat dependencies.
- Logic to upload final checkpoint + tokenizer to S3-compatible storage.
- CLI integration test on a single GPU instance (e.g., AWS p3.2xlarge).
Exit criteria
- A `--depth 26` training run completes reliably in the container, producing a checkpoint in under 2 hours on 8xH100 equivalent (simulated).
- Checkpoint loads correctly in an inference harness and generates coherent text.
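Since `train_runner.py` is meant to be driven by a JSON config, it may help to validate that config before any GPU time is spent. The sketch below assumes only the two keys named in this blueprint (`depth` and `dataset_url`); `TrainConfig` and `parse_config` are hypothetical names, and the validation rules are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    depth: int
    dataset_url: str


def parse_config(raw: str) -> TrainConfig:
    """Parse and validate a JSON training config, raising ValueError if bad."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    depth = data.get("depth")
    dataset_url = data.get("dataset_url")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or depth < 1:
        raise ValueError("depth must be a positive integer")
    if not isinstance(dataset_url, str) or not dataset_url:
        raise ValueError("dataset_url must be a non-empty string")
    return TrainConfig(depth=depth, dataset_url=dataset_url)
```

Failing fast here means a malformed request is rejected before a container is scheduled.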
Phase 2 – Service plumbing
Stand up the web backend with user accounts, job queue, and rudimentary API.
Deliverables
- User model + OAuth endpoints (GitHub) in FastAPI.
- TrainingJob API: POST /api/jobs creates a Celery task; GET /api/jobs/{id} returns status.
- Redis + Celery GPU worker that downloads dataset from S3, runs `train_runner`, uploads result.
- Basic dashboard HTML showing user’s jobs.
Exit criteria
- A new user can sign up, submit a training job, see it transition to “completed”, and download the model.
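The job lifecycle behind that exit criterion (queued, running, completed, failed) can be guarded with a small transition table so a worker cannot move a job backwards. The table below is an assumption about which moves are legal; cancellation is mentioned elsewhere in this blueprint but has no status value, so it is left out.

```python
# Assumed legal moves between the four statuses named for TrainingJob.
ALLOWED_TRANSITIONS = {
    "queued": {"running", "failed"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def advance(current: str, new: str) -> str:
    """Return the new status if the move is legal, else raise ValueError."""
    allowed = ALLOWED_TRANSITIONS.get(current)
    if allowed is None:
        raise ValueError(f"unknown status: {current}")
    if new not in allowed:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

A worker would call `advance(job.status, "running")` before persisting the change, so a late or duplicated task cannot reopen a finished job.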
Phase 3 – Billing & chat frontend
Add payment, a chat UI, and the minimum lovable user experience.
Deliverables
- Stripe integration: prepaid credit purchases (Stripe Checkout), webhook to top‑up balance.
- Job cost calculator: estimate price before submission; deduct credits on launch.
- React or Svelte frontend with /dashboard, /train, /chat/{model_id} views.
- Chat UI that streams tokens from `/api/chat` using Server‑Sent Events.
- Model list page and download links.
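The Server‑Sent Events framing used for token streaming is simple enough to sketch without a web framework: each event is one or more `data:` lines followed by a blank line. The JSON payload shape (`{"token": ...}`) and the final `done` event name are assumptions chosen for illustration, not a contract defined by this blueprint.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one SSE message; multi-line data becomes several data: lines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per generated token, then a closing event."""
    for token in tokens:
        yield sse_event(json.dumps({"token": token}))
    yield sse_event("{}", event="done")  # hypothetical end-of-stream marker
```

Encoding each token as JSON keeps newlines and special characters inside the token from breaking the event framing.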
Exit criteria
- User can buy credits, train a model, and chat with it entirely via the web UI.
- No credit is deducted if a job fails due to infrastructure, and failed GPUs are hot‑swapped correctly.
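One way to reconcile "deduct credits on launch" with "no credit is deducted on infrastructure failure" is to charge up front and refund when the platform is at fault. The sketch below is an in-memory illustration; the class and function names are hypothetical, and the `refund` transaction type is an assumption beyond the two types listed for BillingTransaction.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class InsufficientCredits(Exception):
    """Raised when a launch would overdraw the prepaid balance."""


@dataclass
class Account:
    credit_balance_cents: int
    # Each entry is (type, amount_cents, description).
    transactions: List[Tuple[str, int, str]] = field(default_factory=list)

    def charge(self, amount_cents: int, description: str) -> None:
        if amount_cents > self.credit_balance_cents:
            raise InsufficientCredits(description)
        self.credit_balance_cents -= amount_cents
        self.transactions.append(("charge", amount_cents, description))

    def refund(self, amount_cents: int, description: str) -> None:
        self.credit_balance_cents += amount_cents
        self.transactions.append(("refund", amount_cents, description))


def settle_failed_job(account: Account, charged_cents: int,
                      infrastructure_fault: bool) -> None:
    """Return the launch charge only when the platform caused the failure."""
    if infrastructure_fault:
        account.refund(charged_cents, "infrastructure failure")
```

Recording the refund as its own transaction, instead of deleting the charge, keeps the billing history auditable.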
Phase 4 – Hardening & observability
Make the service reliable for a public beta and easy to deploy.
Deliverables
- Kubernetes manifests (Deployment, Service, Ingress) for API, workers, and frontend.
- Structured logging (JSON to stdout) and Prometheus metrics (job queue depth, GPU utilization).
- CI/CD pipeline (GitHub Actions) to build and push Docker images.
- Multi‑tenancy safeguard: separate S3 prefixes per user, API‑key auth for future API access.
- Rate limiting and abuse detection (max concurrent jobs per user).
Exit criteria
- Zero‑downtime deployment on a managed K8s cluster with auto‑scaling workers.
- Alerts for job failures, credit overdrafts, or GPU spot‑instance interruptions.
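Two of the hardening items, per-user S3 prefixes and a cap on concurrent jobs, reduce to small pure functions. The key layout (`users/<id>/jobs/<id>/<file>`), the filename whitelist, and the default limit of 2 are assumptions for illustration only.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore, hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def user_object_key(user_id: int, job_id: int, filename: str) -> str:
    """Build an object key that always stays under the owner's prefix."""
    if filename in {".", ".."} or not _SAFE_NAME.fullmatch(filename):
        raise ValueError(f"unsafe filename: {filename!r}")
    return f"users/{user_id}/jobs/{job_id}/{filename}"


def can_submit_job(active_jobs: int, max_concurrent: int = 2) -> bool:
    """True while the user is below the concurrent-job cap."""
    return active_jobs < max_concurrent
```

Rejecting slashes in the filename means a user-supplied name cannot escape into another tenant's prefix.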
Build first
Build first, skip first, and watchouts
First things to build
- Reproduce nanochat’s training loop as a standalone CLI script accepting `--depth` and dataset path, outputting a checkpoint file.
- Write a lightweight inference server (`inference_server.py`) that loads that checkpoint and exposes `/api/chat` via FastAPI.
- Create a `User` model with a `credit_balance_cents` field and OAuth endpoints (GitHub only, for speed).
- Integrate Stripe Checkout for buying credits – a single `POST /api/credits/purchase` that redirects.
- Implement job queue: Celery task that calls the training CLI in a subprocess and updates PostgreSQL status.
- Dockerfile that bundles training script + PyTorch, and a `docker-compose.yml` to test locally with fake GPU.
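The job-queue item above (a task that runs the training CLI in a subprocess and updates status) can be sketched independently of Celery. Here `run_training` and the `set_status` callback are hypothetical; in the real service the callback would write to PostgreSQL and the function body would sit inside a Celery task.

```python
import subprocess
import sys
from typing import Callable, List


def run_training(argv: List[str], set_status: Callable[[str], None]) -> int:
    """Run the training command and report status before and after."""
    set_status("running")
    # capture_output keeps stdout/stderr available for the job's log blob.
    result = subprocess.run(argv, capture_output=True, text=True)
    set_status("completed" if result.returncode == 0 else "failed")
    return result.returncode


if __name__ == "__main__":
    # Stand-in command; a real worker would pass the training CLI and its flags.
    history: List[str] = []
    code = run_training([sys.executable, "-c", "print('training...')"], history.append)
    print(code, history)
```

Passing the command as a list avoids shell interpolation of user-controlled config values.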
Not to build yet
- Multi‑GPU distributed training – nanochat’s single‑node focus is the wedge; scaling later.
- Finetuning for downstream tasks (e.g., instruction tuning) – start with pretraining only.
- Full‑featured dataset management UI – support only URL to a text file or an S3 prefix.
- Enterprise SSO or team accounts – not needed for initial users.
- Automatic model evaluation/benchmarking pipelines – user can evaluate themselves.
Risks / blockers
- GPU spot‑instance interruptions may cause half‑completed jobs and cost overruns if not handled gracefully.
- Token smearing bug (#756) and other accuracy issues could erode user trust if models produce garbled output.
- Cost of cloud GPUs (e.g., H100) could make the $2‑3/hour price point unprofitable if utilization is low.
- Missing commercial plumbing (email verification, observability, CI) increases launch risk and operational toil.
- Competitors like Hugging Face Spaces or Modal may offer similar services with less friction, capitalizing on nanochat’s popularity.
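The utilization risk can be made concrete with a break-even calculation: if a rented GPU-hour costs the service `cost` and a billed GPU-hour earns `price`, revenue covers cost only when the billed fraction of rented hours is at least `cost / price`. The function name and the figures in the usage note are hypothetical, not measured costs.

```python
def break_even_utilization(cost_cents_per_hour: int, price_cents_per_hour: int) -> float:
    """Fraction of rented GPU-hours that must be billed to cover GPU cost."""
    if cost_cents_per_hour < 0 or price_cents_per_hour <= 0:
        raise ValueError("cost must be >= 0 and price must be > 0")
    return cost_cents_per_hour / price_cents_per_hour
```

With an illustrative cost of 200 cents per hour and a price of 250 cents per hour, the service needs 80% utilization to break even; a result above 1.0 means the price cannot cover the GPU cost at any utilization.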
Builder prompts
Derived builder prompts
Master
Master context prompt
You are building a fresh implementation inspired by the karpathy/nanochat repository.
Treat the repository as a reverse-engineering reference, not as the default destination codebase.
Infer the product, architecture, entities, and flows from the reference repository, then rebuild the core system intentionally from scratch.
Do not blindly clone the original repo. Do not default to patching or refactoring it in place.
Build in small phases and keep the first version focused and maintainable.
Product thesis: Let AI enthusiasts train and chat with their own small LLMs without managing infrastructure, at a fraction of the cost of using large API models.
Commercial/product framing: The massive traction of nanochat (54k stars) proves demand for a hackable, low-cost LLM training platform. Capturing this audience with a turnkey hosted service unlocks a paying user base willing to spend small amounts for convenience.
Target user: AI students, indie researchers, and startups who need to train small language models for education, prototyping, or niche applications and want a no‑ops experience.
MVP scope:
- User account with OAuth (GitHub/Google) and basic profile.
- Submit training job: configure depth and dataset, view cost estimate, start run.
- Asynchronous training execution on a GPU-backed worker, with real-time progress and logs.
- Inference endpoint (`/api/chat`) for conversing with a trained model via a simple text‑based UI.
- Stripe integration for prepaid credit purchases (no subscription yet).
- Web UI with dashboard (list jobs/models), training config form, and chat window.
- Model checkpoint download link (for users wanting to export their model).
Stack assumptions:
- Python 3.10+ with PyTorch and `torch.compile` for training/inference on a single NVIDIA GPU.
- FastAPI backend serving REST APIs (auth, training, chat).
- React (or Svelte) frontend with lightweight chat component.
- PostgreSQL for user data, job state, and billing transaction records.
- Celery + Redis for scheduling and executing training jobs asynchronously.
- Docker for packaging training and inference workers, with Kubernetes for orchestration.
- AWS S3 or compatible object store for model checkpoints and tokenizer files.
Key entities:
- User: id, email, oauth_provider, oauth_id, credit_balance_cents
- TrainingJob: id, user_id, config_json (depth, dataset_url, etc.), status (queued/running/completed/failed), started_at
- Model: id, training_job_id, checkpoint_s3_key, tokenizer_file_s3_key, created_at
- ChatSession: id, user_id, model_id, messages_json, created_at
- BillingTransaction: id, user_id, stripe_event_id, amount_cents, type (charge, credit_purchase)
Core pages / routes / flows:
- Dashboard (GET /): Show logged‑in user's recent training jobs, active models, and remaining credits.
- Training job creation (GET & POST /train): Form to set depth (e.g., 26 for GPT‑2 scale), upload or link a dataset, see cost prediction, and submit job.
- Job status / logs (GET /jobs/{job_id}): Live progress, estimated time remaining, and ability to cancel or download model if complete.
- Chat with model (GET /chat/{model_id}): Chat UI that sends prompts to `/api/chat` and streams tokens back via Server‑Sent Events (SSE).
- Account & billing (GET /account): View transaction history, buy credits via Stripe checkout, manage API keys (if needed).
Backend services / modules:
- TrainingWorker: Spawn a GPU container, run nanochat’s `gpt.py` training loop with user config, upload checkpoints to S3, update job status.
- InferenceService: Load model from S3, run generation on-demand or keep warm for chat, expose `/api/chat`.
- UserService: Handle OAuth, store user profiles, manage credit balances, and authorize actions.
- BillingService: Integrate Stripe, calculate job cost (GPU‑seconds × rate), deduct credits, handle webhooks.
- StorageManager: Abstract S3 operations for model checkpoints, tokenizer files, and temporary uploads.
Do not build yet:
- Multi‑GPU distributed training – nanochat’s single‑node focus is the wedge; scaling later.
- Finetuning for downstream tasks (e.g., instruction tuning) – start with pretraining only.
- Full‑featured dataset management UI – support only URL to a text file or an S3 prefix.
- Enterprise SSO or team accounts – not needed for initial users.
- Automatic model evaluation/benchmarking pipelines – user can evaluate themselves.
Major risks / blockers:
- GPU spot‑instance interruptions may cause half‑completed jobs and cost overruns if not handled gracefully.
- Token smearing bug (#756) and other accuracy issues could erode user trust if models produce garbled output.
- Cost of cloud GPUs (e.g., H100) could make the $2‑3/hour price point unprofitable if utilization is low.
- Missing commercial plumbing (email verification, observability, CI) increases launch risk and operational toil.
- Competitors like Hugging Face Spaces or Modal may offer similar services with less friction, capitalizing on nanochat’s popularity.
Work in phases. After each phase, summarize what was built, what remains, and which assumptions from the reference repo you intentionally did or did not keep.
Phase 1
Phase 1 foundation prompt
Phase 1 foundation
Objective: Make nanochat’s training deterministic and runnable as a Docker container, consuming a config file and outputting a model checkpoint.
Deliverables:
- `train_runner.py` wrapping nanochat’s `gpt.py` and `train.py`, driven by JSON config.
- Dockerfile with PyTorch CUDA base, nvcc, and nanochat dependencies.
- Logic to upload final checkpoint + tokenizer to S3-compatible storage.
- CLI integration test on a single GPU instance (e.g., AWS p3.2xlarge).
- Exit: A `--depth 26` training run completes reliably in the container, producing a checkpoint in under 2 hours on 8xH100 equivalent (simulated).
- Exit: Checkpoint loads correctly in an inference harness and generates coherent text.
Rules:
- Start by reverse engineering the product shape from the reference repo, then create a clean fresh project structure.
- Set up the minimum schema, services, routes, and project structure required for the MVP.
- Avoid polish work, avoid optional abstractions, and avoid copying implementation details you do not understand.
Start by reverse engineering the reference repository, then implement the fresh build in the smallest clean sequence.
Phase 2
Phase 2 core flow prompt
Phase 2 core flow
Objective: Stand up the web backend with user accounts, job queue, and rudimentary API.
Deliverables:
- User model + OAuth endpoints (GitHub) in FastAPI.
- TrainingJob API: POST /api/jobs creates a Celery task; GET /api/jobs/{id} returns status.
- Redis + Celery GPU worker that downloads dataset from S3, runs `train_runner`, uploads result.
- Basic dashboard HTML showing user’s jobs.
- Exit: A new user can sign up, submit a training job, see it transition to “completed”, and download the model.
Rules:
- Implement the core pages, routes, and backend modules needed for the happy path in the fresh build.
- Use the reference repo for behavior and architecture cues, not as the code target.
- Preserve maintainability and add only the minimum comments needed.
Start by reverse engineering the reference repository, then implement the fresh build in the smallest clean sequence.
Phase 3
Phase 3 polish/refactor prompt
Phase 3 polish and refactor
Objective: Add payment, a chat UI, and the minimum lovable user experience.
Deliverables:
- Stripe integration: prepaid credit purchases (Stripe Checkout), webhook to top‑up balance.
- Job cost calculator: estimate price before submission; deduct credits on launch.
- React or Svelte frontend with /dashboard, /train, /chat/{model_id} views.
- Chat UI that streams tokens from `/api/chat` using Server‑Sent Events.
- Model list page and download links.
- Exit: User can buy credits, train a model, and chat with it entirely via the web UI.
- Exit: No credit is deducted if a job fails due to infrastructure, and failed GPUs are hot‑swapped correctly.
Rules:
- Refine the fresh implementation without introducing a large refactor.
- Improve validation, edge cases, and developer ergonomics where the core flow already exists.
- Prefer targeted cleanup over architecture churn or feature creep.
Start by reverse engineering the reference repository, then implement the fresh build in the smallest clean sequence.
Deploy
Deploy/finalization prompt
Deploy and finalization
Objective: Prepare the implementation for release, verification, and handoff.
Deliverables:
- Verify the main flows work.
- Check configuration, environment assumptions, and release blockers.
- Summarize remaining risks and suggested follow-up work.
Rules:
- Do not add speculative infrastructure.
- Focus on practical release readiness, tests, and explicit known gaps versus the reference repo.
Start by reverse engineering the reference repository, then implement the fresh build in the smallest clean sequence.
Bugfix
Bugfix/refinement prompt
Bugfix and refinement
Objective: Fix the specific bug or implementation gap in the fresh build while preserving the established architecture.
Deliverables:
- Reproduce the issue in the rebuilt implementation.
- Patch the smallest reliable fix.
- Explain what caused it, how to verify the fix, and whether the reference repo suggests an architecture guardrail you missed.
Rules:
- Inspect the fresh implementation before editing.
- Do not rewrite working systems just to fix a localized issue.
Start by reverse engineering the reference repository, then implement the fresh build in the smallest clean sequence.