Free handbook for developers · Open source

AI Agents,
demystified

Everything your team needs to understand, use, and get the most out of Claude and AI coding agents — from first principles to production setup.

01
4 Pillars of LLMs
02
How indexing works
03
Prompt → file change
04
8 key concepts
05
How cost works
06
Dev workflows
07
FE & BE playbooks
08
Project setup
Foundations

4 things to understand
before anything else

These four concepts explain how LLMs actually behave. Click any card for a deeper explanation with interactive examples.

🪙
Click to explore

Tokens

The unit of measurement for AI. Every word you type, every line of code it reads — all counted in tokens. A token is roughly ¾ of a word.

QUICK EXAMPLES
"Hi"1 token
"calculateTax"3 tokens
"authentication"4–5 tokens
Short message~20 tokens
200-line file~2,000 tokens
3-turn agent loop~18,000 tokens
💡 You pay per token — in and out. Agent loops get expensive fast because every turn re-sends everything.
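A rough rule of thumb behind the figures above: tokens ≈ characters ÷ 4. Here's a minimal sketch of that estimate (real BPE tokenizers split differently, so treat this as a ballpark only):

```typescript
// Ballpark token estimator using the common "~4 characters per token" rule.
// Real tokenizers (BPE) vary; this is an estimate, not an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("calculateTax")); // 3, matching the card above
// A 200-line file at ~40 chars/line lands near the ~2,000-token figure:
console.log(estimateTokens("x".repeat(200 * 40))); // 2000
```

Good enough for budgeting; use your provider's token counter when precision matters.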
🪟
Click to explore

Context Window

The agent's working desk. Everything has to fit on this desk — your rules, history, files, replies. Claude's desk fits ~200,000 tokens (~500 pages).

System prompt + CLAUDE.md → ~2k tok
Your messages + history → grows
Files the agent read → ~2k each
Tool outputs (tests, search) → varies
Agent's responses → grows
💡 Every turn re-sends the entire desk. A long session quietly costs 5× more than a fresh one.
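The compounding in that tip fits in a few lines: each turn re-sends the whole desk, so total billing is the running sum of desk sizes. A sketch with illustrative token counts (not real pricing):

```typescript
// Each turn re-sends everything already on the "desk", so the bill is the
// cumulative sum of desk sizes — not the sum of new tokens.
function sessionCost(turnAdditions: number[]): number {
  let desk = 0;   // tokens currently in context
  let billed = 0; // cumulative input tokens billed
  for (const added of turnAdditions) {
    desk += added;   // new message / file / tool output lands on the desk
    billed += desk;  // the whole desk is re-sent this turn
  }
  return billed;
}

// Three turns adding 3,000 tokens each bill 18,000 — not 9,000.
console.log(sessionCost([3000, 3000, 3000])); // 18000
```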
👻
Click to explore

Hallucination

When the model confidently says something wrong. Not a bug — it predicts plausible-sounding text, not verified facts. It can't tell the difference.

🔴 High risk: Internal APIs it hasn't seen, file paths it guessed, version-specific syntax
🟡 Medium risk: Facts about your business logic, DB schemas, third-party integrations
🟢 Low risk: Transforming code it can directly see in the context window
💡 Always give the agent the actual file. When the real code is in context, there's nothing left for it to invent.
🌡️
Click to explore

Temperature

The randomness dial. Turn it down for predictable code. Turn it up for creative writing. Coding agents run near zero — you almost never change this.

Deterministic (0.0) ←→ Creative (1.0)
Code agent (Claude Code): 0.1 – 0.3
General chat / Q&A: 0.5 – 0.7
Creative writing / brainstorm: 0.8 – 1.0
💡 Tools set this automatically. You don't need to touch it unless building custom agents.
What happens first

When you open a codebase
for the first time

Before you type a single prompt, the agent runs a background indexing job. This is what makes "find the calculateTax function" possible without you specifying a file path.

PATH 1 — AST Symbol Index

The agent parses every file's AST (Abstract Syntax Tree) and builds a symbol map. All function names, class names, exports, and their exact file + line number are stored in a hash map.

calculateTax → billing/tax.js:14
UserService → services/user.ts:3
formatCurrency → utils/fmt.js:22
Lookup is a hash map — sub-millisecond. Updated live via file watcher on every save.
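The symbol map described above is simple enough to sketch. Here a regex stands in for a real AST parser, and the file paths are illustrative:

```typescript
// Minimal symbol-index sketch: scan source lines for function declarations
// and record name → file:line in a map. Real tools walk a full AST; the
// regex here is a stand-in for illustration only.
function indexSymbols(path: string, source: string): Map<string, string> {
  const index = new Map<string, string>();
  source.split("\n").forEach((line, i) => {
    const m = line.match(/function\s+([A-Za-z_$][\w$]*)/);
    if (m) index.set(m[1], `${path}:${i + 1}`); // 1-based line numbers
  });
  return index;
}

const idx = indexSymbols(
  "billing/tax.js",
  "// tax helpers\nfunction calculateTax(amount, rate) {\n  return amount * rate;\n}"
);
console.log(idx.get("calculateTax")); // "billing/tax.js:2"
```

Lookup afterwards is a plain map access, which is why it's sub-millisecond.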
PATH 2 — Semantic / RAG Index

Each file is chunked (function by function), each chunk is run through an embedding model to produce a vector (~1536 numbers), and all vectors are stored in a local vector DB.

chunk: calculateTax fn
vector: [0.021, -0.843, …1536]
stored in: index.sqlite / chroma
🔍
Used when you describe what you want without knowing the name. ~100ms per query.
What runs on first open — indexing pipeline
CODEBASE (billing/tax.js)
  → File parser: AST + chunks
    → Symbol extractor (function names + lines) → Symbol index (hash map on disk)
    → Embedding model (chunk → 1536 floats) → Vector DB (SQLite / Chroma)
  → Agent ready: queries fast
⏱️
How long does indexing take? A 50k-line codebase takes ~10–30 seconds for AST indexing, and ~2–5 minutes for embedding-based RAG (because it calls an embedding API per chunk). AST indexing is always done locally. RAG indexing may send chunks to an external API.
Step by step

From your prompt
to the file change

The same pipeline — two scenarios. One where you give a file path, one where you don't. See exactly where the difference kicks in.

📎 With file path
🔍 Without file path
✏️
Prompt: "Take billing/tax.js and change calculateTax to handle GST"
Pipeline — file path given (fast path)
STEP 1: Your prompt (billing/tax.js)
STEP 2: Orchestrator parses intent · SKIPS RAG ✓ · direct read_file("billing/tax.js")
STEP 3: LLM sees code (full file in context)
STEP 4: Emits tool call (str_replace JSON)

TOOL CALL JSON (LLM output — framework executes this):
{
  "tool": "str_replace_in_file",
  "path": "billing/tax.js",
  "old_str": "function calculateTax(amount, rate) { return amount * rate; }",
  "new_str": "function calculateTax(amount, rate, taxType='VAT') { ... }"
}

STEP 5: Framework writes the file to disk (the LLM never touches the filesystem directly)
STEP 6: Optional: run_tests, lint, respond (multi-turn loop if tests fail)
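Step 5 is the key separation: the LLM only emits JSON, and a plain function applies it. A sketch of what that executor might look like (in-memory here; a real tool reads and writes files on disk):

```typescript
// Sketch of a str_replace executor. The LLM emits a call like this; the
// framework validates and applies it. Shape is illustrative, not a real SDK.
interface StrReplaceCall {
  tool: "str_replace_in_file";
  path: string;
  old_str: string;
  new_str: string;
}

function applyStrReplace(files: Map<string, string>, call: StrReplaceCall): void {
  const src = files.get(call.path);
  if (src === undefined || !src.includes(call.old_str)) {
    // Failing loudly forces the agent to re-read the file and retry.
    throw new Error(`old_str not found in ${call.path}`);
  }
  files.set(call.path, src.replace(call.old_str, call.new_str));
}

const files = new Map([
  ["billing/tax.js", "function calculateTax(a, r) { return a * r; }"],
]);
applyStrReplace(files, {
  tool: "str_replace_in_file",
  path: "billing/tax.js",
  old_str: "return a * r;",
  new_str: "return taxType === 'GST' ? (a * r) / (1 + r) : a * r;",
});
console.log(files.get("billing/tax.js")!.includes("GST")); // true
```

The exact-match requirement on `old_str` is what makes this edit format safe: a stale or hallucinated snippet fails instead of corrupting the file.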
🔎
Prompt: "Fix the function that handles tax calculation" — no file path given
Pipeline — no file path (RAG/AST discovery path)
STEP 1: Your prompt (no path given)
STEP 2: Orchestrator needs to find the file
STEP 3: FILE DISCOVERY
  AST: exact name? Hash map lookup (<1ms)
  OR RAG: embed query → similarity search, returns top-k relevant code chunks (~100ms)
STEP 4: read_file() (full file in context)
STEP 5: LLM reasons, emits str_replace
STEP 6: File written + verify / loop

⚠️ This path takes 100–300ms longer due to search, and has slightly more risk of finding the wrong file.
💬
Practical advice: When you know the filename or function name, always include it. It bypasses the entire discovery step and is faster, cheaper, and more precise.
Core concepts

8 things your team
needs to know

Click any card for a deeper look — including a restaurant analogy version of every concept.

🤖

Agent

An LLM with tools that can act autonomously in a loop

Core concept
🔀

Sub-agent

A child agent spawned for a specific subtask

Delegation
🔌

MCP

Standard protocol to connect agents to external services

Integration
🛠️

Tools

Functions the LLM can call — read_file, run_bash, search…

Execution

Skills

Pre-packaged instructions for common jobs

You write these
📏

Rules

Always-on constraints and policies for the agent

You write these
🪝

Hooks

Code that runs before/after every tool call

You write these

Commands

Slash-command shortcuts for repeated workflows

You define these
Economics

How cost actually works

You're not paying per "request". You pay per token — every character in and out. And it compounds in a way that surprises most people.

📱 Phone bill analogy
🔧 Technical breakdown

Think of it like an SMS plan where you pay per character

Imagine a phone plan that charges per character in every message you send and receive. Each SMS costs money both ways.

Now imagine this: every reply automatically quotes the entire thread above it. So SMS #3 includes the full text of SMS #1 and #2 before your new words.

That's exactly how agent token billing works. Each turn re-sends everything before it. The bill grows fast — not because each turn is expensive, but because they compound.

⚠️
The trap: a task that loops 10 times costs far more than 3× a task that loops 3 times. Because every turn re-sends all prior turns, cost grows roughly with the square of the turn count. Always cap your agent's steps.
Your agent session — billed like SMS
TURN 1 — YOU SEND
"Fix calculateTax in billing/tax.js"
system + your message → 3,050 tok
TURN 2 — INCLUDES EVERYTHING BEFORE IT
[full turn 1 quoted] + [file contents read]
turn 1 + file (2k lines) → 6,250 tok
TURN 3 — EVEN MORE
[turns 1+2 quoted] + [test output]
turns 1+2 + test results → 6,900 tok
Total billed (3 turns): 16,200 tokens
For one small function edit. Felt like "one request."
Day-to-day guide

How to actually use
this in your work

Practical patterns your team can apply from day one — prompting, modes, parallel agents, and monitoring.

✍️

Writing better prompts

THE ANATOMY OF A GOOD PROMPT
① FILE
Exact path
billing/tax.js not "the billing file"
② FUNCTION
Exact name
calculateTax not "the tax function"
③ CURRENT
What it does now
Describe the current behaviour
④ DESIRED
What it should do
Be specific about the new behaviour
❌ WEAK prompt
"Fix the tax function to handle GST"
Agent must guess: Which file? Which function? What's broken? What does GST mean in your context?
Risk: wrong file, hallucinated formula, unnecessary refactor.
✅ STRONG prompt
"In billing/tax.js, update
calculateTax(amount, rate)
add 3rd param taxType='VAT'.
When taxType==='GST' use
(amount*rate)/(1+rate).
Keep existing VAT behaviour.
Run tests after."
MORE QUICK TIPS
Scope it down
One task per prompt. "Fix the tax function AND refactor the whole billing module" leads to sprawl.
Say what not to touch
"Don't change the function signature" or "don't modify the test file" prevents unwanted changes.
Include the constraint
Add "without adding new dependencies" or "keeping TypeScript strict mode" so the agent doesn't import random packages.
🎛️

Plan mode · Ask mode · Auto mode

💬 ASK MODE
Chat only — no file writes
The agent answers, explains, and suggests — but never touches your files. Use when you want its opinion, not its hands.
Good for
• Code reviews
• "Is this approach good?"
• Architecture questions
• Explaining unfamiliar code
claude --ask "explain this regex"
📋 PLAN MODE
Shows the plan before doing
Agent reads all relevant files, thinks through the full approach, and presents a numbered plan for your approval before touching anything.
Good for
• Risky refactors
• Multi-file changes
• Tasks you want to review first
• Onboarding to unfamiliar code
claude --plan "refactor auth module"
⚡ AUTO MODE
Full autonomous execution
Agent acts end-to-end: reads files, makes changes, runs tests, loops until done. No approval steps. Fastest path for well-defined tasks.
Good for
• Clear, scoped tasks
• Tasks with good test coverage
• Repetitive work (add types, fix lint)
• After you've used Plan mode once
claude "add error handling to login"
💡 Rule of thumb: Use Ask when you're exploring. Use Plan when you're uncertain. Use Auto when the task is clear and tests exist.
🔀

Multi-agent workflows

The biggest unlock. Instead of one agent doing everything serially, spawn specialised agents in parallel and save hours. Here's a real example your team can use today.

SCENARIO — New API: POST /api/orders/create
YOU "Build POST /orders/create" ORCHESTRATOR AGENT Designs API contract schema · Spawns 3 agents BACKEND AGENT • POST /api/orders/create • Validation, DB write, response • Error handling, status codes FRONTEND AGENT • useCreateOrder() hook • Loading/error/success states • Matches exact contract schema QA AGENT • Integration tests • Mocked backend scenarios • Edge cases: 400, 422, 500 ALL THREE RUN IN PARALLEL Orchestrator merges · Tests pass · Done ✓
❌ SERIAL (old way)
Design contract     ~30 min
Build backend      ~2 hrs
Build frontend     ~1.5 hrs
Write QA tests     ~1 hr
Total: ~5 hours
✅ PARALLEL (multi-agent)
Design contract     ~5 min
BE + FE + QA (parallel) ~25 min
Review + merge    ~10 min
 
Total: ~40 minutes
HOW TO ACTUALLY PROMPT THIS
# Step 1: Plan the contract first
"Design the API contract for POST /api/orders/create.
Output a JSON schema for request body, success response, and error responses.
Do not write any code yet."

# Step 2: Spawn parallel agents with the contract
"Using this contract: [paste schema]
Spawn three sub-agents in parallel:
1. Backend agent: implement POST /api/orders/create in src/api/orders.ts
2. Frontend agent: implement useCreateOrder hook in src/hooks/useCreateOrder.ts
3. QA agent: write integration tests in tests/orders.test.ts
All three must strictly follow the contract schema above."

# Step 3: Validate
"Run all tests. If anything fails, fix and re-run."

Skills & Commands — how they fit together

These two work together, not instead of each other. A Skill holds the knowledge and constraints. A Command is the trigger that invokes it. Think of a Skill as a recipe, and a Command as calling "make pasta".

HOW THEY RELATE
⭐ SKILL FILE
Contains the how:
• Library constraints
• Validation steps
• Output format
• Org-specific patterns
skills/code-review.md

invoked by
⚡ COMMAND
Contains the trigger:
• One-line prompt
• References the skill
• Optionally takes args
• Exposed as /shortcut
commands/code-review.md
# commands/code-review.md — the trigger
Read the skill at skills/code-review.md, then review the
file at $ARGUMENTS following those exact constraints.

# Developer types:
/code-review src/api/orders.ts

# Agent reads the skill, applies all constraints, reviews the file
Write a Skill when
• The task has org-specific constraints
• Output must follow your patterns
• Specific libraries must be used
• Validation steps are required
Add a Command when
• You run the same task repeatedly
• You want a /shortcut for the team
• You want to expose it with args
• Works with OR without a skill
Just use a prompt when
• It's a one-off task
• No special org rules apply
• You don't do it repeatedly
• It's simple enough to describe inline
📊

Monitoring usage & context

How to check context fill, token usage, and session history — per tool.

Claude Code
Cursor
GitHub Copilot
Generic / API
CONTEXT FILL
claude /context
Shows tokens used vs window size
TOKEN USAGE
claude usage
Input/output tokens today
RESUME SESSION
claude --continue
Pick up where you left off
# Compact context when getting large
claude /compact

# Cap how many steps it can take
claude --max-turns 5 "fix failing tests"

# See every tool call in detail
claude --verbose
0–40%
Healthy
Good to go. Plenty of room.
40–70%
Fine
Normal for active sessions.
70–85%
Watch out
Summarise or trim context
85%+
New session
Quality drops. Start fresh.
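The fill-level bands above as a tiny helper. The thresholds follow this guide's advice, not any official tool output:

```typescript
// Context-fill triage, using the bands from this guide (assumed thresholds).
function contextHealth(fillPercent: number): string {
  if (fillPercent < 40) return "healthy";
  if (fillPercent < 70) return "fine";
  if (fillPercent < 85) return "watch out: summarise or trim";
  return "start a new session";
}

console.log(contextHealth(30)); // "healthy"
console.log(contextHealth(90)); // "start a new session"
```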
Specialist playbooks

Frontend & Backend
engineer guides

Targeted patterns for what your role actually does every day — beyond the full-stack example.

FRONTEND ENGINEER
What you can offload
Component generation
Generate full components from a description — with shadcn primitives, Tailwind classes, TypeScript props, and loading/error/empty states included.
"Create an OrderCard component using shadcn Card.
Props: order: Order, onCancel: () => void.
Show status badge, total, date. Use our color tokens."
Types from API response
Paste a raw JSON response and get TypeScript interfaces, Zod schemas, and a typed fetch hook in one go.
"Here's the API response: [paste JSON].
Generate: TypeScript interface, Zod schema,
and a useOrders() hook with error handling."
Accessibility audit
Give it a component, get ARIA fixes, keyboard nav, and focus management suggestions back.
"Review src/components/Modal.tsx for a11y.
Fix missing aria labels, focus trap, and
keyboard close. Don't change visual styles."
Storybook stories
Point it at a component and it writes all stories — default, loading, error, empty, edge cases.
"Read src/components/OrderCard.tsx.
Write Storybook stories covering: default,
cancelled state, loading skeleton, empty."
Libraries to mention in prompts
shadcn/ui Tailwind CSS Zod React Query Zustand React Hook Form Vitest Playwright
Mention these explicitly so the agent never reaches for a different library. Put them in your CLAUDE.md.
BACKEND ENGINEER
What you can offload
API route + validation
Full route with Zod validation, error handling, typed response, and status codes — from a single description.
"In src/api/orders.ts, add POST /orders.
Validate body with Zod. Return 201 on success,
422 on validation fail, 500 on DB error.
Use our ApiError class for all errors."
DB migrations — with caution
Generate migration files from schema diffs. Always review before running. Never let the agent run migrations automatically.
"Read prisma/schema.prisma. Generate a
migration to add nullable 'notes' field to Order.
Write the migration file only — do NOT run it."
OpenAPI / Swagger spec
Give it your route files and get a full OpenAPI 3.0 spec back. Share with frontend immediately — no manual docs.
"Read all files in src/api/. Generate an
OpenAPI 3.0 spec covering all routes,
request bodies, and response schemas."
Service layer + unit tests
Write the business logic service and its unit tests together. Tests go in the same pass — not as an afterthought.
"Create OrderService in src/services/.
Methods: createOrder, cancelOrder, getByUser.
Write unit tests alongside using Vitest.
Mock the Prisma client."
Libraries to mention in prompts
Prisma Zod Hono / Express Vitest Jose (JWT) Redis BullMQ Pino (logging)
Put your actual stack in CLAUDE.md so the agent never invents an alternative.
⚠️
Never let the agent touch: production DB credentials, .env files, migration runs, or deploy scripts autonomously. Always require human review for anything that touches data at rest.
Your org's standard

Project AI setup that
enforces good code

Regardless of which tool your team uses, every project should have a set of files that tell the agent your rules, your stack, and what it must never touch. Here's what that looks like per tool.

🧪
Test-driven
A testing skill forces tests alongside every new function
🔒
Secure
An ignore file keeps .env and secrets out of the agent's context
📐
Consistent
A code-writing guide means every dev's output looks the same
♻️
Reviewable
A review skill gives every PR the same automated checklist
Claude Code
Cursor
GitHub Copilot
Continue.dev
Generic / Any tool
PROJECT STRUCTURE — CLAUDE CODE
your-project/
├── CLAUDE.md          ← rules, stack, conventions (root level)
├── .claude/
│   ├── commands/      ← /fix-tests, /code-review, /add-types
│   ├── skills/        ← code-writing.md, testing.md, review.md
│   └── settings.json  ← MCP servers, tool permissions
└── .gitignore         ← also controls what Claude Code doesn't index
CLAUDE.md
Root-level rules file — agent reads this automatically
# Stack: Next.js, TypeScript strict, Prisma
# UI: shadcn/ui + Tailwind

## Always
- Use shadcn, never raw HTML elements
- Write tests for every new function
- Use Pino logger, not console.log

## Never
- Add npm packages without asking
- Use `any` in TypeScript
- Read or write .env files
.claude/settings.json
Tool permissions + MCP server connections
{
  "permissions": {
    "allow": ["Bash(npm test)", "Read"],
    "deny": ["Bash(git push)"]
  },
  "mcpServers": {
    "github": { "url": "..." },
    "slack": { "url": "..." }
  }
}
⭐ SKILL / PROMPT MARKETPLACE
Don't write everything from scratch
The community maintains pre-built instruction files for common tasks — DOCX generation, data analysis, frontend design, PDF handling, code review patterns and more. Grab one, adapt it to your stack.
docx · pdf · xlsx · pptx
frontend-design · data-analysis
file-reading · code-review
+ community on GitHub
Multi-repo strategies

Working across
multiple repositories

Most real projects span more than one repo. Here's how agents handle that — and how to structure things so they don't lose their mind crossing boundaries.

MONOREPO

All packages in one repo. The agent can see everything. The challenge is scope creep — it may edit packages you didn't intend.

my-monorepo/
  packages/
    frontend/  ← CLAUDE.md here
    backend/   ← CLAUDE.md here
    shared/    ← CLAUDE.md here
  CLAUDE.md    ← root rules (all packages)
💡
Nest CLAUDE.md files per package. Root sets global rules. Package-level files add specifics. Agent respects the nearest one.
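"Nearest CLAUDE.md wins" is just an upward directory walk. A sketch of that resolution (paths illustrative; real tools may layer rules differently, e.g. merging root and package files):

```typescript
// Walk up from the edited file's directory until a CLAUDE.md is found,
// falling back to the repo root. A sketch of "nearest rules file wins".
function nearestRules(filePath: string, rulesFiles: Set<string>): string | null {
  const parts = filePath.split("/").slice(0, -1); // drop the filename
  while (parts.length > 0) {
    const candidate = [...parts, "CLAUDE.md"].join("/");
    if (rulesFiles.has(candidate)) return candidate;
    parts.pop(); // move one directory up
  }
  return rulesFiles.has("CLAUDE.md") ? "CLAUDE.md" : null;
}

const rules = new Set(["CLAUDE.md", "packages/frontend/CLAUDE.md"]);
console.log(nearestRules("packages/frontend/src/App.tsx", rules));
// "packages/frontend/CLAUDE.md"
console.log(nearestRules("packages/backend/src/index.ts", rules));
// "CLAUDE.md" (falls back to root rules)
```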
MULTI-REPO

Separate repos for frontend, backend, infra. The agent can't see across repos by default. You must explicitly bridge them.

org/frontend  ← clone locally
org/backend   ← clone locally
org/shared    ← clone locally

# Open all three in same agent session
# or use sub-agents per repo
ℹ️
The agent's context window is per-session. Opening multiple repos in one session gives cross-repo awareness but increases token cost.

Strategies for common cross-repo tasks

1
API contract change (affects FE and BE)
Use the orchestrator + sub-agent pattern. Orchestrator designs the new contract. Two sub-agents run in parallel: one updates the backend route, one updates the frontend hook. Both get the contract schema passed in context.
# Step 1: design the contract (no code yet)
"Design the new schema for POST /orders. JSON only, no code."

# Step 2: spawn sub-agents with the schema
"Using [schema], update backend in ./backend
AND frontend hook in ./frontend simultaneously"
2
Shared library update (used by 3+ repos)
Update the shared package first. Then use MCP (GitHub) to open a PR in each consuming repo with the dependency bump + any required usage changes. One agent, one session, MCP does the cross-repo operations.
# MCP enables cross-repo operations
"Update shared/auth-utils v2.1.
Open PRs in org/frontend, org/backend,
org/mobile bumping to v2.1 each."
# Agent uses github MCP for each PR
3
Cross-repo code review / audit
Clone all relevant repos locally. Open one agent session across all directories. Ask it to audit a specific pattern (e.g. "find all places we call the deprecated auth API") across repos at once.
# All repos cloned under ./repos/
"Search all files under ./repos/ for calls to
auth.verifyToken(). List file, line, and repo.
I want to replace all with auth.verify() v2."
SHARED CLAUDE.MD PATTERN FOR MULTI-REPO ORGS
What goes in the shared root CLAUDE.md
• Org-wide rules (no any, no main commits)
• Shared library names and versions
• Auth patterns and security rules
• Logging standards across all services
• Environment variable naming conventions
What goes in each repo's CLAUDE.md
• Repo-specific tech stack and framework
• Local testing commands
• Service-specific patterns
• Which MCP servers are relevant
• Local .claudeignore additions
💡
Tip: Store the shared CLAUDE.md in a central org/engineering-standards repo. Every other repo symlinks or copies it in CI. Changes to standards propagate automatically.
Frequently asked

Questions your team
will definitely ask

Click to expand each answer.

Does the agent actually understand my code, or is it just pattern matching?

It's both, in a meaningful sense. The LLM has learned deep patterns across billions of lines of code, so it genuinely "understands" common patterns, idioms, and logic. But it's not running your code or building a mental model like a human developer would.

Think of it like a very experienced developer who reads fast: they can understand your code from the text alone — they don't need to run it. But they can still miss subtle runtime behaviour or domain-specific logic that isn't obvious from the text.

This is why giving it the actual file matters so much — it reads what you give it, not what it imagines your code looks like.

Is RAG always involved when an agent edits a file?

No — only in two situations:

1. You described what you want without naming a file or function
2. The codebase is too large to fit in the context window

You: "Take billing/tax.js and change calculateTax" → RAG skipped, direct read
You: "Fix the tax calculation bug" → RAG/AST needed to find the right file first
AST or RAG — which should I rely on?

They serve different purposes. AST is for precision (you know the name), RAG is for discovery (you know what it does). Production tools layer both: AST lookup first, RAG as fallback.

Dimension: AST vs RAG
• Speed: AST sub-millisecond · RAG 50–250ms
• Exact name match: AST ✅ perfect · RAG ⚠️ can miss
• Vague / semantic query: AST ❌ blind · RAG ✅ good
• Cross-file deps: AST ✅ tracks imports · RAG ❌ chunks are isolated
Why does the agent sometimes edit the wrong thing?

Three most common causes:

1. Vague instruction: "Fix the auth bug" — the agent guesses which file you mean. Always name the file and function.

2. Wrong chunk retrieved by RAG: If multiple functions match your description, RAG may surface the wrong one. More specific prompts fix this.

3. Hallucination: The agent confidently writes code for an API it hasn't seen. Happens most when the file isn't in its context. Solution: always include the relevant file.

Better: "In services/auth.ts, the verifyToken function throws when the token is expired instead of returning false. Fix it."
What's the difference between a Tool and an MCP server?

Tools are built-in functions the agent framework ships with: read_file, str_replace, run_bash, web_search. They run locally on your machine.

MCP (Model Context Protocol) is a standard that lets you plug in external services using the same tool-call interface. When the agent calls an MCP tool, it's actually sending a request to a remote server (GitHub, Slack, Notion, your internal DB).

Built-in tool: read_file("billing/tax.js") → reads from disk
MCP tool: github.create_pr({...}) → calls GitHub API

From the LLM's perspective, both look identical — it just calls a function and gets a result back.
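That uniform interface is easy to sketch: one dispatcher, two kinds of backing implementation. The tool names match the examples above, but the bodies are stand-ins, not a real SDK:

```typescript
// Both built-in tools and MCP tools present as named functions called with
// JSON-ish args. The LLM can't tell (and doesn't need to) which is which.
type ToolFn = (args: Record<string, string>) => string;

const tools = new Map<string, ToolFn>([
  // Built-in: would run locally on your machine.
  ["read_file", ({ path }) => `<contents of ${path}>`],
  // MCP-backed: would forward the same-shaped call to a remote server.
  ["github.create_pr", ({ title }) => `PR opened: ${title}`],
]);

function dispatch(name: string, args: Record<string, string>): string {
  const fn = tools.get(name);
  if (!fn) throw new Error(`unknown tool: ${name}`);
  return fn(args); // identical call path for local and remote tools
}

console.log(dispatch("read_file", { path: "billing/tax.js" }));
console.log(dispatch("github.create_pr", { title: "Bump auth-utils" }));
```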

What is a sub-agent and when does it appear?

A sub-agent is a child agent spawned by the orchestrator to handle a focused parallel task. Each sub-agent has its own system prompt, context window, and tool access.

Orchestrator: "Change calculateTax and make sure all callers are updated"
→ Sub-agent 1: Update billing/tax.js
→ Sub-agent 2: Find all callers in codebase
→ Sub-agent 3: Update tests
All run in parallel; orchestrator assembles results.

Sub-agents are more common in longer agentic tasks (Claude Code, AutoGPT-style setups) than in simple chat-based code editing.

Does a VS Code extension behave the same as a standalone install?

Yes, completely. The VS Code extension is the agent. It ships its own language server, indexer, local storage, and API client. VS Code is just the shell that hosts it.

The same pipeline runs, the same index is built, the same cost model applies. Whether you open Cursor as a standalone app or install it as a VS Code extension, the agent runtime is identical.

Where do index files live on Windows?
Tool · Index location · Notes
• GitHub Copilot · no persistent index · LSP in memory only
• Cursor · AppData\Roaming\Cursor\User\workspaceStorage\<hash>\ · per workspace
• Continue.dev · <repo>\.continue\index\ · SQLite, inspectable
• Codeium · AppData\Roaming\Codeium\<hash>\ · symbol + semantic
Does my code leave my machine when the agent runs?

It depends on the tool:

🔴 GitHub Copilot, Cursor, Codeium: Code chunks are sent to their servers for embedding and inference. The resulting vectors are cached locally, but raw code travels to their cloud.

🟢 Continue.dev + local Ollama: Fully air-gapped. Nothing leaves your machine. Embeddings run locally; inference runs locally.

For sensitive repos (financial data, internal IP, customer data) — use Continue.dev with a local model, or review your vendor's data processing agreement before connecting the repo.
How do I write a good prompt for a code change?

The four things that make the biggest difference:

1. Name the exact file: billing/tax.js not "the billing file"

2. Name the exact function: calculateTax not "the tax function"

3. Describe the current behaviour: "currently returns amount * rate"

4. Describe the desired behaviour: "should use GST formula (amount * rate) / (1 + rate) when taxType is 'GST'"

Weak: "Fix the tax function to handle GST"
Strong: "In billing/tax.js, update calculateTax(amount, rate) — add a third param taxType='VAT'. When taxType is 'GST', use formula (amount * rate) / (1 + rate). Keep existing VAT behaviour."
What are Skills, Rules, Hooks, and Commands — do I need to write them?

All four are optional customisations. Here's when you'd use each:

Skills — write a SKILL.md file if there's a task type the agent should always handle a certain way (e.g. "whenever creating a DOCX, read this template first"). Most teams won't write their own skills initially.

Rules — write a .claude/CLAUDE.md (or similar) to set org-wide constraints: "never commit to main", "always run tests before responding", "use our internal logger not console.log". Start here — even a few rules make a big difference.

Hooks — scripts that run before/after tool calls. Use for logging, safety checks, auto-formatting, or triggering CI. Requires some setup; not needed on day one.

Commands — shortcut prompts in .claude/commands/. Define /fix-tests or /add-type-safety once, run them with one word. Great for repeated workflows.

Recommended order to adopt: Rules first → Commands next → Hooks when needed → Skills for specialised tasks
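Hooks are the least familiar of the four, so here is a minimal sketch: a safety hook that fires before every tool call and blocks writes to `.env`. The wiring is illustrative; real tools configure hooks in their own settings files:

```typescript
// A pre-tool-call hook in miniature. Real frameworks register these via
// config; the runTool wrapper here is a stand-in for the framework.
type ToolCall = { tool: string; path?: string };

function preToolHook(call: ToolCall): void {
  if (call.path?.endsWith(".env")) {
    throw new Error(`blocked: ${call.tool} on ${call.path}`);
  }
}

function runTool(call: ToolCall): string {
  preToolHook(call);         // fires before every tool call, no exceptions
  return `ran ${call.tool}`; // the actual tool would execute here
}

console.log(runTool({ tool: "read_file", path: "billing/tax.js" }));
// runTool({ tool: "write_file", path: ".env" }) would throw instead of writing
```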
The restaurant analogy
All 8 concepts — one
restaurant, one story

Imagine a fine-dining restaurant. Every role, every object, every process maps directly to how AI agents work. Once you see it, you can't unsee it.

🏰
The scene

You walk into a fine-dining restaurant. You are the developer. You tell the head waiter what you want. What happens next — from kitchen to table — is exactly how an AI agent processes your instruction.

🤵
Agent = Head Waiter

The head waiter takes your order (your prompt), decides the full sequence of steps needed, coordinates everyone, and doesn't stop until your meal is served. They don't cook — they direct. They reason, plan, and loop until the job is done.

👨‍🍳
Sub-agents = Specialist Chefs

You order the tasting menu. The head waiter dispatches three specialist chefs simultaneously — one at the grill, one at the pastry station, one at the bar. Each has their own station (context window) and handles their task in parallel. The head waiter assembles the final result when all three are done.

🥩 Grill chef → writes backend code
🍰 Pastry chef → writes frontend
🍹 Bartender → writes tests
🔪
Tools = Kitchen Equipment

The physical equipment that does the actual work. The chef decides what to use — but the kitchen assistant physically operates it.

🔪 Knife = str_replace
📖 Recipe book = read_file
🔥 Oven = run_bash
📞 Phone = web_search
🚚
MCP = Supplier Network

The restaurant has relationships with external suppliers — meat farm, dairy, vegetable market, wine cellar. MCP is the standardised ordering system that connects to all of them with one interface. Every supplier speaks the same language. The kitchen doesn't need a different phone for each one.

🥩 MeatFarm MCP → GitHub
🥛 Dairy MCP → Slack
🍷 WineCellar MCP → Notion
📒
Skills = Each Chef's Recipe Book

Each chef has their specialty training — the pastry chef knows to always temper chocolate at 32°C and never use margarine. The grill chef knows to rest the steak 5 minutes before serving. These aren't universal rules — they're role-specific expertise. Without the recipe book, they'd still try — but might not meet your restaurant's specific standards.

🪧
Rules = Health & Safety Poster

The poster on the kitchen wall. Every member of staff follows these — the head waiter, the grill chef, the pastry chef, the bartender. No exceptions, no matter how busy. Never use expired ingredients. Always wash hands. Serve hot food above 63°C. These are your CLAUDE.md rules — always on, always enforced.

🔔
Hooks = Automatic Kitchen Signals

The automatic events that fire when something happens — nobody has to remember to trigger them. When a dish is plated, the pass bell rings automatically. When the fridge goes above temperature, the alarm fires. When service ends, the log sheet is filled automatically.

🔔 Order ready bell → post-write hook: auto-lint the file
🌡️ Temperature alarm → safety hook: block writes to .env
📋 End-of-service log → on-complete hook: create PR
📣
Commands = Customer Requests

Things the customer can say that trigger a defined kitchen workflow — without needing to understand how the kitchen works. "Extra salt" → exactly the right process kicks off. "Spice level 3" → kitchen knows what that means. The customer doesn't specify each step — they just say the command.

/fix-tests /code-review /add-types
Star on GitHub
Thanks! Sharing helps others find this 🙌