Everything your team needs to understand, use, and get the most out of Claude and AI coding agents — from first principles to production setup.
These four concepts explain how LLMs actually behave. Click any card for a deeper explanation with interactive examples.
The unit of measurement for AI. Every word you type, every line of code it reads — all counted in tokens. A token is roughly ¾ of a word.
The agent's working desk. Everything has to fit on this desk — your rules, history, files, replies. Claude's desk fits ~200,000 tokens (~500 pages).
When the model confidently says something wrong. Not a bug — it predicts plausible-sounding text, not verified facts. It can't tell the difference.
The randomness dial. Turn it down for predictable code. Turn it up for creative writing. Coding agents run near zero — you almost never change this.
Before you type a single prompt, the agent runs a background indexing job. This is what makes "find the calculateTax function" possible without you specifying a file path.
The agent parses every file's AST (Abstract Syntax Tree) and builds a symbol map. All function names, class names, exports, and their exact file + line number are stored in a hash map.
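As a toy illustration of that symbol map — a regex over `function` declarations stands in for a real AST parser, and the input file below is invented:

```javascript
// Toy symbol map builder. Real tools walk the parsed AST; here a
// regex over `function` declarations stands in for the parser.
function buildSymbolMap(files) {
  const map = new Map();
  const fnDecl = /function\s+([A-Za-z_$][\w$]*)/g;
  for (const [path, source] of Object.entries(files)) {
    let m;
    while ((m = fnDecl.exec(source))) {
      // Line number = newlines before the match, plus one
      const line = source.slice(0, m.index).split("\n").length;
      map.set(m[1], { file: path, line });
    }
  }
  return map;
}

const map = buildSymbolMap({
  "billing/tax.js": "// taxes\nfunction calculateTax(amount, rate) {}\n",
});
// map.get("calculateTax") → { file: "billing/tax.js", line: 2 }
```

Once this map exists, "find the calculateTax function" is a single hash-map lookup rather than a search.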
Each file is chunked (function by function), each chunk is run through an embedding model to produce a vector (~1536 numbers), and all vectors are stored in a local vector DB.
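The embedding side of the pipeline can be sketched the same way. Here a crude bag-of-words hash stands in for a real embedding model (which would produce ~1536 dimensions, not 64), and the "vector DB" is just an array with cosine-similarity search:

```javascript
const DIM = 64; // real embeddings are ~1536 dimensions

// Crude stand-in for an embedding model: hash each word into a bucket
function embed(text) {
  const vec = new Array(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) % DIM;
    vec[h] += 1;
  }
  return vec;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < DIM; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// "Vector DB": an array of { id, vec } records
function indexChunks(chunks) {
  return chunks.map((c) => ({ id: c.id, vec: embed(c.text) }));
}

// Semantic search: rank every chunk against the query, return the best
function search(index, query) {
  return index
    .map((r) => ({ id: r.id, score: cosine(r.vec, embed(query)) }))
    .sort((a, b) => b.score - a.score)[0];
}
```

A query like `search(index, "tax")` surfaces the chunk whose words overlap the query — the same idea, at toy scale, as retrieving the most semantically similar code chunk.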
The same pipeline — two scenarios. One where you give a file path, one where you don't. See exactly where the difference kicks in.
"…billing/tax.js and change calculateTax to handle GST"

Click any card for a deeper look — including a restaurant analogy version of every concept.
An LLM with tools that can act autonomously in a loop
A child agent spawned for a specific subtask
Standard protocol to connect agents to external services
Functions the LLM can call — read_file, run_bash, search…
Pre-packaged instructions for common jobs
Always-on constraints and policies for the agent
Code that runs before/after every tool call
Slash-command shortcuts for repeated workflows
You're not paying per "request". You pay per token — every character in and out. And it compounds in a way that surprises most people.
Imagine a phone plan that charges per character in every message you send and receive. Each SMS costs money both ways.
Now imagine this: every reply automatically quotes the entire thread above it. So SMS #3 includes the full text of SMS #1 and #2 before your new words.
That's exactly how agent token billing works. Each turn re-sends everything before it. The bill grows fast — not because each turn is expensive, but because they compound.
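To see the compounding, here's a back-of-the-envelope sketch. The per-token prices and token counts are invented placeholders, not any provider's real rates:

```javascript
// Why a conversation costs more than the sum of its turns:
// each turn re-sends the whole history as input tokens.
function conversationCost(turns, pricePerInputToken, pricePerOutputToken) {
  let history = 0; // tokens accumulated so far (prompts + replies)
  let cost = 0;
  for (const t of turns) {
    const inputTokens = history + t.prompt;   // full history is re-sent
    cost += inputTokens * pricePerInputToken + t.reply * pricePerOutputToken;
    history += t.prompt + t.reply;            // history grows every turn
  }
  return cost;
}

// Ten identical turns: 500 prompt tokens in, 1000 reply tokens out.
// Prices here are made-up: $3 and $15 per million tokens.
const turns = Array.from({ length: 10 }, () => ({ prompt: 500, reply: 1000 }));
const naive = 10 * (500 * 3e-6 + 1000 * 15e-6);      // ignoring history: ~$0.165
const actual = conversationCost(turns, 3e-6, 15e-6); // history re-sent: ~$0.3675
```

With these placeholder numbers, the real bill is more than double the naive estimate — and the gap widens quadratically as the conversation gets longer.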
Practical patterns your team can apply from day one — prompting, modes, parallel agents, and monitoring.
The biggest unlock. Instead of one agent doing everything serially, spawn specialised agents in parallel and save hours. Here's a real example your team can use today.
These two work together, not instead of each other. A Skill holds the knowledge and constraints. A Command is the trigger that invokes it. Think of a Skill as a recipe, and a Command as calling "make pasta".
How to check context fill, token usage, and session history — per tool.
Targeted patterns for what your role actually does every day — beyond the full-stack example.
Regardless of which tool your team uses, every project should have a set of files that tell the agent your rules, your stack, and what it must never touch. Here's what that looks like per tool.
Most real projects span more than one repo. Here's how agents handle that — and how to structure things so they don't lose their mind crossing boundaries.
All packages in one repo. The agent can see everything. The challenge is scope creep — it may edit packages you didn't intend.
Separate repos for frontend, backend, infra. The agent can't see across repos by default. You must explicitly bridge them.
org/engineering-standards repo. Every other repo symlinks or copies it in CI. Changes to standards propagate automatically.

Click to expand each answer.
It's both, in a meaningful sense. The LLM has learned deep patterns across billions of lines of code, so it genuinely "understands" common patterns, idioms, and logic. But it's not running your code or building a mental model like a human developer would.
This is why giving it the actual file matters so much — it reads what you give it, not what it imagines your code looks like.
No — only in two situations:
1. You described what you want without naming a file or function
2. The codebase is too large to fit in the context window
"…billing/tax.js and change calculateTax" → RAG skipped, direct read.

They serve different purposes. AST is for precision (you know the name), RAG is for discovery (you know what it does). Production tools layer both: AST lookup first, RAG as fallback.
| Dimension | AST | RAG |
|---|---|---|
| Speed | Sub-millisecond | 50–250ms |
| Exact name match | ✅ Perfect | ⚠️ Can miss |
| Vague / semantic query | ❌ Blind | ✅ Good |
| Cross-file deps | ✅ Tracks imports | ❌ Chunks are isolated |
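That layering — exact symbol lookup first, semantic search as fallback — can be sketched in a few lines. Here `symbolMap` and `semanticSearch` stand in for the two indexes; the routing logic is the point:

```javascript
// AST-first, RAG-fallback routing (a sketch, not any tool's real code).
function locate(query, symbolMap, semanticSearch) {
  // 1. Exact-name hit: sub-millisecond hash-map lookup
  const names = query.match(/[A-Za-z_][A-Za-z0-9_]*/g) || [];
  for (const name of names) {
    if (symbolMap.has(name)) {
      return { via: "ast", ...symbolMap.get(name) };
    }
  }
  // 2. No known symbol named in the query: fall back to vector search
  return { via: "rag", ...semanticSearch(query) };
}

const symbols = new Map([["calculateTax", { file: "billing/tax.js", line: 12 }]]);
const fallback = (q) => ({ file: "unknown", line: 0 }); // stub vector search
// locate("change calculateTax to handle GST", …) resolves via "ast";
// locate("where is sales tax computed", …) falls through to "rag".
```

The precision/discovery trade-off in the table above falls out of this ordering: exact names never reach the fuzzy path, and vague queries never fail outright.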
Three most common causes:
1. Vague instruction: "Fix the auth bug" — the agent guesses which file you mean. Always name the file and function.
2. Wrong chunk retrieved by RAG: If multiple functions match your description, RAG may surface the wrong one. More specific prompts fix this.
3. Hallucination: The agent confidently writes code for an API it hasn't seen. Happens most when the file isn't in its context. Solution: always include the relevant file.
"…services/auth.ts, the verifyToken function throws when the token is expired instead of returning false. Fix it."

Tools are built-in functions the agent framework ships with: read_file, str_replace, run_bash, web_search. They run locally on your machine.
MCP (Model Context Protocol) is a standard that lets you plug in external services using the same tool-call interface. When the agent calls an MCP tool, it's actually sending a request to a remote server (GitHub, Slack, Notion, your internal DB).
- read_file("billing/tax.js") → reads from disk
- github.create_pr({...}) → calls GitHub API

From the LLM's perspective, both look identical — it just calls a function and gets a result back.
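A sketch of that dispatch layer — the local tool body and the MCP transport are stubbed out, and all names are illustrative:

```javascript
// From the model's side, every tool call is just a name plus JSON
// arguments. The framework routes it: built-in names run locally,
// dotted names go to an MCP server. Bodies below are stubs.

const localTools = {
  // Stub — a real implementation reads from disk
  read_file: ({ path }) => `contents of ${path}`,
};

function callMcpServer(server, tool, args) {
  // Stub — a real implementation sends a request over stdio or HTTP
  return { sentTo: server, tool, args };
}

function dispatch(call) {
  if (call.name in localTools) {
    return localTools[call.name](call.args);     // runs on your machine
  }
  const [server, tool] = call.name.split(".");   // e.g. "github.create_pr"
  return callMcpServer(server, tool, call.args); // goes over the network
}
```

`dispatch({ name: "read_file", args: { path: "billing/tax.js" } })` stays local; `dispatch({ name: "github.create_pr", args: { title: "…" } })` leaves the machine — but the calling convention is identical.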
A sub-agent is a child agent spawned by the orchestrator to handle a focused parallel task. Each sub-agent has its own system prompt, context window, and tool access.
Sub-agents are more common in longer agentic tasks (Claude Code, AutoGPT-style setups) than in simple chat-based code editing.
Yes, completely. The VS Code extension is the agent. It ships its own language server, indexer, local storage, and API client. VS Code is just the shell that hosts it.
The same pipeline runs, the same index is built, the same cost model applies. Whether you open Cursor as a standalone app or install it as a VS Code extension, the agent runtime is identical.
| Tool | Index location (Windows paths shown) | Notes |
|---|---|---|
| GitHub Copilot | No persistent index | LSP in memory only |
| Cursor | AppData\Roaming\Cursor\User\workspaceStorage\<hash>\ | Per workspace |
| Continue.dev | <repo>\.continue\index\ | SQLite — inspectable |
| Codeium | AppData\Roaming\Codeium\<hash>\ | symbol + semantic |
It depends on the tool:
🔴 GitHub Copilot, Cursor, Codeium: Code chunks are sent to their servers for embedding and inference. The resulting vectors are cached locally, but raw code travels to their cloud.
🟢 Continue.dev + local Ollama: Fully air-gapped. Nothing leaves your machine. Embeddings run locally; inference runs locally.
The four things that make the biggest difference:
1. Name the exact file: billing/tax.js not "the billing file"
2. Name the exact function: calculateTax not "the tax function"
3. Describe the current behaviour: "currently returns amount * rate"
4. Describe the desired behaviour: "should use GST formula (amount * rate) / (1 + rate) when taxType is 'GST'"
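Applied to this example, the requested change might look like the sketch below — the real billing/tax.js isn't shown in this guide, so the surrounding code is invented:

```javascript
// Hypothetical calculateTax after the requested change: a third
// parameter selects the formula. 'GST' treats the amount as
// tax-inclusive; the default keeps the original behaviour.
function calculateTax(amount, rate, taxType = "VAT") {
  if (taxType === "GST") {
    return (amount * rate) / (1 + rate); // tax portion of an inclusive price
  }
  return amount * rate; // existing behaviour: tax on an exclusive price
}

// calculateTax(100, 0.2)        → 20 (unchanged VAT path)
// calculateTax(120, 0.2, "GST") → 20 (tax inside a 120 inclusive price)
```

Note how every element of the prompt — file, function, current behaviour, desired behaviour — maps to a concrete, checkable line of code. That's why specific prompts produce specific diffs.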
"…billing/tax.js, update calculateTax(amount, rate) — add a third param taxType='VAT'. When taxType is 'GST', use formula (amount * rate) / (1 + rate). Keep existing VAT behaviour."

All four are optional customisations. Here's when you'd use each:
Skills — write a SKILL.md file if there's a task type the agent should always handle a certain way (e.g. "whenever creating a DOCX, read this template first"). Most teams won't write their own skills initially.
Rules — write a .claude/CLAUDE.md (or similar) to set org-wide constraints: "never commit to main", "always run tests before responding", "use our internal logger not console.log". Start here — even a few rules make a big difference.
Hooks — scripts that run before/after tool calls. Use for logging, safety checks, auto-formatting, or triggering CI. Requires some setup; not needed on day one.
Commands — shortcut prompts in .claude/commands/. Define /fix-tests or /add-type-safety once, run them with one word. Great for repeated workflows.
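As a starting point, a minimal rules file might look like this — the specific rules below are illustrative; adapt them to your stack:

```markdown
<!-- .claude/CLAUDE.md — always-on rules (contents illustrative) -->
- Never commit directly to main; open a branch and a PR.
- Always run the test suite before reporting a task as done.
- Use the internal logger, not console.log.
```

A command is then just a Markdown file in `.claude/commands/` — e.g. a `fix-tests.md` containing the prompt you'd otherwise retype, invoked as `/fix-tests`.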
Imagine a fine-dining restaurant. Every role, every object, every process maps directly to how AI agents work. Once you see it, you can't unsee it.
You walk into a fine-dining restaurant. You are the developer. You tell the head waiter what you want. What happens next — from kitchen to table — is exactly how an AI agent processes your instruction.
The head waiter takes your order (your prompt), decides the full sequence of steps needed, coordinates everyone, and doesn't stop until your meal is served. They don't cook — they direct. They reason, plan, and loop until the job is done.
You order the tasting menu. The head waiter dispatches three specialist chefs simultaneously — one at the grill, one at the pastry station, one at the bar. Each has their own station (context window) and handles their task in parallel. The head waiter assembles the final result when all three are done.
The physical equipment that does the actual work. The chef decides what to use — but the kitchen assistant physically operates it.
Examples: str_replace, read_file, run_bash, web_search.

The restaurant has relationships with external suppliers — meat farm, dairy, vegetable market, wine cellar. MCP is the standardised ordering system that connects to all of them with one interface. Every supplier speaks the same language. The kitchen doesn't need a different phone for each one.
Each chef has their specialty training — the pastry chef knows to always temper chocolate at 32°C and never use margarine. The grill chef knows to rest the steak 5 minutes before serving. These aren't universal rules — they're role-specific expertise. Without their recipe book (the skill), they'd still try — but might not meet your restaurant's specific standards.
The poster on the kitchen wall. Every member of staff follows these — the head waiter, the grill chef, the pastry chef, the bartender. No exceptions, no matter how busy. Never use expired ingredients. Always wash hands. Serve hot food above 63°C. These are your CLAUDE.md rules — always on, always enforced.
The automatic events that fire when something happens — nobody has to remember to trigger them. When a dish is plated, the pass bell rings automatically. When the fridge goes above temperature, the alarm fires. When service ends, the log sheet is filled automatically.
Things the customer can say that trigger a defined kitchen workflow — without needing to understand how the kitchen works. "Extra salt" → exactly the right process kicks off. "Spice level 3" → kitchen knows what that means. The customer doesn't specify each step — they just say the command.