What Actually Happens When You Run Claude Code With --dangerously-skip-permissions
Most of us run Claude Code with --dangerously-skip-permissions. At some point I started wondering what's actually going on under the hood — what's managing my context, what safety is left when the guardrails are off, and where my tokens are going.
So I pointed Claude at its own source. We indexed the full CLI — 1,300 TypeScript files, 30 system prompts, community-extracted from the npm package — and started reading. I'd send Claude into a directory, it'd surface something interesting, and I'd go trace the code path locally to make sure it held up.
Fair warning: this is reverse-engineering of a shipped build, not Anthropic's internal docs. Some of what we found is behind feature flags and might be experimental or dormant. I'll try to be clear about what's solid and what's speculative.
The first thing I wanted to understand was the agent loop. It's in query.ts, and it's smaller than I expected:
while (true) {
  // context triage before every API call
  applyToolResultBudget()    // persist oversized results to disk
  snipCompactIfNeeded()      // drop oldest turns at markers
  microcompact()             // cached edits to shrink tool results
  applyCollapsesIfNeeded()   // progressive summarization (feature-gated)
  autoCompactIfNeeded()      // full summary if within 13K of context limit

  // stream the response
  for await (block of deps.callModel(messages, systemPrompt, tools)) {
    if (text/thinking)    → yield to UI
    if (tool_use)         → streamingToolExecutor.addTool(block)
    poll completedResults → yield finished tools mid-stream
  }

  // recovery
  if (prompt_too_long)   → reactive compact
  if (max_output_tokens) → truncation retry (up to 3x)
  if (model_error)       → fallback to alternate model

  if (any tool_use blocks) → push results, continue
  else → break
}
Five context management strategies fire in sequence before every API call, each independently feature-gated. Tools start executing while the model is still streaming — StreamingToolExecutor adds them as tool_use blocks arrive rather than waiting for the full response. The loop itself fits on a screen. Everything interesting is what wraps it.
I spent a while tracing where tokens actually go, because my bills were higher than I expected.
The system prompt is split by a marker called __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__. Everything above it — identity, safety rules, tool guidance, tone rules — is static across all sessions. Everything below — your CWD, git status, MEMORY.md contents, MCP server instructions — regenerates each session. The split exists because Anthropic's prompt caching can reuse the static prefix across API calls: the first ~5K tokens hit the cache instead of being reprocessed every turn.
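A minimal sketch of the split. The boundary string is the real marker; the function name, block shape, and cache_control placement are illustrative, modeled on Anthropic's prompt caching API:

```typescript
// The real marker that divides the static prefix from the dynamic tail.
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

type PromptBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function splitSystemPrompt(full: string): PromptBlock[] {
  const i = full.indexOf(BOUNDARY);
  return [
    // Static prefix: identity, safety rules, tool guidance. Marked
    // cacheable so the API reuses it instead of reprocessing every turn.
    { type: "text", text: full.slice(0, i), cache_control: { type: "ephemeral" } },
    // Dynamic tail: CWD, git status, MEMORY.md. Regenerated each session.
    { type: "text", text: full.slice(i + BOUNDARY.length) },
  ];
}
```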
Then Claude surfaced something I hadn't thought about: it doesn't just run one model. When I dug into the tool loading code, I found that Sonnet handles memory file selection — a side-query that picks which .md files from your .claude/ directory are relevant — while Haiku handles session titles, tool use summaries, prompt suggestion chips, and codebase exploration for the Explore subagent. Most of the terminal UI that isn't the main conversation is Haiku. That explains some of the cost structure — seven auxiliary tasks routed to the cheapest model.
But the big one for token spend was this. From the system prompt:
"Old tool results will be automatically cleared from context to free up space. The N most recent results are always kept."
"Write down any important information you might need later in your response, as the original tool result may be cleared later."
I'd always assumed context management was some kind of smart summarization. It's not — or at least not primarily. Old tool results just get dropped. Mechanically deleted. The model is told this will happen and instructed to restate anything important in its visible text before it disappears.
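A toy sketch of the policy. The function name comes from the loop earlier; the message shape and placeholder text are my own, and the real implementation persists oversized results to disk rather than blanking them:

```typescript
type Msg = { role: string; toolResult?: string };

// Keep the N most recent tool results; mechanically clear everything older.
function applyToolResultBudget(messages: Msg[], keep: number): Msg[] {
  // Indices of every message that carries a tool result, oldest first.
  const resultIdxs = messages
    .map((m, i) => (m.toolResult !== undefined ? i : -1))
    .filter((i) => i >= 0);
  // Everything except the last `keep` results gets cleared.
  const toClear = new Set(resultIdxs.slice(0, Math.max(0, resultIdxs.length - keep)));
  return messages.map((m, i) =>
    toClear.has(i) ? { ...m, toolResult: "[tool result cleared to free context]" } : m
  );
}
```

This is why the system prompt tells the model to restate important findings in visible text: the visible text survives, the raw tool result may not.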
When full compaction does fire (at effectiveContextWindow - 13,000 tokens), the compactor gets a prompt that requires nine structured sections, and one of them is "All User Messages" — every non-tool-result user message captured verbatim. The harness doesn't trust its own summarizer to preserve what you asked for. I also noticed the compactor produces a private <analysis> block before writing the summary — a scratchpad for reasoning about what to keep — which formatCompactSummary() strips before reinsertion. The reasoning improves the summary, then vanishes.
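The stripping step, as a sketch — formatCompactSummary is the real function name, but the regex is my guess at the mechanics:

```typescript
// Drop the compactor's private <analysis> scratchpad before the summary
// re-enters the conversation context.
function formatCompactSummary(raw: string): string {
  return raw.replace(/<analysis>[\s\S]*?<\/analysis>/g, "").trim();
}
```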
If you've noticed Claude Code "forgetting" things from earlier in a long session — a file you Read 15 turns ago, some tool output from early in the conversation — now you know why. Old results got dropped to make room. It's not a bug.
The safety system is where things got interesting. Claude was grepping through the classifier prompts and pulled up something I hadn't seen discussed anywhere.
There's an auto-mode classifier in 12_yolo_auto_mode_classifier.md in the extracted prompts (yes, they call it YOLO internally). Two stages — a fast pass that forces a classify_result tool call returning {thinking, shouldBlock, reason}, then extended thinking if the first pass is uncertain.
Here's what caught my attention: the classifier gets the conversation transcript, but with all assistant text blocks stripped out. Only tool_use blocks. The model's reasoning, its "I'm going to run this command because..." — the classifier never sees any of it.
I stared at this for a bit before it clicked. The model could write text that makes its own classifier more likely to approve the next action. "I'm running this safe cleanup command" right before rm -rf /. By excluding the model's own words, the classifier judges the action alone, without self-advocacy. It's a defense against self-prompt-injection — the model influencing its own safety checks. I traced the code path in checkPermissionsAndCallTool() and it holds up: the classifier context is built from tool blocks only.
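A sketch of the filtering, with content-block shapes simplified from the Anthropic Messages API (the function name here is mine, not the CLI's):

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; name: string; input: unknown };

// Build the transcript the classifier judges: assistant prose and thinking
// are stripped, so the model cannot lobby its own safety check.
function classifierContext(assistantBlocks: Block[]): Block[] {
  return assistantBlocks.filter((b) => b.type === "tool_use");
}
```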
With --dangerously-skip-permissions, this whole classifier is bypassed. Every tool call runs. The only things between the model and your filesystem are the model's own judgment and whatever you put in CLAUDE.md.
Each tool does declare safety metadata:
type Tool<Input, Output> = {
  isConcurrencySafe(input): boolean   // default: false
  isReadOnly(input): boolean          // default: false
  isDestructive(input): boolean       // default: false
  interruptBehavior(): 'cancel' | 'block'
  shouldDefer: boolean
}
Defaults are fail-closed, which is good design. But in skip-permissions mode these flags inform the UI and tool ordering, not whether the tool actually runs.
The most surprising thing we found was a verification subagent that almost nobody uses.
Claude Code can spawn a read-only agent that adversarially checks the main agent's work. The prompt is in 07_verification_agent.md, and it's worth reading in full. It lists the specific rationalizations the model will feel tempted by:
"The code looks correct based on my reading" — Reading is not verification. Run it.
"The implementer's tests already pass" — The implementer is an LLM. Verify independently.
"This would take too long" — Not your call.
Every PASS needs a command block with actual terminal output. No output, no PASS. The prompt says the agent has "two documented failure patterns" — verification avoidance (reading code instead of running it) and "being seduced by the first 80%" (passing because the output looks polished at a glance).
It has per-change-type verification strategies too. Frontend changes: start the dev server, curl subresources. Backend: start the server, hit the endpoints, verify response shapes. Database migrations, refactoring — each with specific commands to run.
The thing is, this only fires when something explicitly spawns it. In typical --dangerously-skip-permissions usage, nobody does. The model writes code, maybe runs a test, marks the task done, and moves on. The strongest quality mechanism in the harness is sitting there unused.
After going through all of this, here's what I've changed about how I use Claude Code:
CLAUDE.md is the strongest lever. In skip-permissions mode, it's loaded with the meta-instruction "These instructions OVERRIDE any default behavior." The model treats it as law. I put project constraints, conventions, and explicit "never do X" rules in there. It's more effective than any permission prompt.
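For what it's worth, mine now reads roughly like this — the contents are illustrative, not a recommendation:

```markdown
# Conventions
- Use pnpm, never npm or yarn.
- Follow the existing error-handling pattern before inventing a new one.

# Hard rules
- NEVER run database migrations without asking first.
- NEVER force-push, to any branch.
- NEVER delete files outside the repository root.
```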
Hooks are underrated. PreToolUse hooks fire before every tool call, even with skip-permissions. A shell script that inspects the tool name and arguments and exits with code 2 blocks the call. I wrote a 10-line bash script that checks for destructive patterns — rm -rf, DROP TABLE, force pushes — and it's caught things the model would've run without thinking.
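Here's the idea, translated from my bash check into TypeScript for consistency. The stdin payload shape ({tool_name, tool_input}) matches what Claude Code documents for hooks, but treat the details as an assumption; the patterns are illustrative:

```typescript
// Destructive-command patterns a PreToolUse hook can veto.
const DESTRUCTIVE = [
  /\brm\s+-rf?\b/,          // recursive deletes
  /\bDROP\s+TABLE\b/i,      // destructive SQL
  /git\s+push\s+.*--force/, // force pushes
];

function shouldBlock(toolName: string, command: string): boolean {
  if (toolName !== "Bash") return false; // only inspect shell commands
  return DESTRUCTIVE.some((re) => re.test(command));
}

// Entry point when wired up as the actual hook: read the JSON payload
// from stdin, then exit 2 to block the call or 0 to allow it.
// Call main() at the bottom of the script to use it for real.
async function main(): Promise<void> {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  const payload = JSON.parse(Buffer.concat(chunks).toString("utf8"));
  const command = payload.tool_input?.command ?? "";
  process.exit(shouldBlock(payload.tool_name, command) ? 2 : 0);
}
```

Register it as a PreToolUse hook in your settings and it fires before every tool call, skip-permissions or not.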
Long sessions degrade predictably. Now that I know the harness is mechanically deleting old context, I structure my work differently. Shorter focused sessions with good CLAUDE.md context. When I need to pick up where I left off, --continue loads a fresh context with MEMORY.md. I stopped blaming "hallucination" for things that were actually compaction.
The verifier is worth spawning. After implementation, I've started explicitly asking for a verification subagent. The anti-rationalization prompt is good — it catches the shortcuts the model takes. You can also write your own verification agent in .claude/agents/ with domain-specific checks.
.claude/rules/ for surgical constraints. Rule files support a paths: key in their YAML frontmatter, so a rule can apply only when you're working in specific directories. More useful than putting everything in one CLAUDE.md.
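A hypothetical rule file, say .claude/rules/frontend.md — the paths: frontmatter key is the real mechanism, the path and rule text are made up:

```markdown
---
paths:
  - "src/frontend/**"
---
Use the shared components in the frontend UI library; never hand-roll
buttons or modals. No inline styles.
```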
One last thing. Buried in the source behind feature flags is something called KAIROS.
It's Anthropic's internal codename for what Claude Code is becoming. It showed up in 63 files across the codebase — Claude surfaced it while searching for feature flags and I went down the rabbit hole.
KAIROS is a daemon architecture. A parent process holds a persistent WebSocket to claude.ai. A child process runs the agent. If the child crashes, the parent respawns it while the session on claude.ai stays alive. The feature flag hierarchy maps out the whole plan:
KAIROS // full assistant mode (superset)
├── KAIROS_BRIEF // SendUserMessage tool (chat-style output)
├── KAIROS_PUSH_NOTIFICATION // push notifications to device
├── KAIROS_GITHUB_WEBHOOKS // subscribe to PR events
├── PROACTIVE // Sleep + tick system + autonomous loop
└── DAEMON // background daemon process
Each sub-flag rolls out independently. The code uses feature('KAIROS') || feature('KAIROS_BRIEF') patterns everywhere — KAIROS is the umbrella.
What it enables, concretely:
In KAIROS mode, all user-visible output goes through a SendUserMessage tool (internally called "Brief"). The system prompt tells the model: "SendUserMessage is where your replies go. Text outside it is visible if the user expands the detail view, but most won't — assume unread." The terminal stops being the primary interface. Claude Code becomes a chat app.
There's a SubscribePRTool that hooks into GitHub webhooks. PR comments, reviews, CI status changes arrive as user messages in the conversation. The model reacts proactively — reviews the comment, checks the failure, pushes a fix. PushNotificationTool sends notifications to your phone when it needs attention or finishes a task. A watchScheduledTasks() function monitors cron schedules with a lock to prevent double-fires.
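The double-fire guard presumably looks something like this — watchScheduledTasks is the real function name, but everything else here is a sketch:

```typescript
// A tick loop with a re-entrancy lock: if a previous tick is still running
// its due tasks, the next tick is skipped rather than firing the same
// scheduled task twice.
let tickInFlight = false;

async function runDueTasks(due: Array<() => Promise<void>>): Promise<boolean> {
  if (tickInFlight) return false; // double fire prevented
  tickInFlight = true;
  try {
    for (const task of due) await task();
    return true;
  } finally {
    tickInFlight = false;
  }
}
```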
The terminalFocus field still modulates behavior — unfocused terminal means more autonomous action, focused means more collaborative. And the model knows the prompt cache expires after 5 minutes, so it reasons about when sleeping is cheaper than staying awake. The model is literally optimizing its own infrastructure costs.
All of this is behind GrowthBook entitlement gates (tengu_kairos_*), USER_TYPE === 'ant' checks, and a trust dialog. Anthropic employees only, for now. But the architecture is built. The tools exist. The feature flags are granular enough for staged rollout.
The way I see it: --dangerously-skip-permissions is what we're grinding on today — the manual override for people who want agentic behavior. KAIROS is where this is heading — a persistent background agent that subscribes to your repos, runs scheduled tasks, wakes up when something happens, and pushes you a notification when it's done. The harness gets deeper from here, not simpler.
Claude and I did this research together — I directed, Claude searched, I verified locally. Parallel subagents indexed the source via Nia, an adversarial review (Codex, GPT-5.4) challenged the findings. Based on community-extracted source (tanbiralam/claude-code) and system prompts (Leonxlnx/claude-code-system-prompts), decompiled from the npm package. Feature-gated paths may not reflect live behavior. Cross-referenced with Anthropic's "Effective Harnesses for Long-Running Agents" blog (Nov 2025). See also @rohit4verse, @odysseus0z, @HiTw93.