Logo Vincent
Back to all posts

Claude Code /cost Explained: How Much Is Your AI Coding Really Costing?

Claude
Claude Code /cost Explained: How Much Is Your AI Coding Really Costing?

Why You Need /cost

When developing with Claude Code, what do you care about most?

Besides code quality and efficiency, it’s probably money.

If you’re an API key user, every conversation generates charges. A complex debugging session, a large-scale refactoring — how much did it actually cost? You might be surprised when you check.

If you’re a Max/Pro subscriber, you’re not billed per call, but you surely want to know how much of your quota you’ve used and whether you’re approaching the limit.

/cost is the command that keeps you informed about your spending.

What Is /cost

/cost is Claude Code’s session cost viewer command. It displays the cumulative API call expenses, time consumption, and code change statistics for the current session.

In interactive mode, type:

/cost

You’ll see output like this:

Total cost:            $0.4832
Total duration (API):  3 mins 12 secs
Total duration (wall): 8 mins 45 secs
Total code changes:    156 lines added, 43 lines removed
Usage by model:
  claude-opus-4-6:    12,450 input, 3,280 output, 45,600 cache read, 8,200 cache write ($0.42)
  claude-sonnet-4-6:  2,100 input, 890 output ($0.02)

Everything at a glance.

Who Can See /cost

There’s an important distinction here:

API Key Users

If you access Claude Code with an API key, /cost displays the full cost breakdown. Every call is being billed, and you need to know how much you’re spending.

Max/Pro Subscribers

If you’re a Claude.ai subscriber, the /cost command is hidden by default. Since you’re on a subscription plan, you’re not billed per call.

When you type /cost, you’ll see:

You are currently using your subscription to power your Claude Code usage

Or if your subscription quota is exhausted and you’re using overage:

You are currently using your overages to power your Claude Code usage.
We will automatically switch you back to your subscription rate limits when they reset

Want to see detailed costs? Setting the DISABLE_COST_WARNINGS=false environment variable won’t change this behavior, as the cost display logic for subscribers is independent. This feature is primarily for API key users.

How Costs Are Calculated

This is where it gets technical. Claude Code has a complete model pricing table built in, calculating costs in real-time based on official API prices.

Model Pricing (Per Million Tokens)

ModelInputOutputCache ReadCache Write
Haiku 3.5$0.80$4$0.08$1.00
Haiku 4.5$1$5$0.10$1.25
Sonnet 4.x$3$15$0.30$3.75
Opus 4 / 4.1$15$75$1.50$18.75
Opus 4.5$5$25$0.50$6.25
Opus 4.6$5$25$0.50$6.25
Opus 4.6 (fast)$30$150$3.00$37.50

Two key takeaways:

  1. Opus 4.6 fast mode costs 6x more. That’s the price of /fast — same model, faster output, but 6x the per-token cost. So /fast is great for short tasks, not for lengthy operations.

  2. Cache reads cost only 10% of input price. Claude Code heavily uses Prompt Cache (system prompts, CLAUDE.md, etc.), meaning repeated context costs very little.

The Formula

Cost = Input tokens x Input price
     + Output tokens x Output price
     + Cache read tokens x Cache read price
     + Cache write tokens x Cache write price
     + Web search requests x $0.01

After each API call completes, Claude Code extracts the actual token usage from the response, multiplies by the corresponding model’s pricing, and adds it to the session total.

What /cost Displays

The /cost output covers five dimensions:

1. Total Cost

Cumulative API call expenses for the current session in USD, precise to four decimal places.

If the session used models not in the pricing table (like an internal beta model), a warning is shown: costs may be inaccurate.

2. API Duration

The sum of pure API call time. Excludes your thinking time and tool execution time — only counts the time Claude spends “reasoning.”

3. Wall Duration

The actual wall-clock time since the session started. The gap between API duration and wall duration represents time spent outside of waiting for Claude.

4. Code Changes

Lines of code added and removed during the current session. A tangible measure of “what this conversation produced.”

5. Usage by Model

Token usage broken down by model: input, output, cache read, cache write, and cost subtotal per model.

If you switched models during the session (e.g., Opus for analysis, then Sonnet for simple edits), each appears separately.

Cost Persistence

Claude Code saves cost data to the project configuration, including:

  • Total cost amount
  • Total API call duration
  • Per-model token usage breakdown
  • Session ID (to determine if it’s the same session)

This means if you use /resume to restore a previous session, cost data is restored too, maintaining continuity.

Note: persistence is per-session. New sessions start from zero.

Practical Usage Tips

Check Regularly

Build a habit of typing /cost periodically during long sessions. Especially when:

  • Large-scale refactoring (lots of code context = lots of tokens)
  • Iterative debugging (many conversation turns = accumulated costs)
  • Using max effort (longer reasoning chains = more output tokens)

Watch the Cache Ratio

If cache read tokens far exceed input tokens, Prompt Cache is doing its job and your costs are optimized.

Conversely, if cache reads are low, your CLAUDE.md or system prompts might be too short to fully leverage caching.

Fast Mode Cost Awareness

Opus 4.6 in /fast mode costs 6x more than normal mode. If you’re just making a simple change with fast mode on, the cost may not be worth it. Recommendations:

  • Short task + fast: Quick completion, total cost stays manageable
  • Long task + normal mode: Slower, but saves significantly

/cost vs /usage vs /stats

These three commands are easy to confuse. Here’s the difference:

CommandTarget UsersWhat It Shows
/costAPI key usersActual USD spending
/usageSubscribersSubscription quota and rate limits
/statsAll usersSession statistics and activity metrics

Simple rule: spending goes to /cost, quota goes to /usage, data goes to /stats.

Money-Saving Tips

Now that you know how costs are calculated, here are some practical ways to save:

  1. Use /compact wisely: Compress when context gets too long, reducing input tokens for subsequent calls
  2. Use /effort wisely: Use low or medium for simple tasks — don’t let Claude overthink
  3. Leverage caching: Write a good CLAUDE.md so common context uses cache reads (only 10% of input price)
  4. Be cautious with /fast: Unless time is truly critical, normal mode offers better value
  5. ! prefix: Use the exclamation prefix for commands that don’t need AI involvement — zero API calls

Final Thoughts

The core value of /cost is transparency.

AI coding assistants aren’t a free lunch. Every conversation, every token has a cost. /cost lets you see these costs, enabling smarter decisions — when to go all-in and when to economize.

Know what you’re spending, and you’ll know it’s worth it.

More Articles

© 2026 vincentqiao.com . All rights reserved.