Critical Bug: Claude Code CLI is making excessive background API calls, ignoring model configuration, and console reporting inconsistencies
Environment
- Platform: Anthropic API
- Claude CLI version: latest
- Operating System: macOS Sequoia 15.0.1 (Darwin 24.0.0), MacBook Pro 14-inch (Nov 2023), Apple M3 Pro, 18GB RAM
- Terminal: Terminal App
Bug Description
CRITICAL SEVERITY - Multiple severe issues have been identified with Claude Code CLI:
- Excessive unauthorized background API calls to Claude 3.5 Haiku despite configuration for Sonnet
- Massive token usage with abnormal input:output ratios (150:1)
- Inefficient cache management loading ~50K tokens per API call
- Non-sequential API logs suggesting race conditions or threading issues
- Time zone inconsistencies in the Anthropic console
- Data discrepancies between logs and usage charts
Steps to Reproduce
- Install Claude Code CLI
- Set up configuration to use Sonnet model (via settings.local.json and environment variables)
- Start using Claude Code CLI for development tasks
- Check Anthropic console logs and usage charts to observe the issues
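For reference, the model override in step 2 looked roughly like the sketch below. The `model` key in `settings.local.json` and the `ANTHROPIC_MODEL` environment variable reflect my understanding of the configuration surface; treat the exact key and variable names as assumptions.

```python
import json
import os
import pathlib

# Hypothetical sketch: pin Claude Code to Sonnet via the project-local
# settings file. The "model" key name is an assumption based on my setup.
settings_path = pathlib.Path(".claude/settings.local.json")
settings_path.parent.mkdir(parents=True, exist_ok=True)
settings_path.write_text(
    json.dumps({"model": "claude-3-7-sonnet-20250219"}, indent=2)
)

# Belt and braces: also set the environment variable before launching the CLI.
os.environ["ANTHROPIC_MODEL"] = "claude-3-7-sonnet-20250219"

print(json.loads(settings_path.read_text())["model"])
```

Even with both of these in place, the console logs still showed constant Haiku traffic.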
Expected Behavior
- Claude Code CLI should only make API calls when explicitly triggered by user actions
- Configuration settings for model choice should be respected
- Cache management should be efficient and not reload the entire context with every call
- Logs should be sequential and consistent with usage charts
- Time zones should be consistent across the console interface
Actual Behavior
- Billions of input tokens being consumed monthly with minimal output
- Constant background API calls to Haiku despite explicit Sonnet configuration
- Inefficient cache management reloading ~50K tokens with each API call
- Non-sequential logs suggesting race conditions or threading issues
- Time zone inconsistencies between different parts of the console
- Data discrepancies between logs and usage charts
Time Zone and Data Inconsistencies
There is a confusing mismatch in the Anthropic console:
- The API Logs page shows GMT+1 (UTC+1), which is correct for my local time (BST)
- The API usage chart displays UTC time but labels it as "Europe/London" in the UI
- However, the time shown doesn't match the Logs time of UTC+1
- The token usage I calculated from my logs does not match the usage chart
- When hovering over a usage chart bar (at 18:05 UTC), it shows:
- claude-3-5-haiku-20241022: 1,208
- claude-3-7-sonnet-20250219: 897,918
- Total: 899,126
- However, these numbers don't consistently align with the logs for the corresponding time period
These inconsistencies make it extremely difficult to track, audit, and understand my token usage.
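For what it's worth, the per-model numbers in the chart tooltip do sum to the tooltip's own total, so the internal arithmetic is fine; the mismatch is between the chart and the logs:

```python
# Token counts from the usage-chart tooltip at 18:05 UTC.
haiku = 1_208      # claude-3-5-haiku-20241022
sonnet = 897_918   # claude-3-7-sonnet-20250219

total = haiku + sonnet
print(total)  # 899126, matching the tooltip's "Total"
assert total == 899_126
```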
Business Impact
I've committed 100+ hours per week over the last 6 months to immerse myself in AI to build tech products as a non-coder. When Claude Code works with Sonnet 3.7, I make progress. When it switches to Haiku, it cannot perform what would be rudimentary coding tasks for Sonnet.
This issue has resulted in:
- Significant time loss
- Enormous business opportunity cost
- Financial harm through excessive billing
- Delayed product development
- Frustration and loss of productivity
When examining the token usage details, I found:
- Input: 3 tokens
- Cache Read: 49,712 tokens
- Cache Write (5m): 158 tokens
This reveals that Claude Code is reading nearly 50,000 tokens from its memory/context cache for each API call, while my actual input is only 3 tokens. This explains the extreme input:output ratio (150:1) I'm experiencing.
The CLI appears to be:
- Loading tens of thousands of tokens from its cache with every API call
- Charging me for these cache reads as if they were new input tokens
- Only writing a small fraction back to the cache
This inefficient cache management means I'm being charged repeatedly for the same cached data with every interaction. This design flaw is likely the root cause of the billions of input tokens being consumed despite relatively little actual new input from me.
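To illustrate the scale, here is a back-of-the-envelope sketch of one such call. The per-million-token prices are assumptions for a Sonnet-class model (cache reads are typically billed at a discount to fresh input, but they are still billed on every call):

```python
# Token breakdown observed for a single API call.
input_tokens = 3
cache_read_tokens = 49_712
cache_write_tokens = 158

# Assumed prices in USD per million tokens (illustrative only).
PRICE_INPUT = 3.00
PRICE_CACHE_READ = 0.30   # discounted, but charged on every call
PRICE_CACHE_WRITE = 3.75

cost = (input_tokens * PRICE_INPUT
        + cache_read_tokens * PRICE_CACHE_READ
        + cache_write_tokens * PRICE_CACHE_WRITE) / 1_000_000
print(f"${cost:.4f} per call")  # dominated almost entirely by the cache read

# The cache read dwarfs the fresh input by four orders of magnitude.
print(cache_read_tokens // input_tokens)
```

Under these assumed prices the cache read accounts for roughly 96% of the per-call cost, which is why the billed "input" balloons even when the actual new input is three tokens.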
Please confirm receipt of this message ASAP and fix it. NB, the financial aspect is a minor concern relative to the opportunity cost and unwittingly working with a significantly inferior model.
I encountered the same behavior in simple testing tonight. /cost actually costs tokens because it sends requests to Haiku in the background all the time (why??), and simple questions use up 14k input tokens for no reason. Just insane.
I will be so bold @ryanantonyshaw and suggest that maybe those 100+ hours spent in Claude Code would have been better used in an IDE learning to code with Claude as an assistant, instead of letting it use up so many tokens :D Half-serious joke aside, I can't believe this is how they do things. The financial aspect might be a minor concern to you, but it certainly isn't for the majority of people, with so many input tokens being used.
Hi! We make background Haiku calls for a variety of reasons, including for security, for backfilling conversation summaries for --resume, and a number of other use cases. This is normal and is how Claude Code works -- lots going on behind the scenes to make the experience nice and safe.
/cost actually costing tokens
/cost runs locally, and does not hit the API. If you're seeing the model hit the API when running /cost, please file a separate issue.
@bcherny Sure it does, I really wonder how much testing went into this, and how little you trust your users. All this "going on behind the scenes" and "background Haiku calls" should not be billed to the customer. Especially something like this:
/cost DOES get sent to Haiku:
- One word -> $0.05, and more than twelve thousand tokens used up in cache (why? what? unclear).
Tbh this is really frustrating and I'm sorry, for such an expensive (!!) product, you shouldn't need to ask users to open issues for such obvious flaws.
What is going on here? What are those 2k input tokens sent to haiku on an empty directory with no previous commands?
$0.01 already "spent" just from running /cost a few times. Please don't tell me this is normal and a "feature, not a bug": each /cost command adds a few hundred input tokens and a few dozen output tokens @bcherny
Here you go, maybe that helps https://github.com/anthropics/claude-code/issues/2163 you have it there in a separate issue now!
This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.