Critical Bug: Claude Code CLI is making excessive background API calls, ignoring model configuration, and console reporting inconsistencies
Environment
- Platform: Anthropic API
- Claude CLI version: latest
- Operating System: macOS Sequoia 15.0.1 (Darwin 24.0.0), MacBook Pro 14-inch (Nov 2023), Apple M3 Pro, 18GB RAM
- Terminal: Terminal App
Bug Description
CRITICAL SEVERITY - Multiple severe issues have been identified with Claude Code CLI:
- Excessive unauthorized background API calls to Claude 3.5 Haiku despite configuration for Sonnet
- Massive token usage with abnormal input:output ratios (150:1)
- Inefficient cache management loading ~50K tokens per API call
- Non-sequential API logs suggesting race conditions or threading issues
- Time zone inconsistencies in the Anthropic console
- Data discrepancies between logs and usage charts
Steps to Reproduce
- Install Claude Code CLI
- Set up configuration to use Sonnet model (via settings.local.json and environment variables)
- Start using Claude Code CLI for development tasks
- Check Anthropic console logs and usage charts to observe the issues
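For reference, the model override in step 2 looked roughly like the sketch below. The `model` key in `settings.local.json` and the `ANTHROPIC_MODEL` environment variable reflect my understanding of the configuration surface; treat the exact key and variable names as assumptions.

```python
import json
import os
import pathlib

# Hypothetical sketch: pin Claude Code to Sonnet via the project-local
# settings file. The "model" key name is an assumption based on my setup.
settings_path = pathlib.Path(".claude/settings.local.json")
settings_path.parent.mkdir(parents=True, exist_ok=True)
settings_path.write_text(
    json.dumps({"model": "claude-3-7-sonnet-20250219"}, indent=2)
)

# Belt and braces: also set the environment variable before launching the CLI.
os.environ["ANTHROPIC_MODEL"] = "claude-3-7-sonnet-20250219"

print(json.loads(settings_path.read_text())["model"])
```

Even with both of these in place, the console logs still showed constant Haiku traffic.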
Expected Behavior
- Claude Code CLI should only make API calls when explicitly triggered by user actions
- Configuration settings for model choice should be respected
- Cache management should be efficient and not reload the entire context with every call
- Logs should be sequential and consistent with usage charts
- Time zones should be consistent across the console interface
Actual Behavior
- Billions of input tokens being consumed monthly with minimal output
- Constant background API calls to Haiku despite explicit Sonnet configuration
- Inefficient cache management reloading ~50K tokens with each API call
- Non-sequential logs suggesting race conditions or threading issues
- Time zone inconsistencies between different parts of the console
- Data discrepancies between logs and usage charts
Time Zone and Data Inconsistencies
There is a confusing mismatch in the Anthropic console:
- The API Logs page shows GMT+1 (UTC+1), which is correct for my local time (BST)
- The API usage chart displays UTC time but labels it as "Europe/London" in the UI
- However, the time shown doesn't match the Logs time of UTC+1
- The token usage I calculated from my logs does not match the usage chart
- When hovering over a usage chart bar (at 18:05 UTC), it shows:
- claude-3-5-haiku-20241022: 1,208
- claude-3-7-sonnet-20250219: 897,918
- Total: 899,126
- However, these numbers don't consistently align with the logs for the corresponding time period
These inconsistencies make it extremely difficult to track, audit, and understand my token usage.
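For what it's worth, the per-model numbers in the chart tooltip do sum to the tooltip's own total, so the internal arithmetic is fine; the mismatch is between the chart and the logs:

```python
# Token counts from the usage-chart tooltip at 18:05 UTC.
haiku = 1_208      # claude-3-5-haiku-20241022
sonnet = 897_918   # claude-3-7-sonnet-20250219

total = haiku + sonnet
print(total)  # 899126, matching the tooltip's "Total"
assert total == 899_126
```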
Business Impact
I've committed 100+ hours per week over the last 6 months to immerse myself in AI to build tech products as a non-coder. When Claude Code works with Sonnet 3.7, I make progress. When it switches to Haiku, it cannot perform what would be rudimentary coding tasks for Sonnet.
This issue has resulted in:
- Significant time loss
- Enormous business opportunity cost
- Financial harm through excessive billing
- Delayed product development
- Frustration and loss of productivity
When examining the token usage details, I found:
- Input: 3 tokens
- Cache Read: 49,712 tokens
- Cache Write (5m): 158 tokens
This reveals that Claude Code is reading nearly 50,000 tokens from its memory/context cache for each API call, while my actual input is only 3 tokens. This explains the extreme input:output ratio (150:1) I'm experiencing.
The CLI appears to be:
- Loading tens of thousands of tokens from its cache with every API call
- Charging me for these cache reads as if they were new input tokens
- Only writing a small fraction back to the cache
This inefficient cache management means I'm being charged repeatedly for the same cached data with every interaction. This design flaw is likely the root cause of the billions of input tokens being consumed despite relatively little actual new input from me.
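To illustrate the scale, here is a back-of-the-envelope sketch of one such call. The per-million-token prices are assumptions for a Sonnet-class model (cache reads are typically billed at a discount to fresh input, but they are still billed on every call):

```python
# Token breakdown observed for a single API call.
input_tokens = 3
cache_read_tokens = 49_712
cache_write_tokens = 158

# Assumed prices in USD per million tokens (illustrative only).
PRICE_INPUT = 3.00
PRICE_CACHE_READ = 0.30   # discounted, but charged on every call
PRICE_CACHE_WRITE = 3.75

cost = (input_tokens * PRICE_INPUT
        + cache_read_tokens * PRICE_CACHE_READ
        + cache_write_tokens * PRICE_CACHE_WRITE) / 1_000_000
print(f"${cost:.4f} per call")  # dominated almost entirely by the cache read

# The cache read dwarfs the fresh input by four orders of magnitude.
print(cache_read_tokens // input_tokens)
```

Under these assumed prices the cache read accounts for roughly 96% of the per-call cost, which is why the billed "input" balloons even when the actual new input is three tokens.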
Please confirm receipt of this message ASAP and fix it. NB, the financial aspect is a minor concern relative to the opportunity cost and unwittingly working with a significantly inferior model.
I encountered the same behavior in simple testing tonight. /cost actually costs tokens because it sends requests to Haiku in the background all the time (why??), and simple questions use up 14k input tokens for no reason. Just insane.
I will be so bold @ryanantonyshaw and suggest that maybe those 100+ hours spent in Claude Code would have been better used in an IDE learning to code with Claude as an assistant, instead of letting it use up so many tokens :D Half-serious joke aside, I can't believe this is how they do things. The financial aspect might be a minor concern to you, but it certainly isn't for the majority of people, with so many input tokens being used.
Hi! We make background Haiku calls for a variety of reasons, including for security, for backfilling conversation summaries for --resume, and a number of other use cases. This is normal and is how Claude Code works -- lots going on behind the scenes to make the experience nice and safe.
/cost actually costing tokens
/cost runs locally, and does not hit the API. If you're seeing the model hit the API when running /cost, please file a separate issue.
@bcherny Sure it does, I really wonder how much testing went into this, and how little you trust your users. All this "going on behind the scenes" and "background Haiku calls" should not be billed to the customer. Especially something like this:
/cost DOES get sent to Haiku:
- One word -> $0.05, and more than twelve thousand tokens used up in cache (why? what? unclear).
Tbh this is really frustrating and I'm sorry, for such an expensive (!!) product, you shouldn't need to ask users to open issues for such obvious flaws.
What is going on here? What are those 2k input tokens sent to haiku on an empty directory with no previous commands?
$0.01 already "spent" just from running /cost a few times. Please don't tell me this is normal and a "feature, not a bug": each /cost command adds a few hundred input tokens and a few dozen output tokens @bcherny
Here you go, maybe that helps https://github.com/anthropics/claude-code/issues/2163 you have it there in a separate issue now!
This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.