
[Bug] Excessive token usage on new session initialization

Open sambua opened this issue 2 weeks ago • 26 comments

Bug Description [report] Could you please explain what happened with my newly cleared session, which shows "Current session" at 31% and "Weekly limits" at 4%? Guys, it's robbery. I had just started the session with only one planning message, covering only a small part of the project, and voilà, unbelievable token usage. Very bad!!!!

Thinking was off, but Opus 4.5 was selected by default. Even so, it can't have used that much just creating one plan file of 620 lines of text (including code examples).

Environment Info

  • Platform: darwin
  • Terminal: iTerm.app
  • Version: 2.0.64
  • Feedback ID: 7e3c88dc-8f71-47e6-aa73-23fdd3696399

Errors

[{"error":"Error\n    at xw (/$bunfs/root/claude:189:1143)\n    at <anonymous> (/$bunfs/root/claude:190:10080)\n    at emit (node:events:92:22)\n    at endReadableNT (internal:streams/readable:861:50)\n    at processTicksAndRejections (native:7:39)\n    at request (/$bunfs/root/claude:192:2147)\n    at processTicksAndRejections (native:7:39)","timestamp":"2025-12-10T07:19:08.470Z"},{"error":"Error\n    at xw (/$bunfs/root/claude:189:1143)\n    at <anonymous> (/$bunfs/root/claude:190:10080)\n    at emit (node:events:92:22)\n    at endReadableNT (internal:streams/readable:861:50)\n    at processTicksAndRejections (native:7:39)\n    at request (/$bunfs/root/claude:192:2147)\n    at processTicksAndRejections (native:7:39)","timestamp":"2025-12-10T07:19:08.721Z"},{"error":"ConfigParseError: Invalid schema: name: Marketplace name cannot impersonate official Anthropic/Claude marketplaces. Names containing \"official\", \"anthropic\", or \"claude\" in official-sounding combinations are reserved.\n    at BBB (/$bunfs/root/claude:1361:869)\n    at CuR (/$bunfs/root/claude:1361:2934)\n    at async Ok (/$bunfs/root/claude:1363:601)\n    at async cH_ (/$bunfs/root/claude:4478:5773)\n    at processTicksAndRejections (native:7:39)","timestamp":"2025-12-10T07:19:13.089Z"},{"error":"Error: Request was aborted.\n    at _createMessage (/$bunfs/root/claude:128:3151)\n    at processTicksAndRejections (native:7:39)","timestamp":"2025-12-10T07:28:43.041Z"}]

sambua avatar Dec 10 '25 07:12 sambua

Found 3 possible duplicate issues:

  1. https://github.com/anthropics/claude-code/issues/12333
  2. https://github.com/anthropics/claude-code/issues/13532
  3. https://github.com/anthropics/claude-code/issues/13326

This issue will be automatically closed as a duplicate in 3 days.

  • If your issue is a duplicate, please close it and 👍 the existing issue instead
  • To prevent auto-closure, add a comment or 👎 this comment

🤖 Generated with Claude Code

github-actions[bot] avatar Dec 10 '25 07:12 github-actions[bot]

Same here

yambu avatar Dec 10 '25 15:12 yambu

Same here. I just started my new 5-hour session, ran only "/compact", and checked the usage to find it had already spent 9%!!!!

The previous session was consumed freakishly fast, and I felt something was wrong because I never get near the limit, but I told myself I might have misjudged it. Now it has become obvious that there is some sort of bug in the way the session limit is being calculated.

Please fix.

mosabalhsseini avatar Dec 10 '25 16:12 mosabalhsseini

Same here. Every time I JUST OPEN it, it consumes 4%.

rbarcante avatar Dec 10 '25 21:12 rbarcante

+1

dannydanzka avatar Dec 11 '25 02:12 dannydanzka

Today, the same with version 2.0.64: I did nothing at all. I exited, re-entered, typed /context (+2%), typed a custom command name and stopped immediately (+2%), and "Current session" shows 4% for nothing. It's a bug and should be fixed. @claude

sambua avatar Dec 11 '25 03:12 sambua

Absolutely the same happens to me. Before 2.0.64, on the Max plan it was impossible for me (given my coding tasks/style) to hit the 5-hour session limit. After 2.0.64, I hit the limit in 3.5 hours and was prompted to upgrade to a bigger plan. @claude

IlliaVern avatar Dec 11 '25 09:12 IlliaVern

Seems @claude wants to rob us to show big income before going to IPO.

sambua avatar Dec 11 '25 10:12 sambua

As a temporary workaround, try downgrading the version with `claude install 2.0.62`, and after Claude starts, check that "Thinking" mode is disabled (for me it always starts with Thinking mode on, even with "alwaysThinkingEnabled": false in settings.json).

eoris avatar Dec 11 '25 10:12 eoris

@eoris I did `claude install 2.0.62`, but it's still showing as 2.0.65. Is there a catch?

rbarcante avatar Dec 11 '25 21:12 rbarcante

@eoris I did `claude install 2.0.62`, but it's still showing as 2.0.65. Is there a catch?

It always auto-updates to the latest version. @eoris please also let me know how to prevent the auto-update.

Maharaj95 avatar Dec 11 '25 21:12 Maharaj95

@rbarcante @Maharaj95 try adding `export DISABLE_AUTOUPDATER=1` to your .bashrc or .zshrc.
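That edit can be sketched as below. The demo writes to a throwaway file so it is safe to run as-is; point `profile` at your real `~/.zshrc` or `~/.bashrc` for actual use (the `DISABLE_AUTOUPDATER=1` variable is the one mentioned above; the idempotent-append pattern is just a convention).

```shell
# Demo against a throwaway file; set profile to ~/.zshrc or ~/.bashrc for real use
profile="$(mktemp)"
line='export DISABLE_AUTOUPDATER=1'
# Append only if the exact line is not already present, so repeat runs are no-ops
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
```

Then restart your shell (or `source` the file) so the variable is exported before Claude starts.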

eoris avatar Dec 12 '25 08:12 eoris

Same here. Hitting a 5 hour limit within 30 minutes without any change in my workflow.

grandtheftdisco avatar Dec 12 '25 13:12 grandtheftdisco

⭐ TIPS: I've been browsing similar area:cost issues re: Claude this morning and have gathered a few tips in one place. These don't fix the issue entirely but they've helped cut down my usage in the interim: hope these help someone!

  1. Direct Claude not to use the Task feature, and to use utility tools for common tasks. Add the --no-agent flag to your prompt as well (several users report that Claude is recruiting subagents without disclosing this to the user).
  • e.g. "--no-agent for the remainder of our work in this session. DO NOT use the Task tool. Use Grep to find files, Read to check them, and Edit to fix them directly."
  2. Use /config in the CLI to disable auto-compact, as this is another source of token overconsumption.

If anyone else finds ways to reduce token consumption while we wait, feel free to share! 👍

grandtheftdisco avatar Dec 12 '25 17:12 grandtheftdisco

Thank you so much. This worked for me and reduced it significantly. I switched to Sonnet as well, just as an extra safety net.

Maharaj95 avatar Dec 12 '25 22:12 Maharaj95

@Maharaj95 HOORAY! I'm glad it helped you! Hopefully they fix this soon 😖

grandtheftdisco avatar Dec 12 '25 23:12 grandtheftdisco

same here, any updates?

haanhtuan0000 avatar Dec 14 '25 23:12 haanhtuan0000

MCP tools can consume a bunch of the context.

Use /context to understand how your context is being used, and consider using the @ to toggle some MCP servers when you don't need them.

Also, autocompact doesn't actually consume context by spending tokens. It reserves some tokens so it has room at the end of the session. This doesn't cost you so much as reduce your available window.

I have disabled it to give myself more headroom.

For efficiency, consider also using haiku sub-agents for some tasks (like running commands) to reduce the main context usage and use a more cost-effective model.
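One way to set up such a Haiku-backed sub-agent is a definition file under `.claude/agents/`. The sketch below follows that layout, but the agent name, description, and exact frontmatter fields are assumptions for illustration; check the official sub-agent docs for the authoritative schema.

```shell
# Sketch of a cheap command-runner sub-agent (frontmatter fields are assumptions)
mkdir -p .claude/agents
cat > .claude/agents/cmd-runner.md <<'EOF'
---
name: cmd-runner
description: Runs shell commands and summarizes their output cheaply.
tools: Bash, Read
model: haiku
---
Run the requested command, then report only the lines of output relevant
to the user's question, not the full dump.
EOF
```

The point of the pattern is that verbose command output is read and condensed by the cheaper model, and only the summary lands in the main session's context.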

jamestelfer avatar Dec 15 '25 02:12 jamestelfer

📌 Token consumption observations & mitigation strategies (Claude CLI)

Thanks for sharing these tips — they’ve been genuinely helpful. I wanted to add some additional observations and patterns from my own workflow that might help others dealing with high token usage.

1. Context snapshot pattern instead of full re-reads

In my case, I usually rely on a context snapshot pattern rather than asking Claude to re-read the entire codebase repeatedly. Instead of requesting full file reads, I ask Claude to:

  • Read status / summaries
  • Work from previously established context snapshots

This has been very effective in reducing unnecessary token usage.

Additionally, I maintain Development Standards documents and reference them explicitly via claude.md. This allows Claude to anchor decisions to stable documentation instead of re-deriving rules every time.
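A minimal sketch of generating such a snapshot file to point Claude at, instead of repeated full reads (file name and contents here are purely illustrative, not part of any Claude Code feature):

```shell
# Build a small "context snapshot" file that Claude can Read instead of
# re-scanning the tree; regenerate it whenever the project changes materially
snapshot="PROJECT_SNAPSHOT.md"
{
  echo "# Project snapshot ($(date -u +%Y-%m-%d))"
  echo "## Markdown docs to consult before reading source files"
  find . -maxdepth 2 -name '*.md' -not -path './.git/*' | sort
} > "$snapshot"
```

You can then instruct Claude to treat `PROJECT_SNAPSHOT.md` as the starting point and only open the files it lists on demand.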


2. Sub-agents are a major hidden token sink

I’ve now fully switched to not using sub-agents, and I can confirm a noticeable improvement — thanks again for the tip.

From what I’ve observed, sub-agents are extremely expensive because:

  • Each sub-agent appears to re-instantiate the full claude.md context
  • This happens in addition to the global session context

This quickly multiplies token usage without being obvious to the user.


3. Project-scoped sessions may cause large upfront token spikes

One thing I’m still investigating:

I organize my work with separate sessions per project folder, and I usually copy all relevant documentation into each project directory.

I suspect this might be causing a large upfront token cost when starting a new Claude session — sometimes I see ~45k tokens consumed almost immediately.

If anyone has insights on:

  • Sharing context across sessions more efficiently
  • Reducing initial context ingestion costs

I'd really appreciate hearing about it.

4. Manual model switching (Haiku / Sonnet / Opus)

I’ve also started actively switching models depending on the task:

  • Haiku → very simple questions
  • Sonnet → analysis, planning, documentation
  • Opus → execution only

This helps, but it’s still manual, and I occasionally forget to switch models.

👉 Question to the community: Is there (or could there be) a way to automatically route requests to a model based on intent (analysis vs execution vs simple Q&A)?
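Until something automatic exists, the manual routing above can at least be made less error-prone with small shell wrappers. The `--model` and `-p` (print mode) flags are real Claude Code CLI options, but the wrapper names and the exact model aliases accepted may vary, so treat this as a sketch:

```shell
# Illustrative per-intent wrappers; pick the cheapest model that fits the task
ask()  { claude --model haiku  -p "$1"; }   # very simple questions
plan() { claude --model sonnet -p "$1"; }   # analysis / planning / docs
work() { claude --model opus   -p "$1"; }   # execution only
```

Usage would then be e.g. `ask "what does this error mean?"` versus `work "implement the plan in PLAN.md"`.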


5. Possible Sonnet behavior change

Lastly, this is subjective, but I’ve felt that Sonnet may have been nerfed recently, which indirectly pushes heavier usage toward Opus.

Hopefully there’s an upcoming update or clarification around this, because predictable cost/performance behavior is critical for real workflows.


Thanks again for sharing these findings. Hopefully this thread helps others reduce token burn while we wait for improvements.

dannydanzka avatar Dec 15 '25 02:12 dannydanzka

@dannydanzka thank you!

grandtheftdisco avatar Dec 15 '25 03:12 grandtheftdisco

I'm not using any agents, I always check context, and I always run /clear before planning. Only a couple of MCPs are active, and I don't use them much, 3-4 times a week.

Yesterday I asked it to create a plan based on some conditions. It completed everything the same day, I asked it to save the plan to a file, it did, and I hadn't reached any limit (session or weekly). The next day I just said yes, save this plan in the suggested .md file, and in seconds it ate 7% of the session. That's abnormal.

sambua avatar Dec 15 '25 06:12 sambua

Seems it was fixed starting in 2.0.72; now it starts the session at 4% instead of 7%, so it's progress 👍

sambua avatar Dec 18 '25 07:12 sambua

Seems it was fixed starting in 2.0.72; now it starts the session at 4% instead of 7%, so it's progress 👍

Isn't that still too high? Even with thinking mode disabled the usage is high, but I'm getting much lower usage with auto-compact off. I think it's some bug in how auto-compact is currently set up to work.

Maharaj95 avatar Dec 18 '25 14:12 Maharaj95

When Claude started eating a lot of tokens, I adopted a hybrid approach, combining it with ChatGPT 5.2 (it's now good at coding too, just a little slow): ChatGPT 5.2 for planning and consultation, Claude for coding (Claude coding is expensive), both on the $20 monthly plan. Thanks, Claude, for pushing me to look for an alternative 😄

sambua avatar Dec 20 '25 03:12 sambua

Additionally, I maintain Development Standards documents and reference them explicitly via claude.md.

Note that the CLAUDE.md file is read into context all the time, so while it does help grounding, it is a constant cost in tokens in your context (visible in /context). There can be multiple CLAUDE.md files too, including your user's CLAUDE.md. Repetition across these files will waste tokens too.

Consider keeping a lean context and allowing for progressive enhancement via skills: these can be scoped to your repository or your local machine, and can be pulled in automatically by Claude, or, if that isn't working well enough, you can make Claude more aware of the skill using explicit callouts in CLAUDE.md.

(Note that the model you're using will significantly affect how well your instructions will be followed, so use direct and careful language in your CLAUDE prompts to get value from the tokens.)

Beyond their basic usage, skills can also employ progressive disclosure to reduce upfront token consumption.

For example, a skill can have a reference subfolder that includes additional instructions in separate markdown files. Your SKILL.md can then act as an index of sorts, giving instructions on when to follow a given reference. These are not @ mentions, which are pulled in straight away, but instead relative references in text, like `references/something-about-mary.md`.

These can be used by sub-agents as well.
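The layout described above can be sketched as follows. The skill name, topics, and file names are invented for illustration; the structural idea is that SKILL.md stays small and only names its reference files in plain text, so their contents are loaded only when actually needed:

```shell
# Sketch of a progressive-disclosure skill: a small index plus reference files
mkdir -p my-skill/references
cat > my-skill/SKILL.md <<'EOF'
---
name: my-skill
description: Example skill that defers detail to reference files.
---
For database migrations, follow the steps in references/migrations.md.
For API conventions, consult references/api-style.md.
EOF
echo '# Migration checklist (loaded only when migrations come up)' \
  > my-skill/references/migrations.md
echo '# API style notes (loaded only when API work comes up)' \
  > my-skill/references/api-style.md
```

Note the references are plain relative paths in prose, not @ mentions, so they cost nothing until Claude decides to open one.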

I have per-token billing with AWS Bedrock, so I will often use sub-agents for running commands that use the Haiku 4.5 model, as this significantly decreases the cost of reading and understanding command output. For this task too I've used a sub-agent paired with a script that redirects command output to a file for subsequent analysis with tools that read file segments.

Lastly, if you're interested in the base instructions used by Claude (that might contribute to your token usage alongside CLAUDE.md), consider looking at TweakCC as that may open the curtain a bit for you.

jamestelfer avatar Dec 20 '25 07:12 jamestelfer

@jamestelfer thank you! This is really helpful!

grandtheftdisco avatar Dec 20 '25 15:12 grandtheftdisco