claude-code
[BUG] Prompt Caching Works on Claude 3.7 But Not On Claude 4 Family (AWS Bedrock)
Environment
- Platform (select one):
- [ ] Anthropic API
- [XX] AWS Bedrock
- [ ] Google Vertex AI
- [ ] Other:
- Claude CLI version: 1.0.3
- Operating System: macOS 24.4.0 (Darwin) on ARM64 architecture
- Terminal: VS Code
Bug Description
Prompt caching works properly on Claude 3.7 Sonnet but does not work on newer Claude models (Claude Sonnet 4 and Claude Opus 4).
Steps to Reproduce
- Start a Claude Code session with default model (Claude 3.7 Sonnet)
- Note that cache writes are properly recorded
- Switch to Claude Sonnet 4 or Claude Opus 4 using the /model command
- Observe that cache read/write counts remain at 0
Expected Behavior
Claude 4 models should properly utilize prompt caching to reduce token usage and costs.
Actual Behavior
Only Claude 3.7 Sonnet shows cache activity (both reads and writes). Claude Sonnet 4 and Claude Opus 4 show 0 cache read/write regardless of
Additional Context
Edit: Separately, the ANTHROPIC_SMALL_FAST_MODEL does not use prompt caching regardless of what model is set. This means that Claude 3.7 does not use prompt caching when ANTHROPIC_SMALL_FAST_MODEL is set to Claude 3.7. This is clearly a bug considering that Claude 3.7 prompt caching does work when it is set as the main model (ANTHROPIC_MODEL).
Verified with a clean install and completely clean configs.
Anthropic guys, I am begging you to fix this. It makes the Claude 4 family completely unusable for coding - the costs are astronomical compared to using Claude 3.7 with prompt caching - and worse - the latency is terrible once you have a bit of context, especially with Claude 4 Opus!
Thanks in advance - love the tool - life changing!
I'm seeing the same with both Opus 4 and Sonnet 4, when selected with /model:
> Hi
⏺ Hi! How can I help you with the codebase today?
> Hi
⏺ Hello! What would you like to work on in the codebase?
> /cost
⎿ Total cost: $1.03
Total duration (API): 2m 51.0s
Total duration (wall): 7m 1.8s
Total code changes: 0 lines added, 0 lines removed
Token usage by model:
claude-3-5-haiku: 7.8k input, 385 output, 0 cache read, 0 cache write
claude-sonnet: 338.1k input, 823 output, 0 cache read, 0 cache write
> /exit
Makes it very expensive to use - especially Opus 💸 💸 💸
I've switched back to Sonnet 3.7 for now. Hopefully this can get fixed soon! Appreciate all the work! 💪
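For anyone comparing sessions, here is a minimal sketch that reads the "Token usage by model" lines from pasted /cost text and lists the models that consumed input tokens with no cache reads or writes. The line format is assumed from the output pasted above and may differ in other Claude Code versions; the helper names (`parse_usage`, `models_without_cache`) are made up for this example and are not part of Claude Code.

```python
import re

# Assumed line shape, taken from the /cost output pasted in this thread:
#   claude-sonnet: 338.1k input, 823 output, 0 cache read, 0 cache write
USAGE_LINE = re.compile(
    r"^\s*(?P<model>[\w.\-]+):\s*"
    r"(?P<input>[\d.]+k?) input,\s*"
    r"(?P<output>[\d.]+k?) output,\s*"
    r"(?P<cache_read>[\d.]+k?) cache read,\s*"
    r"(?P<cache_write>[\d.]+k?) cache write\s*$"
)

FIELDS = ("input", "output", "cache_read", "cache_write")


def to_count(text):
    """Convert '338.1k' or '823' to an approximate integer token count."""
    if text.endswith("k"):
        return round(float(text[:-1]) * 1000)
    return round(float(text))


def parse_usage(cost_output):
    """Return {model: {field: count}} for every usage line that matches."""
    usage = {}
    for line in cost_output.splitlines():
        match = USAGE_LINE.match(line)
        if match:
            usage[match.group("model")] = {
                field: to_count(match.group(field)) for field in FIELDS
            }
    return usage


def models_without_cache(usage):
    """List models that used input tokens but show no cache reads or writes."""
    return [
        model
        for model, counts in usage.items()
        if counts["input"] > 0
        and counts["cache_read"] == 0
        and counts["cache_write"] == 0
    ]


SAMPLE = """Token usage by model:
claude-3-5-haiku: 7.8k input, 385 output, 0 cache read, 0 cache write
claude-sonnet: 338.1k input, 823 output, 0 cache read, 0 cache write
"""

if __name__ == "__main__":
    print(models_without_cache(parse_usage(SAMPLE)))
```

Running it on text from a 3.7 session and a 4 session side by side makes the difference easy to paste into a comment.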
For anyone curious, this is NOT easy to fix in the installed code
/opt/homebrew/lib/node_modules/@anthropic-ai/claude-code/cli.js
I tried pretty hard. If anyone else wants to give it a shot, that's the file to do it in. I can see references to caching (e.g. "ephemeral") but I could NOT find why 3.7 is cached and 4 is not.
Good luck to anyone attempting this, let us know if you got it working!
So Bedrock logs are actually showing that the cache is being set on some of the opus requests. But the logs also show that the output has 0 cache token hits. So I'm not even certain it's Claude Code messing it up anymore.
Edit: Below confirms that caching works with Cline and other tools - seems like a Claude Code issue after all
It seems that AWS Bedrock still does not support prompt caching for Claude 4. It is still not listed in their documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
For sonnet v4 caching doesn't work. 3.7 works fine.
well it seems it is working again -> {'us.anthropic.claude-sonnet-4-20250514-v1:0': {'input_token_details': {'cache_creation': 6704, 'cache_read': 6704}, 'input_tokens': 4458, 'total_tokens': 21525, 'output_tokens': 3659}}
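A rough sketch of how to read a usage dict like the one above: it reports how many prompt tokens were written to and read from the cache. It assumes `input_tokens` counts only the uncached part of the prompt. That is an inference from this one sample, where input_tokens + cache_creation + cache_read + output_tokens equals total_tokens, and it may not hold for other clients or versions. `summarize_cache` is a hypothetical helper, not a library function.

```python
def summarize_cache(details):
    """Summarize cache activity from one per-model usage dict.

    Assumption: 'input_tokens' excludes cached tokens, so the full prompt
    size is input_tokens + cache_creation + cache_read.
    """
    cache = details.get("input_token_details", {})
    created = cache.get("cache_creation", 0)
    read = cache.get("cache_read", 0)
    uncached = details.get("input_tokens", 0)
    prompt_total = uncached + created + read
    return {
        "prompt_total": prompt_total,
        "cache_creation": created,
        "cache_read": read,
        # Share of all prompt tokens that were served from the cache.
        "cache_read_share": read / prompt_total if prompt_total else 0.0,
        "cache_active": created > 0 or read > 0,
    }


# The values pasted in the comment above.
usage = {
    "us.anthropic.claude-sonnet-4-20250514-v1:0": {
        "input_token_details": {"cache_creation": 6704, "cache_read": 6704},
        "input_tokens": 4458,
        "total_tokens": 21525,
        "output_tokens": 3659,
    }
}

if __name__ == "__main__":
    for model, details in usage.items():
        print(model, summarize_cache(details))
```

A non-zero `cache_read` is the number to look for: it shows a later request actually hit the cache, not just that one was written.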
It still does not work for me sadly. Claude Sonnet 4 in Bedrock.
Other folks at my company using Cline with the same Claude Sonnet 4 model in Bedrock do have the cache working for them, so this problem isn't just in AWS. @LeanVel I assume AWS has just not yet updated their docs to say that caching is supported with Claude 4.
@nadzinski Same here
@LeanVel I just tried with a clean install and I'm not getting any cache read tokens in Claude 4. Where is that output you copied coming from? Can you s
- Show that it works using a Claude Code request and then use the /cost command and paste the text or the screenshot?
- Show output of cat ~/.claude/settings.json
- Show output of claude config get --global env
It'll probably be another few days before AWS or Anthropic responds so I would be SO happy if you helped us all out here! ❤
Same experience here with sonnet 4 on bedrock. Thanks for reporting.
Same here. I have created an AWS Support case at the request of our TAM. Will post here if I get any info not already covered above.
@drobbins-ancile Thanks, doing good work there. Based on @nadzinski saying caching works w/ Cline (and lack of complaints from other coding tools) - looks like this is indeed a Claude Code issue.
It seems that aws bedrock does still not support prompt caching for claude 4.
https://www.ai.moda/amazon-bedrock-models-worker/foundation-models
Looking at this (it's pulling data directly from an AWS API, as AWS docs are often outdated), Bedrock is claiming for anthropic.claude-sonnet-4-20250514-v1:0 and anthropic.claude-opus-4-20250514-v1:0.
@IliaZenkov how are you setting up your env or ~/.claude/settings.json? I can't even get Sonnet 4 to run without a 429 too many tokens error from Bedrock.
@marklaczynski same as you if you're seeing 429's. that's exactly what I'm seeing. both for sonnet 4 and opus 4. and I don't think I'm hitting my limits either. You can check the AWS logs in cloudwatch, they have a dashboard by default for bedrock, input/output tokens by model to confirm.
in fact my ~/.claude/settings.json is completely empty. I just set env vars
export CLAUDE_CODE_USE_BEDROCK=1
export DISABLE_PROMPT_CACHING=0
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'
Getting some cache read and hits on Opus 4 now. Still getting throttled but... looks like AWS is cooking ;) Sorry to the Claude Code maintainers! We just REALLY love this tool!
Similar to @IliaZenkov, I'm now seeing cache use with Claude Sonnet 4. 🎉 (Still getting frequent 429s like other folks but the retries usually succeed. Hopefully AWS will add more capacity soon and these will go away...)
It's a mystery to me why caching was working with other tools before but not Claude. Perhaps something about Claude's pattern of api calls exposed a bug which AWS has now fixed? Regardless, this does seem to have been an AWS issue all along.
Thanks again to the Claude maintainers for making such an awesome AI agent! It's great to be able to use it with these new models. ❤
Marking this one as closed, as this doesn't look like an issue with Claude Code. Thanks all for the discussion!
@IliaZenkov re ANTHROPIC_SMALL_FAST_MODEL, this is expected. This model is typically used in scenarios where we do not expect the prefix to repeat in future queries, or the repeated portion is very short. If you're seeing excessive non-cached usage, feel free to open another issue.
AWS just updated their docs, so maybe it is stable now -> https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
For me it worked sometimes and not others. I'm using langchain-aws to interface with Bedrock to use Claude 4.
This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.