[BUG] Prompt Caching Works on Claude 3.7 But Not On Claude 4 Family (AWS Bedrock)

Open IliaZenkov opened this issue 6 months ago • 12 comments

Environment

  • Platform (select one):
    • [ ] Anthropic API
    • [XX] AWS Bedrock
    • [ ] Google Vertex AI
    • [ ] Other:
  • Claude CLI version: 1.0.3
  • Operating System: macOS 24.4.0 (Darwin) on ARM64 architecture
  • Terminal: VS Code

Bug Description

Prompt caching works properly on Claude 3.7 Sonnet but does not work on newer Claude models (Claude Sonnet 4 and Claude Opus 4).

Steps to Reproduce

  1. Start a Claude Code session with default model (Claude 3.7 Sonnet)
  2. Note that cache writes are properly recorded
  3. Switch to Claude Sonnet 4 or Claude Opus 4 using model command
  4. Observe that cache read/write counts remain at 0

Expected Behavior

Claude 4 models should properly utilize prompt caching to reduce token usage and costs.

Actual Behavior

Only Claude 3.7 Sonnet shows cache activity (both reads and writes). Claude Sonnet 4 and Claude Opus 4 show 0 cache read/write regardless of

Additional Context

Image

Edit: Separately, the ANTHROPIC_SMALL_FAST_MODEL does not use prompt caching regardless of what model is set. This means that Claude 3.7 does not use prompt caching when ANTHROPIC_SMALL_FAST_MODEL is set to Claude 3.7. This is clearly a bug considering that Claude 3.7 prompt caching does work when it is set as the main model (ANTHROPIC_MODEL).

Verified with a clean install and completely clean configs.

Anthropic guys, I am begging you to fix this. It makes the Claude 4 family completely unusable for coding - the costs are astronomical compared to using Claude 3.7 with prompt caching - and worse - the latency is terrible once you have a bit of context, especially with Claude 4 Opus!

Thanks in advance - love the tool - life changing!

IliaZenkov avatar May 27 '25 02:05 IliaZenkov

I'm seeing the same with both Opus 4 and Sonnet 4, when selected with /model:

> Hi

⏺ Hi! How can I help you with the codebase today?

> Hi

⏺ Hello! What would you like to work on in the codebase?

> /cost
  ⎿  Total cost:            $1.03
     Total duration (API):  2m 51.0s
     Total duration (wall): 7m 1.8s
     Total code changes:    0 lines added, 0 lines removed
     Token usage by model:
         claude-3-5-haiku:  7.8k input, 385 output, 0 cache read, 0 cache write
            claude-sonnet:  338.1k input, 823 output, 0 cache read, 0 cache write

> /exit
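A quick way to check a session for this symptom is to parse the "Token usage by model" rows of the /cost output and list the models that show no cache activity. The following is a minimal sketch, assuming the row layout matches the transcript above exactly; the function names are made up for illustration and are not part of Claude Code.

```python
import re

# One "Token usage by model" row, in the layout shown in the /cost transcript.
# Assumption: counts are plain integers or use a "k" suffix (e.g. "7.8k").
ROW = re.compile(
    r"^\s*(?P<model>[\w.:-]+):\s+"
    r"(?P<input>[\d.]+k?) input,\s+"
    r"(?P<output>[\d.]+k?) output,\s+"
    r"(?P<cache_read>[\d.]+k?) cache read,\s+"
    r"(?P<cache_write>[\d.]+k?) cache write\s*$"
)
FIELDS = ("input", "output", "cache_read", "cache_write")


def to_tokens(text):
    """Convert a count such as '7.8k' or '385' to an integer token count."""
    if text.endswith("k"):
        return round(float(text[:-1]) * 1000)
    return int(float(text))


def parse_cost_rows(output):
    """Return {model: {field: tokens}} for every usage row found in the text."""
    usage = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            usage[match["model"]] = {f: to_tokens(match[f]) for f in FIELDS}
    return usage


def models_without_cache(usage):
    """List models whose cache read and cache write counts are both zero."""
    return [
        model
        for model, counts in usage.items()
        if counts["cache_read"] == 0 and counts["cache_write"] == 0
    ]


# Rows copied from the transcript above.
SAMPLE = """
     Token usage by model:
         claude-3-5-haiku:  7.8k input, 385 output, 0 cache read, 0 cache write
            claude-sonnet:  338.1k input, 823 output, 0 cache read, 0 cache write
"""

if __name__ == "__main__":
    print(models_without_cache(parse_cost_rows(SAMPLE)))
```

Note that the small fast model (Haiku here) is expected to show zero cache activity, so only the main model's row is a useful signal.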

Makes it very expensive to use - especially Opus 💸 💸 💸
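To put a rough number on "very expensive", here is a back-of-the-envelope sketch of what the 338.1k input tokens above would cost with and without cache reads. The prices are assumptions, not figures from this thread: $3 per million input tokens for Sonnet 4 and a cache-read price of 10% of that. Cache-write surcharges and output tokens are ignored, so treat the result as an illustration only and check current Bedrock pricing before relying on it.

```python
# Assumed prices (not from this thread): verify against current Bedrock pricing.
BASE_INPUT_USD_PER_MTOK = 3.0   # assumed Sonnet 4 input price per million tokens
CACHE_READ_MULTIPLIER = 0.1     # assumed: cache reads billed at 10% of base input


def input_cost_usd(input_tokens, cached_fraction,
                   base_per_mtok=BASE_INPUT_USD_PER_MTOK,
                   read_multiplier=CACHE_READ_MULTIPLIER):
    """Estimate input cost when a given fraction of tokens is read from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = uncached * base_per_mtok + cached * base_per_mtok * read_multiplier
    return total / 1_000_000


if __name__ == "__main__":
    tokens = 338_100  # claude-sonnet input tokens from the /cost output above
    for fraction in (0.0, 0.5, 0.9):
        cost = input_cost_usd(tokens, fraction)
        print(f"{fraction:.0%} served from cache: ${cost:.2f}")
```

Under these assumed prices the fully uncached case comes out at roughly $1.01, which is in line with the $1.03 total reported for the session above.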

I've switched back to Sonnet 3.7 for now. Hopefully this can get fixed soon! Appreciate all the work! 💪

srpouyet avatar May 27 '25 08:05 srpouyet

For anyone curious, this is NOT easy to fix in the installed code

/opt/homebrew/lib/node_modules/@anthropic-ai/claude-code/cli.js

I tried pretty hard. If anyone else wants to give it a shot, that's the file to do it in. I can see references to caching (e.g. "ephemeral") but I could NOT find why 3.7 is cached and 4 is not.

Good luck to anyone attempting this, let us know if you got it working!

IliaZenkov avatar May 27 '25 18:05 IliaZenkov

So Bedrock logs are actually showing that the cache is being set on some of the opus requests. But the logs also show that the output has 0 cache token hits. So I'm not even certain it's Claude Code messing it up anymore.

Edit: Below confirms that caching works with Cline and other tools - seems like a Claude Code issue after all

IliaZenkov avatar May 28 '25 06:05 IliaZenkov

It seems that aws bedrock does still not support prompt caching for claude 4. Still not listed in their documentation https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html

LeanVel avatar May 28 '25 10:05 LeanVel

For sonnet v4 caching doesn't work. 3.7 works fine.

JakeDahl avatar May 28 '25 11:05 JakeDahl

well it seems it is working again -> {'us.anthropic.claude-sonnet-4-20250514-v1:0': {'input_token_details': {'cache_creation': 6704, 'cache_read': 6704}, 'input_tokens': 4458, 'total_tokens': 21525, 'output_tokens': 3659}}
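For anyone comparing numbers, a usage dictionary like the one pasted above can be summarised in a few lines. This is a minimal sketch, assuming the key names shown in that output and assuming that input_tokens excludes cached tokens. The second assumption holds for this sample (4458 + 6704 + 6704 + 3659 = 21525, the reported total_tokens), but it may not hold for other libraries or versions. The helper name is hypothetical.

```python
def cache_summary(usage):
    """Summarise cache activity for one model's usage entry.

    Assumes the key layout from the pasted sample, where 'input_tokens'
    counts only the uncached part of the prompt.
    """
    details = usage.get("input_token_details", {})
    cache_read = details.get("cache_read", 0)
    cache_write = details.get("cache_creation", 0)
    uncached = usage.get("input_tokens", 0)
    prompt_tokens = uncached + cache_read + cache_write
    return {
        "prompt_tokens": prompt_tokens,
        "cache_read": cache_read,
        "cache_write": cache_write,
        # Share of the prompt that was served from the cache.
        "read_share": cache_read / prompt_tokens if prompt_tokens else 0.0,
        "caching_active": cache_read > 0 or cache_write > 0,
    }


# Values copied from the comment above.
SAMPLE = {
    "us.anthropic.claude-sonnet-4-20250514-v1:0": {
        "input_token_details": {"cache_creation": 6704, "cache_read": 6704},
        "input_tokens": 4458,
        "total_tokens": 21525,
        "output_tokens": 3659,
    }
}

if __name__ == "__main__":
    for model_id, usage in SAMPLE.items():
        summary = cache_summary(usage)
        print(model_id, f"read share: {summary['read_share']:.1%}", summary)
```

A non-zero cache_read is the signal the rest of this thread is looking for; a session that only ever shows cache_creation is writing to the cache but never hitting it.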

LeanVel avatar May 28 '25 13:05 LeanVel

It still does not work for me sadly. Claude Sonnet 4 in Bedrock.

Other folks at my company using Cline with the same Claude Sonnet 4 model in Bedrock do have the cache working for them, so this problem isn't just in AWS. @LeanVel I assume AWS has just not yet updated their docs to say that caching is supported with Claude 4.

nadzinski avatar May 28 '25 17:05 nadzinski

@nadzinski Same here

@LeanVel I just tried with a clean install and I'm not getting any cache read tokens in Claude 4. Where is that output you copied coming from? Can you s

  1. Show that it works using a Claude Code request and then use the /cost command and paste the text or the screenshot?
  2. Show output of cat ~/.claude/settings.json
  3. Show output of claude config get --global env

It'll probably be another few days before AWS or Anthropic responds so I would be SO happy if you helped us all out here! ❤

IliaZenkov avatar May 28 '25 20:05 IliaZenkov

Same experience here with sonnet 4 on bedrock. Thanks for reporting.

bigcodegen avatar May 28 '25 20:05 bigcodegen

Same here. I have created an AWS Support case at the request of our TAM. Will post here if I get any info not already covered above.

drobbins-ancile avatar May 28 '25 21:05 drobbins-ancile

> Same here. I have created an AWS Support case at the request of our TAM. Will post here if I get any info not already covered above.

@drobbins-ancile Thanks, doing good work there. Based on @nadzinski saying caching works w/ Cline (and lack of complaints from other coding tools) - looks like this is indeed a Claude Code issue.

IliaZenkov avatar May 28 '25 21:05 IliaZenkov

> It seems that aws bedrock does still not support prompt caching for claude 4.

https://www.ai.moda/amazon-bedrock-models-worker/foundation-models

Looking at this (it pulls data directly from an AWS API, as the AWS docs are often outdated), Bedrock is claiming prompt caching support for anthropic.claude-sonnet-4-20250514-v1:0 and anthropic.claude-opus-4-20250514-v1:0.

Manouchehri avatar May 29 '25 13:05 Manouchehri

@IliaZenkov how are you setting up your env or ~/.claude/settings.json? I can't even get Sonnet 4 to run without a 429 "too many tokens" error from Bedrock.

marklaczynski avatar May 29 '25 15:05 marklaczynski

@marklaczynski same as you if you're seeing 429's. that's exactly what I'm seeing. both for sonnet 4 and opus 4. and I don't think I'm hitting my limits either. You can check the AWS logs in cloudwatch, they have a dashboard by default for bedrock, input/output tokens by model to confirm.

in fact my ~/.claude/settings.json is completely empty. I just set env vars

export CLAUDE_CODE_USE_BEDROCK=1
export DISABLE_PROMPT_CACHING=0
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'

IliaZenkov avatar May 29 '25 18:05 IliaZenkov

Getting some cache read and hits on Opus 4 now. Still getting throttled but... looks like AWS is cooking ;) Sorry to the Claude Code maintainers! We just REALLY love this tool!

IliaZenkov avatar May 29 '25 18:05 IliaZenkov

Similar to @IliaZenkov, I'm now seeing cache use with Claude Sonnet 4. 🎉 (Still getting frequent 429s like other folks but the retries usually succeed. Hopefully AWS will add more capacity soon and these will go away...)

It's a mystery to me why caching was working with other tools before but not Claude. Perhaps something about Claude's pattern of api calls exposed a bug which AWS has now fixed? Regardless, this does seem to have been an AWS issue all along.

Thanks again to the Claude maintainers for making such an awesome AI agent! It's great to be able to use it with these new models. ❤

nadzinski avatar May 29 '25 19:05 nadzinski

Marking this one as closed, as this doesn't look like an issue with Claude Code. Thanks all for the discussion!

@IliaZenkov re ANTHROPIC_SMALL_FAST_MODEL, this is expected. This model is typically used in scenarios where we do not expect the prefix to repeat in future queries, or the repeated portion is very short. If you're seeing excessive non-cached usage, feel free to open another issue.

ant-kurt avatar May 29 '25 20:05 ant-kurt

AWS just updated their docs, so maybe it is stable now -> https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html

For me it worked sometimes and not others. I'm using langchain-aws to interface with Bedrock to use Claude 4.

LeanVel avatar May 30 '25 12:05 LeanVel

This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.

github-actions[bot] avatar Aug 09 '25 14:08 github-actions[bot]