[BUG] : Cannot use AWS Bedrock with Claude Code. Getting API Error (429 Too many tokens)
Environment

Platform (select one):
- [ ] Anthropic API
- [x] AWS Bedrock
- [ ] Google Vertex AI
- [ ] Other

Claude CLI version: 1.0.3 (Claude Code)
Operating System: macOS 15.5
Terminal: iTerm, Warp

Bug Description

When using the Claude CLI with the Sonnet 4 Bedrock model, it receives an API Error ("429 Too many tokens, please wait before trying again.") and never gets a response.
Steps to Reproduce

Run:

```shell
DISABLE_TELEMETRY=1 \
ANTHROPIC_SMALL_FAST_MODEL=us.anthropic.claude-3-5-haiku-20241022-v1:0 \
AWS_REGION=us-east-1 \
CLAUDE_CODE_USE_BEDROCK=1 \
DISABLE_PROMPT_CACHING=1 \
claude --model us.anthropic.claude-sonnet-4-20250514-v1:0
```
I was trialing Claude Code to decide whether to pursue a Claude Max subscription. I wanted to run the trial against AWS Bedrock, but the experience has been bad enough that I am considering turning away from Claude Code altogether. Can someone please resolve this 429 "Too many tokens" error? The 429 comes back every time.
Expected Behavior

A response from the Sonnet 4 model.

Actual Behavior

A "429 Too many tokens" API error.
Additional Context

If I run the exact same command with Sonnet 3.7, there are no problems. I have access to Sonnet 4 in AWS in the regions I use in the CLI command, and I have used that model with other tools without a problem.
I've been noticing this same problem too. It's frustrating that the Claude 4.0 models don't seem to work with Bedrock. I made sure my user has Claude 4.0 model access in all AWS regions covered by the cross-region inference profile. Setting my ~/.claude/settings.json to
```json
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-opus-4-20250514-v1:0",
    "DISABLE_PROMPT_CACHING": "false"
  }
}
```
I get the API Error: 429 Too many tokens, please wait before trying again even for simple "hello" tests.
Hi everyone!
I am having the same issue using both the CLI and the console, even for a simple "hello" prompt, and as previous users mentioned it shows me the "too many tokens" error. For example:
```shell
aws bedrock-runtime invoke-model \
  --model-id arn:aws:bedrock:us-east-1:xxxxxx:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --body fileb://input.json \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  out.txt
```

which fails with:

```
An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 2): Too many tokens, please wait before trying again.
```
I also tried creating an inference profile on AWS for Claude 4, because it does not work on demand. It shows me:

```
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-sonnet-4-20250514-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.
```
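For comparison, here is a boto3 sketch of invoking through the cross-region inference profile ID (the "us." prefix) with the Anthropic messages-format body that Bedrock's InvokeModel expects for Claude models; the region and token limit are assumptions:

```python
import json

# Assumed cross-region inference profile ID; Claude 4 base model IDs are
# rejected for on-demand throughput, so the profile ID (or an ARN) is required.
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"

# InvokeModel bodies for Anthropic models use the messages format,
# not a plain {"input": ...} payload.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "hello"}],
}

def invoke(region="us-east-1"):
    # boto3 imported lazily so the payload above can be inspected without AWS deps
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    return json.loads(resp["body"].read())
```

This still hits the same 429 if the underlying quota is the problem, but it rules out malformed requests as the cause.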
So now I have the following code:
```python
import argparse
import json

import botocore.auth
import botocore.session
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# --- Configuration ---
REGION = 'us-east-1'
inference_profile_id = 'us.anthropic.claude-sonnet-4-20250514-v1:0'
INFERENCE_PROFILE_ARN = f'arn:aws:bedrock:{REGION}:xxxxxx:application-inference-profile/{inference_profile_id}'
STYLE_PROMPT_PATH = "CONF_TEXT_ICONIP2025.txt"
ENDPOINT = f"https://bedrock-runtime.{REGION}.amazonaws.com/application-inference-profile/{inference_profile_id}/invoke"


# --- Build payload for Claude prompt-style API ---
def load_style_prompt(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()


def build_payload(style_prompt, user_message):
    return {
        "input": f"{style_prompt}\n\n{user_message}"
    }


# --- Sign and send request ---
def invoke_claude(payload):
    session = botocore.session.get_session()
    credentials = session.get_credentials().get_frozen_credentials()
    aws_request = AWSRequest(
        method="POST",
        url=ENDPOINT,
        data=json.dumps(payload),
        headers={
            "content-type": "application/json",
            "accept": "application/json"
        }
    )
    sigv4 = botocore.auth.SigV4Auth(credentials, "bedrock", REGION)
    sigv4.add_auth(aws_request)
    prepared = aws_request.prepare()

    print("\nSigned Headers Sent to AWS:")
    for k, v in dict(prepared.headers).items():
        print(f"{k}: {v}")

    http_session = URLLib3Session()
    response = http_session.send(prepared)

    print("\nPayload Sent to AWS:")
    print(prepared.body)
    return response


# --- CLI Entrypoint ---
def main():
    parser = argparse.ArgumentParser(description="Call Claude with a styled message.")
    parser.add_argument("--message", required=True, help="The user message to send.")
    args = parser.parse_args()

    style_prompt = load_style_prompt(STYLE_PROMPT_PATH)
    payload = build_payload(style_prompt, args.message)
    response = invoke_claude(payload)

    print(f"\nStatus Code: {response.status_code}")
    try:
        parsed = json.loads(response.text)
        print(json.dumps(parsed, indent=2))
    except Exception:
        print("Could not parse response:")
        print(response.text)


if __name__ == "__main__":
    main()
```
and the output is:

```
Status Code: 200
{
  "Output": {
    "__type": "com.amazon.coral.service#UnknownOperationException"
  },
  "Version": "1.0"
}
```
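That 200 with `UnknownOperationException` usually means the signature was accepted but the URL path did not match any Bedrock runtime operation: the runtime's invoke path is `/model/{model-id}/invoke`, not `/application-inference-profile/.../invoke`, and an ARN placed in the path must be URL-encoded. A sketch of the corrected endpoint (account ID elided as in the original script):

```python
from urllib.parse import quote

REGION = "us-east-1"
# Application inference profile ARN from the script above (account ID elided).
PROFILE_ARN = (
    f"arn:aws:bedrock:{REGION}:xxxxxx:application-inference-profile/"
    "us.anthropic.claude-sonnet-4-20250514-v1:0"
)

# The InvokeModel operation lives at /model/{model-id}/invoke; percent-encode
# the ARN (safe="") so its colons and slashes survive as path characters.
ENDPOINT = (
    f"https://bedrock-runtime.{REGION}.amazonaws.com"
    f"/model/{quote(PROFILE_ARN, safe='')}/invoke"
)
```

The `{"input": ...}` payload would likely be rejected next; Anthropic models on Bedrock expect the messages-format body. Using boto3's `bedrock-runtime` client instead of hand-rolled SigV4 sidesteps both issues.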
I would be very grateful if someone has resolved this and has any idea on what to do next :)
Me too; only Claude 3.5 is working, and when I view Claude 4 in the Service Quotas it shows this:
A workaround is setting "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024:
https://github.com/anthropics/claude-code/issues/1293#issuecomment-2938480588
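In settings.json form, that workaround would look something like this (the model ID is one from the comments above, and whether the 1024 cap actually helps seems to vary):

```json
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "1024"
  }
}
```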
> A workaround is setting
> "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024

Did it help you? Unfortunately, it didn’t work for me…
I am having issues too. I switched from Opus 4 to Sonnet 4 and things started working again.
Having issues with sonnet 4 over here
The same issue happens when I make a get_image tool call through Claude Code subagents and run into 429 errors. Interestingly, it does not happen when making MCP tool calls through the main thread.
I'm using Bedrock with Opus 4.1 and CLAUDE_CODE_MAX_OUTPUT_TOKENS=32000. There was no problem recently, but it's happening again when the context gets bigger. There is no problem with Sonnet 4.
This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.
I think I'm seeing it too. Should keep this ticket open.