claude-code icon indicating copy to clipboard operation
claude-code copied to clipboard

[BUG] : Cannot use AWS Bedrock with Claude Code. Getting API Error (429 Too many tokens)

Open neo-picasso-2112 opened this issue 6 months ago • 11 comments

Environment Platform (select one):

Anthropic API

AWS Bedrock

Google Vertex AI

Other: Claude CLI version: 1.0.3 (Claude Code) Operating System: MacOS 15.5 Terminal: iTerm, Warp Bug Description When using claude cli with sonnet 4 bedrock model, it received API Error (429 Too many tokens, please wait before trying again.) message and never gets a response

Steps to Reproduce Run DISABLE_TELEMETRY=1 ANTHROPIC_SMALL_FAST_MODEL=us.anthropic.claude-3-5-haiku-20241022-v1:0 AWS_REGION=us-east-1 CLAUDE_CODE_USE_BEDROCK=1 DISABLE_PROMPT_CACHING=1 claude --model us.anthropic.claude-sonnet-4-20250514-v1:0

I was trialing Claude Code to see if I should pursue Claude Max subscription. I wanted to do this trial with AWS Bedrock but the experience has been so bad that I am looking to turn away from Claude Code altogether. Can someone please resolve this 429 too many tokens error pls? 429 Comes back

Expected Behavior Expect a response from the sonnet 4 model

Actual Behavior 429 too make tokens API error

Additional Context If I run the exact same command but use sonnet 3-7, no problems. I have access to the sonnet 4 in AWS in the regions I use in the CLI command. I have used that model with other tools without a problem.

neo-picasso-2112 avatar May 31 '25 18:05 neo-picasso-2112

I've been noticing this same problem too. It's so annoying that the Claude 4.0 models don't seems to be working with Bedrock. I ensured my user has claude 4.0 model access to all AWS regions in the cross-region reference. Setting my ~/.claude/settings.json to

{ "env": { "CLAUDE_CODE_USE_BEDROCK": "true", "ANTHROPIC_MODEL": "us.anthropic.claude-opus-4-20250514-v1:0", "DISABLE_PROMPT_CACHING": "false" } }

I get the API Error: 429 Too many tokens, please wait before trying again even for simple hello tests.

Clabes avatar Jun 02 '25 17:06 Clabes

Hi everyone!

I am having the same issue when I am using cli and also the console even for a simple hello in a prompt and as the previous users mentioned it shows me the: too many tokens error

aws bedrock-runtime invoke-model \
  --model-id arn:aws:bedrock:us-east-1:xxxxxx:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --body fileb://input.json \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  out.txt

An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 2): Too many tokens, please wait before trying again.

I also tried creating an inference profile on aws for using claude 4 because it does not work on demand. (shows me:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-sonnet-4-20250514-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.

)

So now I have the following code:

import argparse
import json
import botocore.auth
import botocore.session
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# --- Configuration ---
REGION = 'us-east-1'
inference_profile_id = 'us.anthropic.claude-sonnet-4-20250514-v1:0'
INFERENCE_PROFILE_ARN = f'arn:aws:bedrock:{REGION}:xxxxxx:application-inference-profile/{inference_profile_id}'
STYLE_PROMPT_PATH = "CONF_TEXT_ICONIP2025.txt"
ENDPOINT = f"https://bedrock-runtime.{REGION}.amazonaws.com/application-inference-profile/{inference_profile_id}/invoke"

# --- Build payload for Claude prompt-style API ---
def load_style_prompt(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()

def build_payload(style_prompt, user_message):
    return {
        "input": f"{style_prompt}\n\n{user_message}"
    }


# --- Sign and send request ---
def invoke_claude(payload):
    session = botocore.session.get_session()
    credentials = session.get_credentials().get_frozen_credentials()

    aws_request = AWSRequest(
        method="POST",
        url=ENDPOINT,
        data=json.dumps(payload),
        headers={
            "content-type": "application/json",
            "accept": "application/json"
        }
    )

    sigv4 = botocore.auth.SigV4Auth(credentials, "bedrock", REGION)
    sigv4.add_auth(aws_request)

    prepared = aws_request.prepare()
    print("\nSigned Headers Sent to AWS:")
    for k, v in dict(prepared.headers).items():
        print(f"{k}: {v}")

    http_session = URLLib3Session()
    response = http_session.send(prepared)
    print("\nPayload Sent to AWS:")
    print(prepared.body)

    return response

# --- CLI Entrypoint ---
def main():
    parser = argparse.ArgumentParser(description="Call Claude with a styled message.")
    parser.add_argument("--message", required=True, help="The user message to send.")
    args = parser.parse_args()

    style_prompt = load_style_prompt(STYLE_PROMPT_PATH)
    payload = build_payload(style_prompt, args.message)
    response = invoke_claude(payload)

    print(f"\nStatus Code: {response.status_code}")
    try:
        parsed = json.loads(response.text)
        print(json.dumps(parsed, indent=2))
    except Exception:
        print("Could not parse response:")
        print(response.text)

if __name__ == "__main__":
    main()

and the output is:

Status Code: 200
{
  "Output": {
    "__type": "com.amazon.coral.service#UnknownOperationException"
  },
  "Version": "1.0"
}

I would be very grateful if someone has resolved this and has any idea on what to do next :)

joannakarayianni avatar Jun 03 '25 10:06 joannakarayianni

me too only Claude 3.5 is working and if I view Claude 4 in the Service qoutas its this:

Image

Image

itsjustmeemman avatar Jun 04 '25 04:06 itsjustmeemman

A workaround is setting "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024: https://github.com/anthropics/claude-code/issues/1293#issuecomment-2938480588

xerial avatar Jun 05 '25 04:06 xerial

A workaround is setting "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024

Did it help you? Unfortunately, it didn’t work for me…

Smotrov avatar Jun 12 '25 15:06 Smotrov

I am having issues too. I switched to from opus 4 to sonnet 4 and things started working again

PhillipNinan avatar Jun 17 '25 20:06 PhillipNinan

I am having issues too. I switched to from opus 4 to sonnet 4 and things started working again

Having issues with sonnet 4 over here

moro-no-kimi avatar Jul 23 '25 02:07 moro-no-kimi

Same issue happens when I make a get_image tool call through claude code subagents and run into 429 errors. Interestingly, the same issue does not happen when making mcp tool calls through the main thread.

ldave23 avatar Aug 26 '25 18:08 ldave23

I'm using Bedrock, Opus 4.1 with CLAUDE_CODE_MAX_OUTPUT_TOKENS=32000, and there was no problem recently, but it's happening again when the context gets bigger. There is no problem with Sonnet4.

angrycoder avatar Aug 27 '25 00:08 angrycoder

This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.

github-actions[bot] avatar Dec 04 '25 10:12 github-actions[bot]

I think I'm seeing it too. Should keep this ticket open.

bxm156 avatar Dec 11 '25 19:12 bxm156