
Bedrock Claude 3.5 V1: inferenceConfig maxTokens / max_tokens of 8192 is ignored, output is always capped at 4096

Open ory-name-tbd opened this issue 8 months ago • 1 comments

Describe the bug

Bedrock Claude 3.5 V1 calls ignore max_tokens: output is always capped at 4096 tokens, even when 8192 is requested. Model invocation log from an affected Converse call:

{
    "schemaType": "ModelInvocationLog",
    "schemaVersion": "1.0",
    "timestamp": "2025-05-06T08:44:53Z",
    "accountId": "redacted",
    "identity": {
        "arn": "redacted"
    },
    "region": "eu-central-1",
    "requestId": "1d081c6a-9b89-422b-99de-c0056b8008bc",
    "operation": "Converse",
    "modelId": "arn:aws:bedrock:eu-central-1:redacted:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "input": {
        "inputContentType": "application/json",
        "inputBodyJson": {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "text": "redacted"
                        }
                    ]
                }
            ],
            "system": [
                {
                    "text": "redacted"
                }
            ],
            "inferenceConfig": {
                "maxTokens": 8192, // <---- THIS
                "temperature": 0.7
            },
            "toolConfig": {
                "tools": ["redacted"],
                "toolChoice": {
                    "tool": {
                        "name": "redacted"
                    }
                }
            },
            "additionalModelRequestFields": {}
        },
        "inputTokenCount": 2085
    },
    "output": {
        "outputContentType": "application/json",
        "outputBodyJson": {
            "output": {
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "toolUse": {
                                "toolUseId": "tooluse_KSDC5-7STHupgDNlmsJdIQ",
                                "name": "redacted",
                                "input": {
                                    "status": "success",
                                    "command": "update",
                                    "collection": "redacted",
                                    "item_id": "redacted"
                                }
                            }
                        }
                    ]
                }
            },
            "stopReason": "max_tokens",
            "metrics": {
                "latencyMs": 87074
            },
            "usage": {
                "inputTokens": 2085,
                "outputTokens": 4096, // <---- THIS
                "totalTokens": 6181
            }
        },
        "outputTokenCount": 4096  // <---- THIS
    },
    "inferenceRegion": "eu-central-1"
}

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected Behavior

maxTokens should be honored: with maxTokens set to 8192, a response should not be cut off at 4096 output tokens with stopReason max_tokens.

Current Behavior

Responses longer than roughly 4k tokens are cut off at exactly 4096 output tokens with stopReason max_tokens, even though maxTokens is set to 8192.

Reproduction Steps

  • Call the Claude 3.5 v1 model (Converse operation) with a prompt whose response exceeds 4096 tokens
  • Set maxTokens (max_tokens) to 8192 in inferenceConfig
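
A minimal sketch of these steps with boto3, for anyone who wants to check the behavior themselves. The model ID, region, and temperature are taken from the log above; the helper names (build_request, hit_cap, reproduce) and the prompt are hypothetical, and the toolConfig from the original request is left out for brevity.

```python
# Inference profile ID and region as they appear in the invocation log above.
MODEL_ID = "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"


def build_request(prompt, max_tokens=8192):
    """Build keyword arguments for the bedrock-runtime Converse call."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.7},
    }


def hit_cap(response, requested):
    """True if the response stopped on max_tokens below the requested limit."""
    produced = response["usage"]["outputTokens"]
    return response["stopReason"] == "max_tokens" and produced < requested


def reproduce(prompt, region="eu-central-1"):
    """Send the request; needs AWS credentials and access to the model."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("bedrock-runtime", region_name=region)
    request = build_request(prompt)
    response = client.converse(**request)
    return hit_cap(response, request["inferenceConfig"]["maxTokens"])
```

If reproduce(...) returns True for a prompt that asks for a very long answer, the call was stopped before reaching the requested 8192 tokens, which matches the log above (outputTokens 4096, stopReason max_tokens).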

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.38.3

Environment details (OS name and version, etc.)

Ubuntu

ory-name-tbd avatar May 06 '25 09:05 ory-name-tbd

Hello @ory-name-tbd, thanks for reaching out. I have replicated the issue and see the same behavior: outputTokens does not follow maxTokens. Since this is a Bedrock issue, I have reached out to the Bedrock service team about it and will post here as soon as there is an update. Please let me know if you have further questions. Thank you.

For Internal Tracking: P247132763

adev-code avatar Jun 02 '25 22:06 adev-code

Thank you @adev-code for raising this with the Bedrock team. Is there any update from them as of today? Is the issue planned to be addressed? Or is there a better place to follow this problem?

It would be nice to have some info, as the problem has existed for quite some time now. See https://github.com/boto/boto3/issues/4279#issuecomment-2361634590

superjulius avatar Jul 18 '25 17:07 superjulius

Hello, thanks for your patience. The Bedrock team has provided an update: Sonnet 3.5 v1 has a maximum output token limit of 4096.

adev-code avatar Sep 17 '25 16:09 adev-code
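
Given that limit, one possible client-side workaround is to clamp maxTokens before calling Converse, so the cap is visible in your own code instead of showing up as an unexpected max_tokens stop. This is only a sketch: the lookup table and the function name effective_max_tokens are hypothetical, and the only limit taken from this thread is 4096 for Claude 3.5 Sonnet v1.

```python
import warnings

# Hypothetical lookup table. The single entry is the limit reported in this
# thread; limits for other models are not covered here.
KNOWN_OUTPUT_LIMITS = {
    "anthropic.claude-3-5-sonnet-20240620-v1:0": 4096,
}


def effective_max_tokens(model_id, requested):
    """Return the maxTokens value to send, clamped to a known model limit.

    Matching is by suffix so that plain model IDs, regional inference
    profile IDs (eu....), and inference profile ARNs all resolve.
    """
    for suffix, limit in KNOWN_OUTPUT_LIMITS.items():
        if model_id.endswith(suffix) and requested > limit:
            warnings.warn(
                f"maxTokens={requested} exceeds the {limit}-token output "
                f"limit reported for {suffix}; using {limit} instead."
            )
            return limit
    return requested
```

With the model from the log above, effective_max_tokens(model_id, 8192) gives 4096 and emits a warning. Responses that still end with stopReason max_tokens then need to be handled by the caller, for example by asking for shorter output.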

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Sep 17 '25 16:09 github-actions[bot]