
Unable to use inputs larger than 2048 tokens even with models that support more

Open arnaudaz opened this issue 1 year ago • 5 comments

Describe the bug When making requests from an OpenAI client with max_tokens=4096 set, calls to /chat/completions fail.

Lambda logs show: [ERROR] 2024-08-18T22:33:13.008Z 89da0b17-063a-493f-baf5-0adcff29ecd1 Validation Error: An error occurred (ValidationException) when calling the ConverseStream operation: The maximum tokens you requested exceeds the model limit of 2048. Try again with a maximum tokens value that is lower than 2048.

Please complete the following information:

  • [x] Which API you used: /chat/completions
  • [x] Which model you used: meta.llama3-1-70b-instruct-v1:0

To Reproduce I am using the LangChain ChatOpenAI wrapper, but any client will produce the same result. Create:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    max_tokens=4096,
    base_url=XXX,
    api_key=XXX,
    model="meta.llama3-1-70b-instruct-v1:0",
)

Submit a request with a ~3000-token payload, as sketched below.
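
A minimal invocation that should trigger it (a sketch; the prompt text and the rough token estimate are illustrative, not from the original report):

# Any payload of roughly 3000 tokens will do; the repetition here is only filler.
long_prompt = "Summarize the following text. " + ("lorem ipsum " * 1500)
llm.invoke(long_prompt)  # fails with the ValidationException shown in the Lambda logs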

Expected behavior Expect a successful response, since the model supports a 128k-token context window.

Additional context N/A

arnaudaz avatar Aug 18 '24 22:08 arnaudaz

This also applies to embeddings; I am using cohere.embed.

[ERROR] Validation Error: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #/texts/41: expected maxLength: 2048, actual: 2062, please reformat your input and try again.
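
A possible client-side workaround (a rough sketch, assuming the limit is a per-text character cap as the error message suggests; chunk_text and raw_texts are hypothetical names):

def chunk_text(text: str, max_len: int = 2048) -> list[str]:
    # Naive character-based split to stay under cohere.embed's maxLength;
    # a real splitter would break on sentence or token boundaries instead.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

texts = [chunk for t in raw_texts for chunk in chunk_text(t)]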

severity1 avatar Aug 22 '24 02:08 severity1

Has there been an update to this? I too am experiencing this when calling Claude 3.5 Sonnet v2:

The maximum tokens you requested exceeds the model limit of 4096

toninog avatar Jan 22 '25 15:01 toninog

Has there been an update to this? I am still hitting "requested exceeds the model limit of 2048".

jurchens avatar Mar 19 '25 06:03 jurchens

I am able to use Claude 3.5 Sonnet v2 with more than 2048 max output tokens. Llama 3.1 restricts the max output to 2048, though.

{
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "messages": [
        {
            "role": "user",
            "content": "Write a story about 3000 words?"
        }
    ],
    "max_tokens": 4096,
    "temperature": 1,
    "stream": false,
    "top_p": 0.5
}
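
For reference, the same request through the OpenAI Python SDK pointed at the gateway would look roughly like this (a sketch; base_url and api_key are placeholders):

from openai import OpenAI

client = OpenAI(base_url="XXX", api_key="XXX")
resp = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Write a story about 3000 words?"}],
    max_tokens=4096,
    temperature=1,
    top_p=0.5,
    stream=False,
)
print(resp.choices[0].message.content)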

daixba avatar Mar 19 '25 07:03 daixba

I want to use AWS's latest DeepSeek R1 for some complex mathematical reasoning, which may require 16k tokens.

jurchens avatar Mar 19 '25 07:03 jurchens