Unable to use inputs larger than 2048 tokens even with models that support more
**Describe the bug**
When making requests from an OpenAI client with `max_tokens=4096`, calls to `/chat/completions` fail.

Lambda logs show:

```
[ERROR] 2024-08-18T22:33:13.008Z 89da0b17-063a-493f-baf5-0adcff29ecd1 Validation Error: An error occurred (ValidationException) when calling the ConverseStream operation: The maximum tokens you requested exceeds the model limit of 2048. Try again with a maximum tokens value that is lower than 2048.
```
**Please complete the following information:**
- [x] Which API you used: /chat/completions
- [x] Which model you used: meta.llama3-1-70b-instruct-v1:0
**To Reproduce**
I am using the LangChain `ChatOpenAI` wrapper, but any client produces the same result. Create:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    max_tokens=4096,
    base_url=XXX,
    api_key=XXX,
    model="meta.llama3-1-70b-instruct-v1:0",
)
```
Submit a request with a 3000-token payload.
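For what it's worth, the log suggests the 2048 cap is enforced by Bedrock itself rather than by the gateway, since the `ValidationException` is returned by the `ConverseStream` operation. A minimal sketch that appears to reproduce it directly with boto3, bypassing the gateway (region, credentials, and prompt are placeholders):

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder region; assumes AWS credentials with Bedrock access.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

try:
    # Asking for more than 2048 output tokens for Llama 3.1 triggers the
    # same ValidationException shown in the Lambda log above.
    client.converse_stream(
        modelId="meta.llama3-1-70b-instruct-v1:0",
        messages=[{"role": "user", "content": [{"text": "Hello"}]}],
        inferenceConfig={"maxTokens": 4096},
    )
except ClientError as err:
    print(err)  # "...exceeds the model limit of 2048..."
```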
**Expected behavior**
Expect a successful response, as the model supports a 128k-token context window.

**Additional context**
N/A
This also applies to embeddings; I am using `cohere.embed`:

```
[ERROR] Validation Error: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #/texts/41: expected maxLength: 2048, actual: 2062, please reformat your input and try again.
```
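Note that this embeddings failure is a per-item length check: the schema error says item 41 in `texts` is 2062 characters against a 2048-character `maxLength`. One possible client-side workaround, sketched below (not part of the gateway; `chunk_text` is a hypothetical helper), is to split long strings before sending them:

```python
def chunk_text(text: str, max_chars: int = 2048) -> list[str]:
    """Split a string into pieces no longer than max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

texts = ["short text", "x" * 2062]  # the second item would fail the 2048 check
safe_texts = [piece for text in texts for piece in chunk_text(text)]
assert all(len(piece) <= 2048 for piece in safe_texts)
```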
Has there been an update to this? I am also experiencing this when calling Claude 3.5 Sonnet v2:

```
The maximum tokens you requested exceeds the model limit of 4096
```
Has there been an update to this? I am still seeing `requested exceeds the model limit of 2048`.
I am able to use Claude 3.5 Sonnet v2 with more than 2048 max output tokens. Llama 3.1 restricts the max output to 2048, though.
```json
{
  "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "messages": [
    {
      "role": "user",
      "content": "Write a story about 3000 words?"
    }
  ],
  "max_tokens": 4096,
  "temperature": 1,
  "stream": false,
  "top_p": 0.5
}
```
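For reference, here is the same payload sent through the OpenAI Python SDK pointed at the gateway; this is a sketch assuming the gateway's OpenAI-compatible endpoint, with `base_url` and `api_key` as placeholders:

```python
from openai import OpenAI

# Placeholders for your gateway endpoint and key.
client = OpenAI(base_url="https://your-gateway/api/v1", api_key="xxx")

resp = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Write a story about 3000 words?"}],
    max_tokens=4096,  # accepted for Claude 3.5 Sonnet v2
    temperature=1,
    top_p=0.5,
    stream=False,
)
print(resp.choices[0].message.content)
```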
I want to use AWS's latest DeepSeek R1 for some complex mathematical reasoning, which may require 16k tokens.