Bug: 400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.
Issue Description
Hi,
Some AI providers (including OpenAI) are deprecating the max_tokens parameter in favour of max_completion_tokens. Here is the error I get:
```json
{
    "success": false,
    "error": {
        "delegate": "openai-completion",
        "message": "Error 400 from delegate `openai-completion`: 400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.",
        "code": "error_400_from_delegate",
        "$": "heyputer:api/APIError",
        "status": 400
    }
}
```
Query:
fetch("https://api.puter.com/drivers/call", {
"headers": {
"accept": "*/*",
"authorization": `Bearer ${TOKEN}`,
"content-type": "application/json;charset=UTF-8"
},
"body": "{\"interface\":\"puter-chat-completion\",\"driver\":\"openai-completion\",\"test_mode\":false,\"method\":\"complete\",\"args\":{\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\",\"editing\":false},{\"role\":\"user\",\"content\":\"What is life?\"}],\"model\":\"o3\",\"temperature\":0.7,\"max_tokens\":2000,\"stream\":true}}",
"method": "POST"
});
The weird thing is that it works with some models. I couldn't find the exact root cause (it may be related to sending the full set of parameters to certain model types like o3), but this script ends up with that error too frequently for some reason.
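For what it's worth, here is an untested sketch of the same call sending max_completion_tokens instead of max_tokens. I'm assuming the puter-chat-completion driver forwards the parameter to OpenAI unchanged, which I haven't verified:

```js
// Untested workaround sketch: same driver call, but with max_completion_tokens.
// Assumes the driver passes the parameter through to the provider as-is (not verified).
fetch("https://api.puter.com/drivers/call", {
    method: "POST",
    headers: {
        accept: "*/*",
        authorization: `Bearer ${TOKEN}`,
        "content-type": "application/json;charset=UTF-8"
    },
    body: JSON.stringify({
        interface: "puter-chat-completion",
        driver: "openai-completion",
        test_mode: false,
        method: "complete",
        args: {
            messages: [
                { role: "system", content: "You are a helpful assistant." },
                { role: "user", content: "What is life?" }
            ],
            model: "o3",
            temperature: 0.7, // note: o-series models may also reject a non-default temperature (not verified)
            max_completion_tokens: 2000, // instead of max_tokens
            stream: true
        }
    })
});
```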
Steps to reproduce
Executing the script above raises the error above every time on my instance.
Expected behaviour
Output the response.
Additional Information or Screenshots (if applicable)
No response
Deployment
- [x] Production (puter.com)
- [ ] Development (`npm run start`)
- [ ] Docker (via `docker run`)
- [ ] Docker (via `docker-compose`)
Puter version
No response
I was able to find a bit of explanation on this on the OpenAI forums.
It seems to be the case that OpenAI broke backwards compatibility because some clients were using max_tokens to limit the number of tokens in the response (i.e. the tokens that would be presented to the user), while other clients were using max_tokens to manage cost. The latter no longer works with reasoning models, since they also consume "reasoning tokens" (tokens that we don't see, but still pay for).
So the root cause is essentially: OpenAI's lack of foresight made it necessary for them to force a regression on us.
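To illustrate the split (a rough sketch against the OpenAI Chat Completions API as I understand it today, not puter's code): with an o-series model the generation cap is max_completion_tokens, and the usage block reports reasoning tokens separately, which is exactly the distinction the old max_tokens semantics couldn't express:

```js
// Rough sketch of a direct Chat Completions call (API shape assumed from OpenAI docs, not from puter's code).
const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
        authorization: `Bearer ${OPENAI_API_KEY}`,
        "content-type": "application/json"
    },
    body: JSON.stringify({
        model: "o3",
        messages: [{ role: "user", content: "What is life?" }],
        max_completion_tokens: 2000 // replaces max_tokens for reasoning models
    })
});
const data = await res.json();

// Reasoning tokens are billed but never shown to the user:
console.log(data.usage.completion_tokens_details.reasoning_tokens);
console.log(data.choices[0].message.content);
```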
I wasn't aware of this as I rarely use the max_tokens parameter. Thanks for reporting this issue.
Note to future self: OpenAI docs refer to max_output_tokens instead of max_completion_tokens; it appears to be identical so that's likely what's preferred.
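Also for future self, my current understanding (worth verifying against the docs): max_completion_tokens is the Chat Completions parameter, while max_output_tokens is its counterpart on the newer Responses endpoint, e.g.:

```js
// The Responses endpoint takes max_output_tokens instead (my understanding; verify against current OpenAI docs).
await fetch("https://api.openai.com/v1/responses", {
    method: "POST",
    headers: {
        authorization: `Bearer ${OPENAI_API_KEY}`,
        "content-type": "application/json"
    },
    body: JSON.stringify({
        model: "o3",
        input: "What is life?",
        max_output_tokens: 2000
    })
});
```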
I want to work on this issue, please assign it to me.
@AbhiSharmaNIT much appreciated! I've assigned you. Let me know if you need any direction. You'll find documentation for the ai module in the wiki.
I found this in src/backend/src/modules/GroqAIService.js. Could changing it to max_output_tokens resolve the problem?
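For illustration, a hypothetical sketch of what I mean (not the actual code in GroqAIService.js, and untested): rather than hard-coding one name, the service could map whichever limit the caller sent onto the name the provider expects.

```js
// Hypothetical helper, for discussion only (not the existing code in GroqAIService.js).
// Normalizes whichever token limit the caller sent into the parameter name the provider expects.
function normalizeTokenLimit(args, { preferred = 'max_completion_tokens' } = {}) {
    const limit = args.max_completion_tokens ?? args.max_output_tokens ?? args.max_tokens;
    const { max_tokens, max_output_tokens, max_completion_tokens, ...rest } = args;
    return limit === undefined ? rest : { ...rest, [preferred]: limit };
}

// e.g. before forwarding a request to the provider:
// const payload = normalizeTokenLimit({ model, messages, max_tokens: 2000 });
```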
I feel like this has to do with the old vs. new API spec. Most LLM APIs are not stable; even if this is fixed, it may become deprecated again any time soon. I'm not sure though!
This has been fixed. I'm closing it for now, if this comes up again please let us know 🫡