
Bug: 400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.

bitsnaps opened this issue 7 months ago

Issue Description

Hi

Some AI providers (including OpenAI) are deprecating the max_tokens parameter in favor of max_completion_tokens. Here is the error I get:

{
    "success": false,
    "error": {
        "delegate": "openai-completion",
        "message": "Error 400 from delegate `openai-completion`: 400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.",
        "code": "error_400_from_delegate",
        "$": "heyputer:api/APIError",
        "status": 400
    }
}

Query:

fetch("https://api.puter.com/drivers/call", {
  "headers": {
    "accept": "*/*",
    "authorization": `Bearer ${TOKEN}`,
    "content-type": "application/json;charset=UTF-8"
  },
  "body": "{\"interface\":\"puter-chat-completion\",\"driver\":\"openai-completion\",\"test_mode\":false,\"method\":\"complete\",\"args\":{\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\",\"editing\":false},{\"role\":\"user\",\"content\":\"What is life?\"}],\"model\":\"o3\",\"temperature\":0.7,\"max_tokens\":2000,\"stream\":true}}",
  "method": "POST"
});
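
For reference, replacing max_tokens with max_completion_tokens as the error message suggests would look like this (a sketch; I haven't confirmed that the Puter backend forwards this parameter to OpenAI unchanged):

fetch("https://api.puter.com/drivers/call", {
  method: "POST",
  headers: {
    "accept": "*/*",
    "authorization": `Bearer ${TOKEN}`,
    "content-type": "application/json;charset=UTF-8"
  },
  body: JSON.stringify({
    interface: "puter-chat-completion",
    driver: "openai-completion",
    test_mode: false,
    method: "complete",
    args: {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "What is life?" }
      ],
      model: "o3",
      // max_completion_tokens in place of the deprecated max_tokens;
      // temperature is omitted because reasoning models like o3
      // reportedly reject non-default values for it as well.
      max_completion_tokens: 2000,
      stream: true
    }
  })
});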

The weird thing is that it works with some models. I couldn't find the exact root cause (it may be triggered when passing the full set of parameters to certain model families like o3), but the script above fails with this error quite frequently.

Steps to reproduce

Executing the previous script raises the error above every time on my instance.

Expected behaviour

Output the response.

Additional Information or Screenshots (if applicable)

No response

Deployment

  • [x] Production (puter.com)
  • [ ] Development (npm run start)
  • [ ] Docker (via docker run)
  • [ ] Docker (via docker-compose)

Puter version

No response

bitsnaps avatar Apr 23 '25 21:04 bitsnaps

I was able to find a bit of explanation on this on the OpenAI forums.

It seems OpenAI broke backwards compatibility because some clients were using max_tokens to limit the number of tokens in the response (i.e. the tokens presented to the user), while others were using it to manage cost, and the latter no longer works with reasoning models since they consume "reasoning tokens" (tokens that we don't see, but still pay for).
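
For illustration, usage accounting for a reasoning model looks roughly like this in OpenAI's Chat Completions responses (field names from their docs; the numbers here are made up):

{
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 620,
        "completion_tokens_details": {
            "reasoning_tokens": 480
        },
        "total_tokens": 645
    }
}

The 480 reasoning tokens are billed as completion tokens even though they never appear in the visible output, which is why a cap on visible output no longer doubles as a cost cap.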

So the root cause is essentially: OpenAI had a lack of foresight that made it necessary for them to force a regression on us.

I wasn't aware of this as I rarely use the max_tokens parameter. Thanks for reporting this issue.

KernelDeimos avatar Apr 25 '25 20:04 KernelDeimos

Note to future self: OpenAI docs refer to max_output_tokens instead of max_completion_tokens; it appears to be identical so that's likely what's preferred.
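
If I recall correctly (worth verifying against current docs), the two names belong to different endpoints: max_completion_tokens is the Chat Completions parameter, while max_output_tokens is its equivalent in the newer Responses API. Roughly:

// Chat Completions (POST /v1/chat/completions)
const chatBody = {
  model: "o3",
  messages: [{ role: "user", content: "What is life?" }],
  max_completion_tokens: 2000 // successor to the deprecated max_tokens
};

// Responses API (POST /v1/responses)
const responsesBody = {
  model: "o3",
  input: "What is life?",
  max_output_tokens: 2000 // same idea, different endpoint
};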

KernelDeimos avatar Apr 26 '25 03:04 KernelDeimos

I want to work on this project, please assign it to me.

AbhiSharmaNIT avatar Apr 28 '25 20:04 AbhiSharmaNIT

@AbhiSharmaNIT much appreciated! I've assigned you. Let me know if you need any direction. You'll find documentation for the ai module in the wiki

KernelDeimos avatar Apr 30 '25 01:04 KernelDeimos

I found this in src/backend/src/modules/GroqAIService.js. Could changing it to max_output_tokens resolve the problem?

[screenshot of the relevant code in GroqAIService.js]

rowin-C avatar Jun 13 '25 05:06 rowin-C

I found this in src/backend/src/modules/GroqAIService.js. Could changing it to max_output_tokens resolve the problem?

I feel like this has to do with old vs. new API specs; most LLM APIs are not stable, so even if it's fixed it may be deprecated again any time soon. I'm not sure though!
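
If that's the case, one way to absorb these renames (a minimal sketch; applyTokenLimit and the provider keys are hypothetical, not Puter's actual code) would be to normalize the parameter in one place at the driver boundary:

// Hypothetical helper: map a generic token limit onto whatever
// parameter name the target provider/endpoint currently expects.
function applyTokenLimit(body, limit, provider) {
  switch (provider) {
    case "openai-chat":      // Chat Completions API
      body.max_completion_tokens = limit;
      break;
    case "openai-responses": // Responses API
      body.max_output_tokens = limit;
      break;
    default:                 // providers still on the legacy name
      body.max_tokens = limit;
  }
  return body;
}

Callers would keep passing a single limit value, and the next deprecation would only touch this switch.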

bitsnaps avatar Jun 14 '25 14:06 bitsnaps

This has been fixed. I'm closing it for now; if this comes up again please let us know 🫡

jelveh avatar Jul 02 '25 19:07 jelveh