max_tokens defaults to 32000 when using a custom provider

nmartorell opened this issue 5 months ago • 4 comments

Hi,

I'm using LLM models (Anthropic, OpenAI, and Bedrock) through an OpenAI-compatible LLM API Gateway. I configured a custom provider in opencode to point at this gateway via the opencode.json file, e.g.:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "myprovider": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Custom LLM Gateway",
      "options": {
        "baseURL": "<GATEWAY_URL",
        "apiKey": "<API_KEY>",
      },
      "models": {
        "openai:gpt-4o-mini": {
          "name": "gpt-4o-mini"
        },
        "anthropic:claude-3-5-haiku-20241022": {
          "name": "claude-3-5-haiku-20241022"
        }
      }
    }
  }
}

When I try to use either of these models through opencode, I receive error messages suggesting that something (maybe opencode, maybe the Vercel AI SDK) is defaulting the max_tokens field of the OpenAI chat completions request to 32000, which is unfortunately far too large for these models.

As an example, here is the error I see in the LLM Gateway logs when attempting to use the gpt-4o-mini model (I see a similar error with the Anthropic models):

  "error": {
    "message": "max_tokens is too large: 32000. This model supports at most 16384 completion tokens, whereas you provided 32000.",
    "type": "invalid_request_error",
    "param": "max_tokens",
    "code": "invalid_value"
  }
}
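
For reference, a request with the same shape as my curl test further down, but with max_tokens set to 32000 explicitly, presumably hits the exact same rejection:

curl --header 'Authorization: Bearer <API TOKEN>' -H "Content-Type: application/json" -X POST --data '{"model":"openai:gpt-4o-mini","max_tokens":32000,"messages":[ {"role": "user", "content": "Hello!"}]}' <GATEWAY_URL>/v1/chat/completions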

I've searched the opencode and Vercel AI SDK codebases to find where this max_tokens value is set, but unfortunately I can't find it. Assuming it is being set by opencode, it would be ideal if the value were left unset for custom providers; in my case the LLM Gateway already keeps track of the max_tokens parameter for each provider / model.

A couple other notes:

  • The reason I know the max_tokens parameter is being set by opencode or the Vercel SDK is that when I send my own HTTP request to the LLM Gateway without specifying max_tokens, the query goes through, e.g.
curl --header 'Authorization: Bearer <API TOKEN>' -H "Content-Type: application/json" -X POST --data '{"model":"openai:gpt-4o-mini","messages":[ {"role": "user", "content": "Hello!"}]}' <GATEWAY_URL>/v1/chat/completions
  • When I modify the opencode.json file to include the context and max output token limits, opencode works as expected (I know this is a workaround, but I'm really trying not to have to hardcode context and output tokens in both places). The config, plus a rough sketch of the request the gateway then receives, is below:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "myprovider": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Custom LLM Gateway",
      "options": {
        "baseURL": "<GATEWAY_URL",
        "apiKey": "<API_KEY>",
      },
      "models": {
        "openai:gpt-4o-mini": {
          "name": "gpt-4o-mini",
          "limit": {
            "context": 10000,
            "output": 5000
          }
        },
        "anthropic:claude-3-5-haiku-20241022": {
          "name": "claude-3-5-haiku-20241022",
          "limit": {
            "context": 10000,
            "output": 5000
          }
        }
      }
    }
  }
}
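
With those limits in place, the request the gateway receives should (as far as I can tell) carry a max_tokens value taken from limit.output rather than the 32000 default, i.e. roughly:

POST <GATEWAY_URL>/v1/chat/completions
{
  "model": "openai:gpt-4o-mini",
  "max_tokens": 5000,
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}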

Please let me know if my understanding is wrong, and this parameter is being set elsewhere. Also, please let me know if there is any additional information required to troubleshoot.

Thank you, and thanks for making such an awesome tool!

nmartorell avatar Aug 08 '25 22:08 nmartorell

opencode needs to know what the max output length is - when it doesn't, it uses 32_000

i can't think of a way around this - you need to specify it somewhere
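
roughly what happens (just a sketch to show where the number comes from - resolveMaxTokens is a made-up name, not the actual source):

// sketch only, not the actual opencode source
const OUTPUT_TOKEN_FALLBACK = 32_000

function resolveMaxTokens(limit?: { output?: number }): number {
  // use the configured output limit when it's known,
  // otherwise fall back to the 32_000 default
  return limit?.output ?? OUTPUT_TOKEN_FALLBACK
}

// resolveMaxTokens({ output: 5000 })  -> 5000
// resolveMaxTokens(undefined)         -> 32_000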

thdxr avatar Aug 09 '25 00:08 thdxr

Try this, please. It worked for me with Kimi K2 free.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "openrouter": {
      "models": {
        "moonshotai/kimi-k2:free": {
          "options": {
            "provider": {
              "order": ["baseten", "together", "openrouter"],
              "allow_fallbacks": true
            },
            "transform": "middle-out",
            "max_tokens": 6000
          }
        }
      }
    }
  }
}

patrickwork28 avatar Sep 07 '25 12:09 patrickwork28

I'm having issues with this as well :( I've tried throwing the max_tokens setting everywhere I could think of, but it still shows up in the LM Studio logs as

25-12-04 17:11:17 [DEBUG]
 Received request: POST to /v1/chat/completions with body  {
  "model": "qwen/qwen3-coder-30b",
  "max_tokens": 32000,
  "temperature": 0.55,

The config

{
    "$schema": "https://opencode.ai/config.json",
    "permission": {
      "edit": "allow",
      "bash": "allow",
      "webfetch": "ask"
    },
    "provider": {
      "lmstudio": {
        "npm": "@ai-sdk/openai-compatible",
        "name": "Mac Mini",
        "options": {
          "baseURL": "http://addres:8123/v1",
          "timeout": 100000000000,
	  "max_tokens": 200000,
          "max_completion_tokens": 200000
        },
        "models": {
          "qwen/qwen3-coder-30b": {
            "name": "Qwen Coder (local)",
            "max_tokens": 131072,
            "max_completion_tokens": 200000,
            "limit": {
              "context": 209600,
              "output": 50000
            }
          }
        }
      }
    }
}

njbrake avatar Dec 04 '25 22:12 njbrake