opencode Fix/cerebras conservative max tokens

This PR adds a specific configuration for the Cerebras provider to optimize rate limit handling and integration tracking.

Key changes:

Conservative Token Limit: Sets maxCompletionTokens to 16k. The Cerebras rate limiter estimates token consumption by reserving the full max_completion_tokens quota upfront. Using a conservative default prevents premature rate limiting, ensuring smoother operation even when actual generation is small.
Integration Header: Adds the `X-Cerebras-3rd-Party-Integration: opencode` header.
Configuration: Sets autoload: false.
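As a rough illustration of the three changes above, here is a minimal sketch of what such a provider override could look like. The `ProviderOverride` interface, the `cerebrasOverride` constant, and the `effectiveMaxTokens` helper are hypothetical names made up for this example and are not taken from the opencode codebase; the value 16384 is an assumed reading of "16k".

```typescript
// Hypothetical shape for illustration only; not opencode's actual internal types.
interface ProviderOverride {
  autoload: boolean;
  options: {
    maxCompletionTokens: number;
    headers: Record<string, string>;
  };
}

// Assumption: "16k" in the description is read as 16384 tokens.
const CEREBRAS_MAX_COMPLETION_TOKENS = 16384;

const cerebrasOverride: ProviderOverride = {
  // The provider is not loaded automatically.
  autoload: false,
  options: {
    maxCompletionTokens: CEREBRAS_MAX_COMPLETION_TOKENS,
    // Header named in the PR description, used for integration tracking.
    headers: { "X-Cerebras-3rd-Party-Integration": "opencode" },
  },
};

// Clamp a model's advertised output limit to the conservative cap,
// so a smaller model limit is still respected.
function effectiveMaxTokens(modelLimit: number, override: ProviderOverride): number {
  return Math.min(modelLimit, override.options.maxCompletionTokens);
}

console.log(effectiveMaxTokens(32768, cerebrasOverride));
```

Under these assumptions, a model advertising a 32k output limit would be sent with the 16k cap, while a model with a smaller limit keeps its own value.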

Testing: Verified functionality with the following models: gpt-oss-120b, qwen-235, zai-glm4.6

Dec 04 '25 01:12 sebastiand-cerebras

wouldn’t this kinda neuter a lot of models?

Can you explain why you need this? Models like gpt oss have 32k max completion output tokens, and opencode should be respecting that…

What kinda plan are you on where you get ratelimited?

Dec 05 '25 16:12 rekram1-node

Cerebras handles rate limiting differently from most providers. It estimates token usage upfront using the max_completion_tokens value, so if a client always sends 32k, each request is counted as if it might produce 32k tokens, even when the actual completion is much smaller. On Cerebras Code plans this causes users to hit rate limits very quickly in agentic coding workflows that make many short calls, which is why a more conservative default like 8,192 tokens gives a much smoother experience without materially limiting typical code completions.
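To make the arithmetic concrete, here is a small sketch of how upfront reservation affects the number of requests that fit in one rate-limit window. The 400,000 tokens-per-minute budget and the 2,000-token prompt are made-up numbers for illustration, not actual Cerebras plan limits, and the assumption that prompt tokens are reserved alongside `max_completion_tokens` is a simplification; `requestsPerWindow` is a hypothetical helper.

```typescript
// Illustrative model only; the real limiter's accounting may differ.
interface RequestShape {
  promptTokens: number;
  maxCompletionTokens: number;
}

// How many requests fit in one window if each request reserves
// its prompt plus the full max_completion_tokens upfront.
function requestsPerWindow(tokensPerMinute: number, req: RequestShape): number {
  const reserved = req.promptTokens + req.maxCompletionTokens;
  return Math.floor(tokensPerMinute / reserved);
}

const budget = 400000; // hypothetical tokens-per-minute budget
const promptTokens = 2000; // hypothetical short agentic call

for (const cap of [32768, 16384, 8192]) {
  const n = requestsPerWindow(budget, { promptTokens, maxCompletionTokens: cap });
  console.log(`max_completion_tokens=${cap}: ${n} requests per minute`);
}
// With these numbers: 32768 -> 11, 16384 -> 21, 8192 -> 39
```

The actual completion size does not appear in the calculation at all, which is the point: under this model, only the requested cap determines how quickly the budget is exhausted.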

Dec 06 '25 00:12 sebastiand-cerebras