opencode
Add Cerebras rate limit handler
Use exponential backoff to improve UX with Cerebras models.
Since the hourly and daily token rate limits are the same, the tokens-per-minute (TPM) limit is the effective limiter. The maximum retry wait time is therefore 60 seconds, so users can take full advantage of Cerebras.
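For readers who want a concrete picture, here is a minimal sketch of exponential backoff capped at 60 seconds. It is not the code from this PR: the names `backoffDelayMs`, `withRateLimitRetry`, and `MAX_RETRY_WAIT_MS`, the 1-second base delay, the attempt limit, and the assumption that a rate-limited request surfaces as an error with `status === 429` are all illustrative.

```typescript
// Illustrative sketch only; names and defaults are assumptions, not the PR's diff.
const MAX_RETRY_WAIT_MS = 60_000; // TPM window: never wait longer than one minute

// Delay doubles with each attempt (1s, 2s, 4s, ...) and is capped at maxMs.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1_000,
  maxMs: number = MAX_RETRY_WAIT_MS,
): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Assumption: a rate-limited request throws an object carrying HTTP status 429.
function isRateLimitError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

// Retries `request` on rate-limit errors, sleeping with capped exponential backoff.
async function withRateLimitRetry<T>(
  request: () => Promise<T>,
  maxAttempts: number = 8,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up on non-rate-limit errors or once the attempt budget is spent.
      if (!isRateLimitError(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With these defaults the waits grow 1s, 2s, 4s, 8s, 16s, 32s and then stay at 60s, so a throttled request is retried within the next TPM window instead of failing right away.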
I've tested this PR with Cerebras and can confirm that the user experience is much improved - requests do get throttled, but you no longer get a timeout pretty much immediately.
Same here. I tested it on my fork, and it made using Cerebras doable.