Add Cerebras rate limit handler

Open · kevint-cerebras opened this issue · 2 comments

Use exponential backoff on rate-limit errors to improve the UX with Cerebras models.

Since the hourly and daily token rate limits are the same, the tokens-per-minute (TPM) limit is the real constraint. The maximum retry wait time is therefore capped at 60 seconds (one full TPM window), so users can take full advantage of Cerebras.
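The PR's actual code is not shown here, so the following is only a minimal sketch of capped exponential backoff, assuming a client that raises an exception on HTTP 429. `RateLimitError`, `backoff_delay`, and `call_with_backoff` are hypothetical names, and the 1-second base delay and 8-retry limit are assumptions; only the 60-second cap comes from the description above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit (HTTP 429) error a model client raises."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry `attempt` (0-based): base * 2**attempt, never more than `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 8,   # assumed retry budget, not taken from the PR
    base: float = 1.0,      # assumed initial delay in seconds
    cap: float = 60.0,      # one TPM window, per the issue description
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RateLimitError with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap))
    raise AssertionError("unreachable when max_retries >= 0")
```

With these defaults the waits are 1, 2, 4, 8, 16, 32, 60, 60 seconds: the delay doubles until it reaches the 60-second cap, after which each retry waits one full minute for the TPM window to reset. The `sleep` parameter is injectable so the retry logic can be tested without real waiting; production code often also adds random jitter to spread out retries from concurrent requests, which this sketch omits.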

Aug 18 '25 17:08 kevint-cerebras

I've tested this PR with Cerebras and can confirm that the user experience is much improved: requests are still throttled, but they no longer time out almost immediately.

Aug 20 '25 07:08 eloquence

Same here: I tested this on my fork, and it made using Cerebras workable.

Aug 22 '25 03:08 JC1738