
Handle rate limiting with leaky bucket instead of only backoff

oyarsa opened this issue 10 months ago · 0 comments

If I understand the code correctly, the current mechanism for dealing with rate limits relies on exponential backoff: retry the request until it succeeds or the retry budget runs out.
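For context, the retry-with-exponential-backoff pattern described above looks roughly like this (an illustrative sketch, not async-openai's actual code; `base_ms` and `max_retries` are made-up parameter names):

```rust
use std::time::Duration;

/// Sketch of exponential backoff: the delay before attempt N doubles
/// each time, so a rate-limited client backs off more and more.
fn backoff_delays(base_ms: u64, max_retries: u32) -> Vec<Duration> {
    (0..max_retries)
        .map(|attempt| Duration::from_millis(base_ms * 2u64.pow(attempt)))
        .collect()
}
```

With `base_ms = 100` and four retries this yields delays of 100, 200, 400, and 800 ms. The problem described below is that with thousands of concurrent requests, all of them hit the limit, all of them back off, and then all of them retry at roughly the same time.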

This works for a small number of requests, but it breaks down when running many at once (I'm talking about thousands in a batch). What I found useful is a leaky bucket–based rate limiter covering both requests per minute and tokens per minute, which handles this load much better. The Python package openlimit implements this approach.
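A minimal sketch of what a leaky-bucket limiter could look like in Rust (hypothetical code, not openlimit's or async-openai's API; all names here are made up). The bucket drains at a fixed rate; a request proceeds only if its cost fits in the remaining capacity, otherwise the caller is told how long to wait:

```rust
use std::time::{Duration, Instant};

/// Hypothetical leaky-bucket rate limiter.
/// `capacity` is the burst size; `leak_per_sec` is the sustained rate
/// (e.g. requests-per-minute / 60, or tokens-per-minute / 60).
struct LeakyBucket {
    capacity: f64,
    leak_per_sec: f64,
    level: f64,
    last: Instant,
}

impl LeakyBucket {
    fn new(capacity: f64, leak_per_sec: f64) -> Self {
        Self { capacity, leak_per_sec, level: 0.0, last: Instant::now() }
    }

    /// Drain the bucket for the elapsed time, then try to add `cost`.
    /// Ok(()) means the request may proceed now; Err(wait) is a hint
    /// for how long to sleep before trying again.
    fn try_acquire(&mut self, cost: f64) -> Result<(), Duration> {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        // Leak: the level decreases continuously at leak_per_sec.
        self.level = (self.level - elapsed * self.leak_per_sec).max(0.0);
        if self.level + cost <= self.capacity {
            self.level += cost;
            Ok(())
        } else {
            let excess = self.level + cost - self.capacity;
            Err(Duration::from_secs_f64(excess / self.leak_per_sec))
        }
    }
}
```

One bucket would track requests per minute (cost 1 per request) and a second would track tokens per minute (cost = the request's estimated token count); a request goes out only once both buckets admit it. Unlike blind backoff, this paces thousands of queued requests smoothly at the provider's limit instead of letting them all fire, fail, and retry in lockstep.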

oyarsa · Jan 15 '25