
Token Per Minute (TPM) Limiter

Open bhancockio opened this issue 10 months ago • 5 comments

It would be awesome if crewAI had a tokens-per-minute property that we could set when defining the crew, so that we don't get rate limited by services such as Groq.

Here's an example rate limit from Groq that I frequently get inside of my crews:

groq.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_123` on tokens per minute (TPM): Limit 5000, Used 5747, Requested ~4251. Please try again in 59.977s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}

CrewAI is already tracking how many tokens we are using during the crew's session so hopefully this wouldn't be too large of a lift.
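To make the request concrete, here is a purely hypothetical sketch of how such a property might sit next to the existing `max_rpm` attribute. Note that `max_tpm` does not exist in crewAI today; it is only an illustration of the feature being requested:

```python
# Hypothetical API sketch -- `max_tpm` is NOT a real crewAI parameter.
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    max_rpm=10,     # existing attribute: caps requests per minute
    max_tpm=5000,   # requested attribute: would cap tokens per minute
)
```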

bhancockio avatar Apr 23 '24 02:04 bhancockio

Hi, did you try using Max RPM (optional)? It is the maximum requests per minute the crew adheres to during execution: https://docs.crewai.com/core-concepts/Crews/#crew-attributes. See also https://docs.crewai.com/core-concepts/Agents/#what-is-an-agent

gadgethome avatar Apr 23 '24 10:04 gadgethome

Hey Paul!

Good suggestion! I did adjust the crew's RPM to 5 and I was able to get the crew to run. However, things were super slow and the crew still hit rate limit issues.

I think Tokens Per Minute would make for a great addition to the crew because Requests Per Minute is not the same as Tokens Per Minute.

Here's the problem with the current RPM approach: as a developer, I have zero control over the token size of each request.

So even if I set the RPM of a crew to 10, the token size of those 10 requests could be drastically different.

For example, if each request is 500 tokens, I will use 5K tokens per minute which would put me at the limit for Groq.

However, if each request is 2K tokens, I will use 20K tokens per minute, which would put me way over the Groq limit and crash my crew.
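A quick back-of-the-envelope check of the numbers above (Groq's example limit of 5,000 TPM, a crew capped at 10 RPM):

```python
# Same request rate, very different token throughput.
rpm = 10          # crew capped at 10 requests per minute
tpm_limit = 5000  # Groq's example tokens-per-minute limit

small = rpm * 500   # 500-token requests  -> 5,000 TPM, exactly at the limit
large = rpm * 2000  # 2,000-token requests -> 20,000 TPM, 4x over the limit

print(small, large)  # 5000 20000
```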

bhancockio avatar Apr 23 '24 13:04 bhancockio

Agreed. For LLM providers (e.g. Groq) that limit TPM, max_rpm does not provide enough control; you can still run into their limits even with a tiny RPM. Something like max_tpm would be a good addition, and developers could choose either one or both depending on the provider.
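A minimal sketch of what a `max_tpm` enforcer could look like, assuming per-request token counts are known (which the issue notes crewAI already tracks). The `TPMLimiter` class and its methods are hypothetical helpers for illustration, not part of crewAI:

```python
import time
from collections import deque


class TPMLimiter:
    """Hypothetical sliding-window tokens-per-minute limiter (not part
    of crewAI). acquire() blocks until the request fits in the budget."""

    def __init__(self, max_tpm: int, window: float = 60.0):
        self.max_tpm = max_tpm
        self.window = window
        self.history = deque()  # (timestamp, tokens) pairs

    def _used(self, now: float) -> int:
        # Evict entries older than the window, then sum what remains.
        while self.history and now - self.history[0][0] >= self.window:
            self.history.popleft()
        return sum(tokens for _, tokens in self.history)

    def acquire(self, tokens: int) -> None:
        """Wait until `tokens` fits under the per-window budget, then record it."""
        while True:
            now = time.monotonic()
            if self._used(now) + tokens <= self.max_tpm:
                self.history.append((now, tokens))
                return
            if not self.history:
                # A single request larger than the whole budget can never
                # fit; record it and let the provider reject it, rather
                # than loop forever.
                self.history.append((now, tokens))
                return
            # Sleep until the oldest entry ages out of the window.
            time.sleep(self.window - (now - self.history[0][0]))
```

Calling `limiter.acquire(estimated_tokens)` before each LLM request would keep the rolling one-minute token total under the provider's cap, independently of the request rate.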

ranzhang avatar Apr 23 '24 19:04 ranzhang

Agreed as well.

Seneko avatar May 07 '24 14:05 Seneko

I hope that this gets added soon 🤞

jeeanribeiro avatar May 07 '24 23:05 jeeanribeiro

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Aug 17 '24 12:08 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Aug 22 '24 12:08 github-actions[bot]