Use litellm Router for rate limiting and/or fallback LLMs
Summary
litellm provides a Router class that wraps completion with rate-limit handling. We could look into using it, since it should let us define a RetryPolicy, ideally based on how long the provider says to wait (though as far as I can tell, it doesn't support that yet). It does allow defining a fallback LLM for when one provider runs out of retries. (https://github.com/All-Hands-AI/OpenHands/issues/1263)
Rate limit headers for OpenAI: https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers
Rate limit headers for Anthropic: https://docs.anthropic.com/en/api/rate-limits#response-headers
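To illustrate why "how long the provider has left" differs per provider: OpenAI's reset headers are relative durations (e.g. "6m0s"), while Anthropic reports an absolute RFC 3339 timestamp, and both send Retry-After on 429s. A minimal sketch of header parsing; the helper names and the exact unit list are assumptions for illustration, not an existing API:

```python
import re

def parse_openai_reset(value):
    """Parse OpenAI's x-ratelimit-reset-* duration strings (e.g. "1s",
    "6m0s", "20ms") into seconds. The unit table is an assumption
    covering the examples in the OpenAI rate-limit docs."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
    total = 0.0
    # "ms" must precede "m" and "s" in the alternation to match first.
    for amount, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", value):
        total += float(amount) * units[unit]
    return total

def seconds_until_reset(headers):
    """Extract a wait time in seconds from provider response headers.
    Prefers Retry-After (sent by both providers on 429), then falls
    back to OpenAI's relative reset duration."""
    if "retry-after" in headers:
        return float(headers["retry-after"])
    if "x-ratelimit-reset-requests" in headers:  # OpenAI-style
        return parse_openai_reset(headers["x-ratelimit-reset-requests"])
    return None  # Anthropic's reset is a timestamp; needs clock math instead
```

This is roughly the per-provider logic we would have to maintain ourselves if litellm doesn't surface the remaining time.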
Technical Design
Replace the direct call to litellm completion with Router.completion.
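A sketch of what that swap could look like, assuming litellm's Router configuration style (model_list entries, num_retries, fallbacks); model names and keys below are placeholders, and the Router instantiation is guarded so the config can be inspected without litellm installed:

```python
# Placeholder deployments; "primary" and "backup" are aliases we choose.
model_list = [
    {
        "model_name": "primary",
        "litellm_params": {"model": "gpt-4o", "api_key": "PLACEHOLDER"},
    },
    {
        "model_name": "backup",
        "litellm_params": {"model": "claude-3-5-sonnet-20240620", "api_key": "PLACEHOLDER"},
    },
]

# After "primary" exhausts its retries, fall back to "backup".
fallbacks = [{"primary": ["backup"]}]

try:
    from litellm import Router

    router = Router(model_list=model_list, num_retries=3, fallbacks=fallbacks)
    # Call sites change from litellm.completion(...) to:
    # response = router.completion(
    #     model="primary",
    #     messages=[{"role": "user", "content": "hi"}],
    # )
except ImportError:
    pass  # litellm not installed in this environment
```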
Alternatives to Consider
Continue to do it ourselves. Different providers have different rate limits, so our options are:
- don't read the remaining time; instead pick sensible, user-configurable defaults and document them better
- get the remaining time from litellm
Fallback LLM:
- implement it ourselves
- configure it through litellm
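For the "implement it ourselves" option, the core loop is small: try each model in order, retry a few times on rate-limit failures, then fall through to the next. A minimal sketch; the function and its parameters are hypothetical, and the injected `call` stands in for a real litellm completion call:

```python
import time

def complete_with_fallback(models, call, max_retries=2, wait=1.0):
    """Try each model in order; on a rate-limit-style failure, retry up
    to max_retries times (sleeping `wait` seconds between attempts),
    then fall through to the next model. Raises the last error if all
    models fail."""
    last_exc = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return call(model)
            except Exception as exc:  # in practice: litellm's RateLimitError
                last_exc = exc
                if attempt < max_retries:
                    time.sleep(wait)
    raise last_exc
```

The `wait` here is the fixed default; the header parsing discussed above is what would let us replace it with the provider's actual remaining time.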
Bouncing this back to here: https://github.com/All-Hands-AI/OpenHands/issues/4184