usage-based-routing-ttl-on-cache
The RPM/TPM counter cache uses a 1-minute window policy, so it is better to set a TTL on that counter key.
The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| litellm | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 14, 2024 4:36am |
The TTL for usage is 1 minute because the TPM/RPM limits are computed over a 1-minute window @sumanth13131
can you share a repro of the problem you're facing?
bump on this? @sumanth13131
Hi @krrishdholakia,
I attached the Screenshot below.
Currently, all keys set by the usage-based strategy
use the default TTL (-1), which means no TTL until the Redis LRU cache evicts them.
PS: The same key can occur again the next day. This causes "no deployment available"
if that time frame's quota (rpm/tpm) is already exhausted.
@sumanth13131 so if I understand the problem:
- keys can have the same name 24+ hours later, which means incorrect tpm/rpm values are used?
Can we fix this by having a more precise key name, including the current date as part of it?
we do this for lowest latency routing -
https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L325
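To illustrate the suggestion above (a sketch only; the key format is hypothetical, not the one used in `lowest_latency.py`): embedding the current date and minute in the key means a counter from a previous day can never collide with today's window.

```python
from datetime import datetime

# Hypothetical key builder: one counter key per deployment, per day,
# per minute, so yesterday's key can never be reused today.
def usage_key(deployment_id: str, when: datetime) -> str:
    # e.g. "gpt-4:2024-05-14:04-36"
    return f"{deployment_id}:{when:%Y-%m-%d}:{when:%H-%M}"
```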
Setting a TTL is fine, right? Please check this code change: https://github.com/BerriAI/litellm/pull/3412/files
You're right. But instead of hardcoding it, can we make it a controllable param?
Like in lowest latency routing - https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L31
Then in the test, let's set it to a very low amount (5s?) -> and check whether it's set and then evicted within the expected timeframe
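The kind of test being asked for could look roughly like this (a sketch under assumptions: `TTLCache` is a hypothetical stand-in for the router's cache, and `now` is injected so the test does not have to sleep):

```python
import time

# Hypothetical TTL cache used only to demonstrate the test shape;
# the real PR would exercise the router's actual cache with a
# configurable TTL param.
class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (value, expiry_timestamp)

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            self._store.pop(key, None)  # lazy eviction on read
            return None
        return entry[0]

def test_usage_ttl():
    cache = TTLCache()
    cache.set("gpt-4:usage", 10, ttl=5, now=0)      # very low TTL, as suggested
    assert cache.get("gpt-4:usage", now=4) == 10    # still present before expiry
    assert cache.get("gpt-4:usage", now=6) is None  # evicted after the TTL
```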
Sure
Hi @krrishdholakia, I've updated the PR as we talked about.