litellm usage-based-routing-ttl-on-cache

The RPM/TPM counter cache uses a 1-minute window policy, so it is better to set a TTL on each counter key.

May 03 '24 05:05 sumanth13131

the ttl for usage is 1min because the TPM/RPM limits are across 1 minute @sumanth13131

can you share a repro of the problem you're facing?

May 10 '24 17:05 krrishdholakia

bump on this? @sumanth13131

May 11 '24 16:05 krrishdholakia

Hi @krrishdholakia, I attached the screenshot below. Currently, all keys written by the usage-based strategy get the default TTL (-1), which means they never expire and stay in place until Redis LRU eviction removes them.

PS: The same key can occur again the next day. If the quota (rpm/tpm) recorded under that key was already exhausted, no deployment is available for that time frame.
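
A minimal sketch of the collision described above. It assumes the counter key is built from the deployment id, the metric name, and only the hour and minute of the current time; the exact key layout, the `counter_key` helper, and the plain dict standing in for Redis are illustrative assumptions, not the library's code.

```python
from datetime import datetime, timedelta, timezone


def counter_key(deployment_id: str, metric: str, now: datetime) -> str:
    # Hypothetical key layout: hour and minute only, no date component.
    return f"{deployment_id}:{metric}:{now.strftime('%H-%M')}"


today = datetime(2024, 5, 11, 16, 5, tzinfo=timezone.utc)
tomorrow = today + timedelta(days=1)

key_today = counter_key("deployment-1", "rpm", today)
key_tomorrow = counter_key("deployment-1", "rpm", tomorrow)

# A plain dict stands in for a Redis instance where keys have no TTL.
store = {}
store[key_today] = 100  # counter value written when today's quota ran out

# 24 hours later the same key name is looked up and the old count is still there.
stale_count = store.get(key_tomorrow)
```

Because the entry never expires, the lookup one day later sees yesterday's count and the deployment looks exhausted before it has served a single request in that minute.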

May 11 '24 16:05 sumanth13131

@sumanth13131 so if i understand the problem

keys can have the same name +24 hours later. This means incorrect tpm/rpm values are used?

Can we fix this by having a more precise key name? including the current date as part of it

we do this for lowest latency routing -

https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L325
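
A rough sketch of the "more precise key name" idea, assuming the date is simply added to the time component of the key. The `dated_counter_key` helper and the key layout are hypothetical and are not taken from the linked file.

```python
from datetime import datetime, timedelta, timezone


def dated_counter_key(deployment_id: str, metric: str, now: datetime) -> str:
    # Hypothetical key layout: date plus hour and minute, so the name
    # does not repeat 24 hours later.
    return f"{deployment_id}:{metric}:{now.strftime('%Y-%m-%d-%H-%M')}"


first_day = datetime(2024, 5, 11, 16, 5, tzinfo=timezone.utc)
second_day = first_day + timedelta(days=1)

first_key = dated_counter_key("deployment-1", "tpm", first_day)
second_key = dated_counter_key("deployment-1", "tpm", second_day)
```

This avoids reading stale counts, but old keys would still accumulate in the cache unless they also expire.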

May 11 '24 17:05 krrishdholakia

Setting a TTL is fine, right? Please check this code change: https://github.com/BerriAI/litellm/pull/3412/files

May 11 '24 17:05 sumanth13131

You're right. But instead of hardcoding it, can we make it a controllable param?

Like in lowest latency routing - https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L31

Then in the test, let's have it set to a very low amount (5s?) -> and check if it's set + evicted within the expected timeframe
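
One way such a test could look, as a sketch only. `UsageRoutingArgs`, `InMemoryTTLCache`, and `FakeClock` are hypothetical stand-ins written for this example (not litellm classes), and a fake clock replaces real sleeping so the check runs instantly.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class UsageRoutingArgs:
    # Hypothetical config object: TTL in seconds for counter keys.
    ttl: int = 60


class InMemoryTTLCache:
    """Tiny stand-in for a cache that supports per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: int) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: evict on read and report a miss.
            del self._data[key]
            return None
        return value


class FakeClock:
    """Manually advanced clock so the test does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def counter_is_set_then_evicted() -> bool:
    clock = FakeClock()
    args = UsageRoutingArgs(ttl=5)  # very low TTL, as suggested for the test
    cache = InMemoryTTLCache(clock=clock)
    key = "deployment-1:rpm:16-05"

    cache.set(key, 1, ttl=args.ttl)
    present = cache.get(key) == 1      # set within the TTL window

    clock.advance(args.ttl)
    evicted = cache.get(key) is None   # gone once the TTL has elapsed
    return present and evicted
```

Against a real Redis-backed cache the same check would wait out the TTL with an actual sleep rather than a fake clock.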

May 11 '24 17:05 krrishdholakia

Sure

May 11 '24 17:05 sumanth13131

Hi @krrishdholakia, I've updated the PR as we talked about.

May 14 '24 04:05 sumanth13131
