usage-based-routing-ttl-on-cache

Open sumanth13131 opened this issue 9 months ago • 9 comments

The RPM/TPM counter cache uses a 1-minute window policy, so it would be better to set a TTL on those counter keys.

sumanth13131 avatar May 03 '24 05:05 sumanth13131

the ttl for usage is 1min because the TPM/RPM limits are across 1 minute @sumanth13131

can you share a repro of the problem you're facing?

krrishdholakia avatar May 10 '24 17:05 krrishdholakia

bump on this? @sumanth13131

krrishdholakia avatar May 11 '24 16:05 krrishdholakia

Hi @krrishdholakia, I've attached a screenshot below. Currently, all keys set by the usage-based strategy get the default TTL (-1), which means they never expire and stay in Redis until LRU eviction removes them.

PS: The same key name can recur the next day. If the quota (RPM/TPM) for that time frame was already exhausted, the stale counter results in no deployment being available.

Screenshot 2024-05-11 at 10 23 43 PM
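
To make the collision concrete, here is a minimal sketch. It assumes the counter key is built from a deployment id plus the current hour and minute only; the `minute_key` helper and the `{id}:tpm:{HH-MM}` format are illustrative assumptions, not necessarily the exact format the router uses. A plain dict stands in for a cache with no TTL.

```python
from datetime import datetime, timedelta


def minute_key(deployment_id: str, now: datetime) -> str:
    # Hypothetical key format: hour and minute only, no date component.
    return f"{deployment_id}:tpm:{now.strftime('%H-%M')}"


# Stand-in for a cache where keys never expire (TTL of -1).
counters = {}

day1 = datetime(2024, 5, 11, 22, 23)
counters[minute_key("deployment-1", day1)] = 1000  # quota exhausted in this minute

# Exactly 24 hours later the same key name is produced again.
day2 = day1 + timedelta(days=1)
stale_usage = counters.get(minute_key("deployment-1", day2), 0)
```

Here `stale_usage` still holds the previous day's count, so a router comparing it against the TPM limit would treat the deployment as already exhausted.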

sumanth13131 avatar May 11 '24 16:05 sumanth13131

@sumanth13131 so if i understand the problem

  • keys can have the same name +24 hours later. This means incorrect tpm/rpm values are used?

Can we fix this by having a more precise key name, e.g. including the current date as part of it?

we do this for lowest latency routing -

https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L325
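
A minimal sketch of the date-in-key idea, assuming a hypothetical `precise_minute_key` helper. The name and the key format are illustrative and are not copied from the linked file.

```python
from datetime import datetime, timedelta


def precise_minute_key(deployment_id: str, now: datetime) -> str:
    # Including the date means the same clock minute on another day
    # maps to a different key.
    return f"{deployment_id}:tpm:{now.strftime('%Y-%m-%d-%H-%M')}"


day1 = datetime(2024, 5, 11, 22, 23)
day2 = day1 + timedelta(days=1)

key_day1 = precise_minute_key("deployment-1", day1)
key_day2 = precise_minute_key("deployment-1", day2)
```

With this format the two days no longer share a key, so stale counts are not read back. Old keys would still sit in the cache until evicted unless a TTL is also set.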

krrishdholakia avatar May 11 '24 17:05 krrishdholakia

Setting a TTL is fine, right? Please check this code change: https://github.com/BerriAI/litellm/pull/3412/files

sumanth13131 avatar May 11 '24 17:05 sumanth13131

You're right. But instead of hardcoding it, can we have it be a controllable param?

Like in lowest latency routing - https://github.com/BerriAI/litellm/blob/d33e49411d6503cb634f9652873160cd534dec96/litellm/router_strategy/lowest_latency.py#L31

Then in the test, let's have it set to a very low amount (5s?) -> and check if it's set + evicted within the expected timeframe
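
A rough sketch of what such a test could look like, using a small in-memory stand-in rather than Redis or the actual router classes. `TTLCounterCache`, `default_ttl`, and `FakeClock` are hypothetical names introduced for illustration; the fake clock lets the test check eviction without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCounterCache:
    """Tiny in-memory stand-in for a cache whose counters expire."""

    def __init__(self, default_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.default_ttl = default_ttl  # hypothetical controllable param
        self._clock = clock
        self._store: Dict[str, Tuple[int, float]] = {}  # key -> (value, expires_at)

    def increment(self, key: str, amount: int = 1) -> int:
        now = self._clock()
        value, expires_at = self._store.get(key, (0, 0.0))
        if expires_at <= now:
            # Missing or expired: start a fresh window with a new expiry.
            value, expires_at = 0, now + self.default_ttl
        value += amount
        self._store[key] = (value, expires_at)
        return value

    def get(self, key: str) -> Optional[int]:
        entry = self._store.get(key)
        if entry is None or entry[1] <= self._clock():
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]


class FakeClock:
    """Manually advanced clock so the test does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
cache = TTLCounterCache(default_ttl=5.0, clock=clock)  # very low TTL for the test

cache.increment("deployment-1:tpm:22-23", 100)
before = cache.get("deployment-1:tpm:22-23")  # still within the TTL

clock.advance(6.0)                            # move past the 5s TTL
after = cache.get("deployment-1:tpm:22-23")   # key should be gone
```

The expiry is set only when a key is first written, so repeated increments within the window do not extend its lifetime.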

krrishdholakia avatar May 11 '24 17:05 krrishdholakia

Sure

sumanth13131 avatar May 11 '24 17:05 sumanth13131

Hi @krrishdholakia, I've updated the PR as we talked about.

sumanth13131 avatar May 14 '24 04:05 sumanth13131