
[Feature]: New Model - Azure PTUs

Open ishaan-jaff opened this issue 1 year ago • 7 comments

The Feature

https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput

  • support basic completion/embedding
  • cost tracking for Azure PTU completion/embedding

ishaan-jaff avatar Feb 17 '24 01:02 ishaan-jaff

@ishaan-jaff @krrishdholakia I have a mix of PTU and PAYG deployments for the exact same model variant. It'd be great if I could set a priority in the litellm_params so that the router picks the PTU instances before the others, or just a flag in litellm_params indicating that a model is a PTU, so the router prioritizes it. The tpm/rpm fields in litellm_params won't work here, since PTUs don't have set tpm/rpm limits. I think we need an updated simple-shuffle routing strategy that first picks randomly from all PTUs, and only falls back to picking randomly from the non-PTUs once all PTUs have failed.
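
Something like this, as a rough sketch (is_ptu is a hypothetical litellm_params flag I'm proposing here, not an existing one):

import random

def pick_deployment(healthy_deployments: list) -> dict:
    # Prefer PTU deployments; fall back to PAYG only if no PTU is healthy.
    # "is_ptu" is a hypothetical flag, not an existing litellm_params field.
    ptus = [
        d for d in healthy_deployments
        if d.get("litellm_params", {}).get("is_ptu")
    ]
    pool = ptus or healthy_deployments
    return random.choice(pool)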

taralika avatar Apr 29 '24 20:04 taralika

@taralika why not give the PTUs an arbitrarily higher rpm/tpm so they're picked more often?

from litellm import Router
import asyncio
import os

model_list = [{ # list of model deployments
    "model_name": "gpt-3.5-turbo", # model alias
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/PTU_MODEL", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
        "rpm": 9000,         # requests per minute for this deployment
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/regular",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
        "rpm": 10,
    }
}]

# init router
router = Router(model_list=model_list, routing_strategy="simple-shuffle")
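
A minimal usage sketch against this router (the "gpt-3.5-turbo" alias resolves to one of the two deployments above):

async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",  # model alias; the router picks the deployment
        messages=[{"role": "user", "content": "hey"}],
    )
    print(response)

asyncio.run(main())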

ishaan-jaff avatar Apr 29 '24 20:04 ishaan-jaff

in simple-shuffle we only use the rpm/tpm to define how often each deployment should be picked

I'm open to adding a PTU flag but would love to understand why it should exist

ishaan-jaff avatar Apr 29 '24 20:04 ishaan-jaff

in simple-shuffle we only use the rpm/tpm to define how often each deployment should be picked

can you share a bit more about this logic? I'm open to using rpm and giving PTU models an arbitrarily high number (like 9000 in your example); however, it'd be great if I didn't have to set a "low" rpm (like 10 in your example) on every single non-PTU model.

taralika avatar Apr 29 '24 21:04 taralika

@taralika we perform a weighted pick: https://github.com/BerriAI/litellm/blob/de3e642999b13ac855e4ce5d77a2af45bd9a5d39/litellm/router.py#L2958

So you would not need to set a "low" rpm on every non-PTU deployment.
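
Roughly, the weighted pick behaves like this sketch (a simplification for illustration, not the exact code at the link above):

import random

def weighted_pick(healthy_deployments: list) -> dict:
    # Weight each deployment by its rpm; deployments without an rpm get a
    # small default weight, so a high-rpm PTU wins almost every draw.
    weights = [
        d.get("litellm_params", {}).get("rpm") or 1
        for d in healthy_deployments
    ]
    return random.choices(healthy_deployments, weights=weights, k=1)[0]

Under this sketch, a PTU with rpm 9000 next to a deployment with no rpm set wins roughly 9000 of every 9001 picks.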

I'd love to set up a support channel and learn how we can improve litellm for you. Would you be free for a call sometime this week? What's the best email to set up a call?

if it's easier, here's a link to my cal: https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version?month=2024-04

ishaan-jaff avatar Apr 29 '24 21:04 ishaan-jaff

this makes sense, thank you so much for the prompt response! I'll schedule something to connect further.

taralika avatar Apr 29 '24 21:04 taralika

@ishaan-jaff

https://github.com/BerriAI/litellm/blob/de3e642999b13ac855e4ce5d77a2af45bd9a5d39/litellm/router.py#L2958

Am I reading this code correctly that any model with rpm set needs to be at the beginning of the list in proxy-config.yaml? Otherwise the rpm weighting isn't used, since only healthy_deployments[0] is checked for an rpm value?

So instead of checking only the first deployment, as the code does today, might it not be better to check all deployments, so that the model order doesn't matter? Something like the sketch below:
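
The current check vs. the suggested check (names as in the linked router code; the any() version is just a sketch of the idea):

# today: only the first healthy deployment is inspected for an rpm value
rpm = healthy_deployments[0].get("litellm_params").get("rpm", None)

# suggested: rpm-based weighting kicks in if any healthy deployment sets rpm,
# regardless of its position in proxy-config.yaml
rpm = any(
    deployment.get("litellm_params", {}).get("rpm") is not None
    for deployment in healthy_deployments
)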

taralika avatar Apr 29 '24 22:04 taralika

closing since we support this

ishaan-jaff avatar May 24 '24 01:05 ishaan-jaff