
[Bug]: Azure "api_version" not respected when provided from client side

Open gagarinfan opened this issue 11 months ago • 4 comments

What happened?

Hi! First of all: I really appreciate your work. The OpenAI Proxy Server / LiteLLM seems to be a great solution!

We are using the OpenAI Proxy Server to route requests to different Azure OpenAI deployments. We have discovered that an api_version specified on the client side is not respected during the call.

OpenAI python package version: 1.12.0
OpenAI Proxy Server version: v1.30.0

Example code:

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="dummy",
    # I want to use a specific api_version, other than default 2023-07-01-preview
    api_version="2023-05-15",
    # OpenAI Proxy Endpoint
    azure_endpoint="https://openai-proxy.domain.com"
)

response = client.chat.completions.create(
    model="gpt-35-turbo-16k-qt",
    messages=[
        {"role": "user", "content": "Some content"}
    ],
)
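For reference, a minimal sketch (based on the Azure OpenAI SDK's documented behavior, not LiteLLM internals) of how the client's api_version ends up as an api-version query parameter on each request, matching the paths visible in the proxy logs:

```python
# Sketch: the AzureOpenAI client appends api_version as an "api-version"
# query parameter to every deployment request. The function below just
# reconstructs that path for illustration; it is not part of any SDK.
from urllib.parse import urlencode

def chat_completions_path(deployment: str, api_version: str) -> str:
    """Build the request path the Azure OpenAI SDK issues for chat completions."""
    query = urlencode({"api-version": api_version})
    return f"/openai/deployments/{deployment}/chat/completions?{query}"

print(chat_completions_path("gpt-35-turbo-16k-qt", "2023-05-15"))
# → /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15
```

This is exactly the path the proxy logs show, so the client is sending the version; the proxy just doesn't forward it upstream.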

part of OpenAI Proxy Server config:

model_list:
- litellm_params:
    api_base: https://endpoint-name.openai.azure.com
    api_key: os.environ/API_KEY_OPENAI
    model: azure/gpt-35-turbo-16k-qt
  model_name: gpt-35-turbo-16k-qt

During the load tests, when a rate limit is crossed, the error message shows that API version 2023-07-01-preview was called:

litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO:     192.168.8.15:49752 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests

Interestingly, one can still see completions?api-version=2023-05-15 in the request's query parameters.

However, when api_version is specified in the proxy server config, it works:

model_list:
- litellm_params:
    api_base: https://endpoint-name.openai.azure.com
    api_key: os.environ/API_KEY_OPENAI
    model: azure/gpt-35-turbo-16k-qt
    api_version: "2023-05-15" # ---> it is respected when setting api_version here
  model_name: gpt-35-turbo-16k-qt

Then the rate limit error contains the proper api_version:

litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO:     192.168.14.51:37850 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests

It looks like the client-supplied api_version is not respected by the proxy server. When the client sets api_version="2023-07-01-preview" but the proxy config sets api_version: "2023-05-15", the error reflects the version set in the proxy config:

litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO:     192.168.14.51:53812 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-07-01-preview HTTP/1.1" 429 Too Many Requests
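To make the expected behavior concrete, here is a hypothetical sketch (not LiteLLM's actual code) of the precedence this report implies the proxy should apply when resolving which api_version to send upstream:

```python
# Hypothetical resolution logic for illustration only: the client's
# api-version query parameter should win over the proxy's configured
# value, which in turn should win over the library default.
from typing import Optional
from urllib.parse import parse_qs, urlparse

# LiteLLM's default at the time of this report, per the error logs above
DEFAULT_API_VERSION = "2023-07-01-preview"

def resolve_api_version(request_url: str, config_api_version: Optional[str] = None) -> str:
    """Expected precedence: client request > proxy config > library default."""
    query = parse_qs(urlparse(request_url).query)
    client_version = query.get("api-version", [None])[0]
    return client_version or config_api_version or DEFAULT_API_VERSION
```

With this precedence, the first log above would have hit 2023-05-15 upstream, since that is what the client put in the query string.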

Thanks in advance!

Relevant log output

litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO:     192.168.8.15:49752 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests

gagarinfan · Mar 07 '24 11:03