litellm
[Bug]: Azure "api_version" not respected when provided from client side
What happened?
Hi! First of all, I really appreciate your work. The OpenAI Proxy Server / LiteLLM seems to be a great solution!
We are using the OpenAI Proxy Server to route requests to different Azure OpenAI deployments. We have discovered that when `api_version` is specified on the client side, it is not respected during the call.
OpenAI Python package version: 1.12.0
OpenAI Proxy Server version: v1.30.0
Example code:
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="dummy",
    # I want to use a specific api_version, other than the default 2023-07-01-preview
    api_version="2023-05-15",
    # OpenAI Proxy endpoint
    azure_endpoint="https://openai-proxy.domain.com",
)

response = client.chat.completions.create(
    model="gpt-35-turbo-16k-qt",
    messages=[
        {"role": "user", "content": "Some content"}
    ],
)
```
Part of the OpenAI Proxy Server config:
```yaml
model_list:
  - litellm_params:
      api_base: https://endpoint-name.openai.azure.com
      api_key: os.environ/API_KEY_OPENAI
      model: azure/gpt-35-turbo-16k-qt
    model_name: gpt-35-turbo-16k-qt
```
During load tests, when the rate limit is exceeded, the error message states that API version 2023-07-01-preview was called:
```
litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO: 192.168.8.15:49752 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests
```
Interestingly, the request log shows `completions?api-version=2023-05-15` in the query parameters, so the proxy did receive the version requested by the client.
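To compare the two versions quickly during load tests, one can extract the version from the proxy's request line and from Azure's error text. A small standard-library sketch; `requested_version` and `upstream_version` are hypothetical helpers, and the regex assumes the error wording shown in this report:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Values copied from the log output above.
request_line = (
    "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions"
    "?api-version=2023-05-15 HTTP/1.1"
)
error_message = (
    "Requests to the ChatCompletions_Create Operation under Azure OpenAI "
    "API version 2023-07-01-preview have exceeded call rate limit"
)


def requested_version(line: str) -> Optional[str]:
    """Return the api-version query parameter from an HTTP request line."""
    target = line.split()[1]  # "POST <target> HTTP/1.1" -> "<target>"
    values = parse_qs(urlsplit(target).query).get("api-version")
    return values[0] if values else None


def upstream_version(message: str) -> Optional[str]:
    """Return the API version that Azure reports in its 429 error text."""
    match = re.search(r"API version (\S+)", message)
    return match.group(1) if match else None


print(requested_version(request_line), upstream_version(error_message))
```

With the log above, the two values differ (2023-05-15 requested, 2023-07-01-preview used upstream), which is the mismatch this issue describes.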
However, when `api_version` is specified in the proxy server config, it works:
```yaml
model_list:
  - litellm_params:
      api_base: https://endpoint-name.openai.azure.com
      api_key: os.environ/API_KEY_OPENAI
      model: azure/gpt-35-turbo-16k-qt
      api_version: "2023-05-15" # ---> it is respected when setting api_version here
    model_name: gpt-35-turbo-16k-qt
```
Then the rate limit error contains the correct `api_version`:
```
litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO: 192.168.14.51:37850 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests
```
It looks like the client-side `api_version` is not respected by the proxy server. When setting `api_version="2023-07-01-preview"` on the client but `api_version: "2023-05-15"` in the proxy config, the error refers to the version set in the proxy config:
```
litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO: 192.168.14.51:53812 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-07-01-preview HTTP/1.1" 429 Too Many Requests
```
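One way to state the behavior expected here is a precedence rule: the client's `api-version` first, then `api_version` from the deployment's `litellm_params`, then the proxy default (2023-07-01-preview in this setup). The sketch below is hypothetical (`resolve_api_version` is not LiteLLM's code) and assumes the client value should win even over an explicit config value, as the last experiment implies:

```python
from typing import Mapping, Optional

# Assumed default: the version the proxy fell back to in this report.
DEFAULT_AZURE_API_VERSION = "2023-07-01-preview"


def resolve_api_version(
    client_api_version: Optional[str],
    litellm_params: Mapping[str, str],
    default: str = DEFAULT_AZURE_API_VERSION,
) -> str:
    """Hypothetical precedence rule, not LiteLLM's actual implementation.

    1. api-version sent by the client in the request query string
    2. api_version from the deployment's litellm_params in the proxy config
    3. the proxy default
    """
    if client_api_version:
        return client_api_version
    return litellm_params.get("api_version", default)


config_without_version = {"model": "azure/gpt-35-turbo-16k-qt"}
config_with_version = {"model": "azure/gpt-35-turbo-16k-qt", "api_version": "2023-05-15"}

# The three scenarios from this report, as the reporter expects them to resolve.
print(resolve_api_version("2023-05-15", config_without_version))
print(resolve_api_version("2023-05-15", config_with_version))
print(resolve_api_version("2023-07-01-preview", config_with_version))
```

Under this rule the first and third scenarios would forward the client's version, whereas the logs above show the proxy using the default and the config value respectively.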
Thanks in advance!
Relevant log output
```
litellm.exceptions.RateLimitError: AzureException - Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 10 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
INFO: 192.168.8.15:49752 - "POST /openai/deployments/gpt-35-turbo-16k-qt/chat/completions?api-version=2023-05-15 HTTP/1.1" 429 Too Many Requests
```