[Bug]: Invalid model name error when calling openai/deployments/<model_name>/chat/completions
What happened?
Hi!
We've noticed that since version v1.34.16, calling openai/deployments/<model_name>/chat/completions started failing with:
{"error":{"message":"400: {'error': 'Invalid model name passed in'}","type":"None","param":"None","code":400}}
The latest version that does not return this error is v1.34.14.
Example calls:
- working, v1.34.14
curl -X 'POST' \
'https://<endpoint>/openai/deployments/gpt-35-turbo/chat/completions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{"messages": [{"role": "user", "content": "\"repeat laa 3 times as json\""}], "max_tokens": 4000, "temperature": 0, "seed": 0}'
response:
{"id":"chatcmpl-9CR0nRDFLNpYISVTEXxXjMdcQwMpr","choices":[{"finish_reason":"stop","index":0,"message":{"content":"{\n \"repeated_word\": \"laa\",\n \"repetitions\": 3\n}","role":"assistant"}}],"created":1712750797,"model":"gpt-35-turbo","object":"chat.completion","system_fingerprint":"fp_2f57f81c11","usage":{"completion_tokens":21,"prompt_tokens":17,"total_tokens":38}}
- not working, v1.34.39 (currently latest)
curl -X 'POST' \
'https://<endpoint>/openai/deployments/gpt-35-turbo/chat/completions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{"messages": [{"role": "user", "content": "\"repeat laa 3 times as json\""}], "max_tokens": 4000, "temperature": 0, "seed": 0}'
response:
{"error":{"message":"400: {'error': 'Invalid model name passed in'}","type":"None","param":"None","code":400}}
Relevant log output
2024-04-10T13:46:05+02:00 Traceback (most recent call last):
2024-04-10T13:46:05+02:00 File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3285, in completion
2024-04-10T13:46:05+02:00 raise HTTPException(
2024-04-10T13:46:05+02:00 fastapi.exceptions.HTTPException: 400: {'error': 'Invalid model name passed in'}
2024-04-10T13:46:05+02:00 INFO: 192.168.15.71:49914 - "POST /openai/deployments/gpt-35-turbo/chat/completions HTTP/1.1" 400 Bad Request
@gagarinfan this looks like your proxy did not start with the config.yaml:
- do you use a run CMD to start the proxy?
- do you pass `--config` to it?
Bump @gagarinfan ?
Hey, the router starts properly with the config. I see the listed models in the logs. The only thing I've changed is the Docker image.
Hmm, I ran some tests using the AzureOpenAI client, and it looks like there is also a drift between LiteLLM versions for it.
Code (from https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python-new#work-with-the-gpt-35-turbo-and-gpt-4-models):
import openai

client = openai.AzureOpenAI(
    api_key="",
    azure_endpoint="",
    api_version="2024-02-01"
)

# Send a completion call to generate an answer
response = client.chat.completions.create(
    model="gpt-35-turbo",  # model = "deployment_name"
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)
print(response)
response in v1.34.39:
openai.BadRequestError: Error code: 400 - {'error': {'message': "400: {'error': 'Invalid model name passed in'}", 'type': 'None', 'param': 'None', 'code': 400}}
response in v1.34.12 (currently deployed on our premises):
ChatCompletion(id='chatcmpl-9D5x8GgAZCseFDTZSOUj9kB8jSLfr', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Microsoft was co-founded by Bill Gates and Paul Allen. They founded the company in 1975.', role='assistant', function_call=None, tool_calls=None))], created=1712908174, model='gpt-35-turbo', object='chat.completion', system_fingerprint='fp_2f57f81c11', usage=CompletionUsage(completion_tokens=20, prompt_tokens=29, total_tokens=49))
Please note that when using the OpenAI client library it works fine with both versions:
import openai

client = openai.OpenAI(
    api_key="",
    base_url=""
)

response = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)
print(response)
and response:
ChatCompletion(id='chatcmpl-9D5zkJzMSxjyPSHuT3dfEKlPvTQdC', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Microsoft was founded by Bill Gates and Paul Allen in 1975. They were both students at Lakeside School in Seattle, Washington when they first collaborated to write software for the Altair 8800 microcomputer. This collaboration eventually led to the founding of Microsoft.', role='assistant', function_call=None, tool_calls=None))], created=1712908336, model='gpt-35-turbo', object='chat.completion', system_fingerprint='fp_2f57f81c11', usage=CompletionUsage(completion_tokens=53, prompt_tokens=29, total_tokens=82))
I'm not sure if this will solve your specific issue, but for anyone else getting this error after recently updating the Docker image version, make sure to add a `command` field to docker-compose.yml, specifically with the `--config` argument inside that field. Here's an example: https://github.com/BerriAI/litellm/blob/main/docker-compose.yml#L14.
It seems the latest Docker image doesn't load the LiteLLM config file by default, or maybe it does but from a new default path.
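For reference, a minimal sketch of what that looks like (the image tag, mount path, and port below are assumptions taken from the linked example, not from your setup):

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    volumes:
      - ./litellm-config.yaml:/app/config.yaml  # mount the local config into the container
    command: [ "--config", "/app/config.yaml", "--port", "4000" ]  # pass --config explicitly
    ports:
      - "4000:4000"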
Thanks @hi019, but in my case the config is being loaded. I see the listed models in the log when the app starts. Plus, as I wrote in my previous message, I was able to call the model using the OpenAI client.
I am also facing this issue and am not sure how to solve it. Looking for help.
Here is my config file for two LLMs:
model_list:
  # Azure OpenAI Models
  - model_name: gpt-4
    litellm_params:
      model: azure/my-gpt4-deployment
      api_base: https://my-openai.openai.azure.com/
      api_key: my-key
      api_version: "2023-05-15"
      timeout: 60 # timeout in (seconds)
      stream_timeout: 0.01 # timeout for stream requests (seconds)
      max_retries: 1
    model_info:
      base_model: azure/gpt-4
  # OpenAI Models
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: my-openai-key
As you can see, I am using the gpt-4 model name for both. I don't want to enable routing for now, so I need to specify which provider the completion call should go to.
import litellm

response = litellm.completion(
    # model="openai/gpt-4",  # This only routes to OpenAI and it works
    # model="gpt-4",  # It works, but it performs load balancing between both LLMs. I need to disable load balancing/routing
    model="azure/my-gpt4-deployment",  # This doesn't work and throws an invalid model error
    api_base="http://localhost:4000",
    api_version="2023-05-15",
    api_key="sk-kVCC19uFbYVpvRS857WaNQ",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
print(response)
It seems that using `azure/deployment_name` doesn't work, and I am getting the error below:
litellm.exceptions.APIError: AzureException - Error code: 400 - {'error': {'message': "400: {'error': 'Invalid model name passed in'}", 'type': 'None', 'param': 'None', 'code': 400}}
@hoang-innomize FYI I think you leaked your key
@hoang-innomize how do you start the proxy? Can I see the RUN CMD you're using?
@ishaan-jaff here is my docker-compose file:
version: '3.9'
services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    image: ghcr.io/berriai/litellm:main-latest
    # depends_on:
    #   - dbpostgresql
    environment:
      - UI_USERNAME=admin
      - UI_PASSWORD=admin
    ports:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any other supported CLI argument. Make sure the port passed here matches the container port defined above under `ports`
    command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "2" ]
I have two models on my list
Another issue I also faced is with the models shown when creating users: it seems we get the public model names instead of the litellm model names. That is why I can only see one model.
@ishaan-jaff after running some more tests, I have noticed that even if I use `openai/gpt-4` it still performs load balancing. So how can we explicitly specify the LLM provider in this case? I want the selection to be static in code, not handled by load balancing. In other words, based on the config file, which params do we need to use to:
- only call OpenAI
- only call Azure OpenAI
to only call a specific model in the list - just specify its litellm model name
you can see our logic for this here - https://github.com/BerriAI/litellm/blob/180718c33f5e688b24098155b92149e862e9935a/litellm/proxy/proxy_server.py#L3715
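For anyone else wanting to pin the provider, here is a rough config sketch: as far as I understand, the router only load-balances across entries that share the same model_name, so giving each deployment its own public name means a request for that name hits exactly one deployment. The names azure-gpt-4 and openai-gpt-4 below are made-up labels for illustration, not anything from this thread.

model_list:
  # hypothetical unique names so each request targets exactly one deployment
  - model_name: azure-gpt-4   # call the proxy with model="azure-gpt-4" to only hit Azure
    litellm_params:
      model: azure/my-gpt4-deployment
      api_base: https://my-openai.openai.azure.com/
      api_key: my-key
      api_version: "2023-05-15"
  - model_name: openai-gpt-4  # call the proxy with model="openai-gpt-4" to only hit OpenAI
    litellm_params:
      model: openai/gpt-4
      api_key: my-openai-key

With that config, passing model="azure-gpt-4" or model="openai-gpt-4" in the request should select the corresponding entry without any load balancing between the two.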