
[Bug]: Invalid model name error when calling openai/deployments/<model_name>/chat/completions

Open: gagarinfan opened this issue on Apr 10, 2024

What happened?

Hi!

We've noticed that since version v1.34.16, calling openai/deployments/<model_name>/chat/completions has started failing with:

{"error":{"message":"400: {'error': 'Invalid model name passed in'}","type":"None","param":"None","code":400}}

The latest version that does not return this error is v1.34.14.

Example calls:

  • working (v1.34.14):
curl -X 'POST' \
  'https://<endpoint>/openai/deployments/gpt-35-turbo/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"messages": [{"role": "user", "content": "\"repeat laa 3 times as json\""}], "max_tokens": 4000, "temperature": 0, "seed": 0}'

response:

{"id":"chatcmpl-9CR0nRDFLNpYISVTEXxXjMdcQwMpr","choices":[{"finish_reason":"stop","index":0,"message":{"content":"{\n  \"repeated_word\": \"laa\",\n  \"repetitions\": 3\n}","role":"assistant"}}],"created":1712750797,"model":"gpt-35-turbo","object":"chat.completion","system_fingerprint":"fp_2f57f81c11","usage":{"completion_tokens":21,"prompt_tokens":17,"total_tokens":38}}
  • not working (v1.34.39, currently the latest):
curl -X 'POST' \
  'https://<endpoint>/openai/deployments/gpt-35-turbo/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"messages": [{"role": "user", "content": "\"repeat laa 3 times as json\""}], "max_tokens": 4000, "temperature": 0, "seed": 0}'

response:

{"error":{"message":"400: {'error': 'Invalid model name passed in'}","type":"None","param":"None","code":400}}

API docs: (screenshot attached)

Relevant log output

2024-04-10T13:46:05+02:00 Traceback (most recent call last):
2024-04-10T13:46:05+02:00   File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3285, in completion
2024-04-10T13:46:05+02:00     raise HTTPException(
2024-04-10T13:46:05+02:00 fastapi.exceptions.HTTPException: 400: {'error': 'Invalid model name passed in'}
2024-04-10T13:46:05+02:00 INFO:     192.168.15.71:49914 - "POST /openai/deployments/gpt-35-turbo/chat/completions HTTP/1.1" 400 Bad Request

Twitter / LinkedIn details

No response

gagarinfan commented Apr 10 '24

@gagarinfan this looks like your proxy did not start with the config.yaml.

  • Do you use a run CMD to start the proxy?
  • Do you pass --config to it?

ishaan-jaff commented Apr 10 '24

Bump @gagarinfan?

ishaan-jaff commented Apr 11 '24

Hey, the router starts properly with the config; I can see the models listed in the logs. The only thing I've changed is the Docker image.

gagarinfan commented Apr 12 '24

Hmm, I ran some tests using AzureOpenAI, and it looks like there is also a drift in behavior between LiteLLM versions there.

Code (from https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python-new#work-with-the-gpt-35-turbo-and-gpt-4-models):

import openai

client = openai.AzureOpenAI(
    api_key="",
    azure_endpoint="",
    api_version="2024-02-01"
)

# Send a completion call to generate an answer
response = client.chat.completions.create(
    model="gpt-35-turbo", # model = "deployment_name".
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)
print(response)

response in v1.34.39:

openai.BadRequestError: Error code: 400 - {'error': {'message': "400: {'error': 'Invalid model name passed in'}", 'type': 'None', 'param': 'None', 'code': 400}}

response in v1.34.12 (currently deployed on our premises):

ChatCompletion(id='chatcmpl-9D5x8GgAZCseFDTZSOUj9kB8jSLfr', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Microsoft was co-founded by Bill Gates and Paul Allen. They founded the company in 1975.', role='assistant', function_call=None, tool_calls=None))], created=1712908174, model='gpt-35-turbo', object='chat.completion', system_fingerprint='fp_2f57f81c11', usage=CompletionUsage(completion_tokens=20, prompt_tokens=29, total_tokens=49))

Please note that when using the plain OpenAI client it works fine with both versions:

client = openai.OpenAI(
    api_key="",
    base_url=""
)

response = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
    ]
)

print(response)

and the response:

ChatCompletion(id='chatcmpl-9D5zkJzMSxjyPSHuT3dfEKlPvTQdC', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Microsoft was founded by Bill Gates and Paul Allen in 1975. They were both students at Lakeside School in Seattle, Washington when they first collaborated to write software for the Altair 8800 microcomputer. This collaboration eventually led to the founding of Microsoft.', role='assistant', function_call=None, tool_calls=None))], created=1712908336, model='gpt-35-turbo', object='chat.completion', system_fingerprint='fp_2f57f81c11', usage=CompletionUsage(completion_tokens=53, prompt_tokens=29, total_tokens=82))

gagarinfan commented Apr 12 '24

I'm not sure if this will solve your specific issue, but for anyone else getting this error after recently updating the Docker image version, make sure to add a command field to docker-compose.yml, and specifically include the --config argument in that field. Here's an example: https://github.com/BerriAI/litellm/blob/main/docker-compose.yml#L14.

It seems the latest Docker image doesn't load the LiteLLM config file by default, or maybe it does but expects it at a new default path.
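
For illustration, a minimal sketch of such a service definition, modeled on the linked example (image tag, host port, and config path are placeholders you'd adjust):

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm-config.yaml:/app/config.yaml   # mount your LiteLLM config into the container
    # the important part: explicitly tell the proxy which config file to load
    command: [ "--config", "/app/config.yaml", "--port", "4000" ]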

hi019 commented Apr 13 '24

Thanks @hi019, but in my case the config is being loaded. I can see the models listed in the log when the app starts, and, as I wrote in my previous message, I was able to call the model using the OpenAI client.

gagarinfan commented Apr 15 '24

I am also facing this issue and I am not sure how to solve it. Looking for help.

Here is my config file with two LLMs:

model_list:
  # Azure OpenAI Models
  - model_name: gpt-4
    litellm_params:
      model: azure/my-gpt4-deployment
      api_base: https://my-openai.openai.azure.com/
      api_key: my-key
      api_version: "2023-05-15"
      timeout: 60                      # request timeout (seconds)
      stream_timeout: 0.01              # timeout for stream requests (seconds)
      max_retries: 1
    model_info:
      base_model: azure/gpt-4
  # OpenAI Models
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: my-openai-key

As you can see, I am using the gpt-4 model name for both. I don't want to enable routing for now, so I need a way to specify which provider should handle the completion call.

import litellm

response = litellm.completion(
    # model="openai/gpt-4",  # This only routes to OpenAI and it works
    # model="gpt-4",  # This works, but it load-balances between both LLMs; I need to disable load balancing/routing
    model="azure/my-gpt4-deployment",  # This doesn't work and throws an invalid model name error
    api_base="http://localhost:4000",
    api_version="2023-05-15",
    api_key="sk-kVCC19uFbYVpvRS857WaNQ",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

print(response)

It seems that using azure/deployment_name doesn't work, and I am getting the error below:

litellm.exceptions.APIError: AzureException - Error code: 400 - {'error': {'message': "400: {'error': 'Invalid model name passed in'}", 'type': 'None', 'param': 'None', 'code': 400}}

hoang-innomize commented Apr 18 '24

@hoang-innomize FYI I think you leaked your key

@hoang-innomize how do you start the proxy? Can I see the RUN CMD you're using?

ishaan-jaff commented Apr 18 '24

@ishaan-jaff here is my docker-compose file:

version: '3.9'
services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    image: ghcr.io/berriai/litellm:main-latest
    # depends_on:
    #   - dbpostgresql
    environment:
      - UI_USERNAME=admin
      - UI_PASSWORD=admin
    ports:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements, or pass any other supported CLI argument. Make sure the port passed here matches the container port defined above in the `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "2" ]

I have two models in my list (screenshot attached).

Another issue I faced is with the models shown when creating users: it seems we are getting the public model names instead of the LiteLLM model names, which is why I can only see one model (screenshot attached).

hoang-innomize commented Apr 18 '24

@ishaan-jaff after running some more tests, I have noticed that even when I use openai/gpt-4 it still performs load balancing. So how can we explicitly specify the LLM provider in this case? I want to pin the provider statically in code rather than rely on load balancing. In other words, based on the config file above, which params do we need to use to:

  • only call OpenAI
  • only call Azure OpenAI

hoang-innomize commented Apr 18 '24

To only call a specific model in the list, just specify the litellm model name.

You can see our logic for this here: https://github.com/BerriAI/litellm/blob/180718c33f5e688b24098155b92149e862e9935a/litellm/proxy/proxy_server.py#L3715
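
For anyone following along, a minimal sketch of that suggestion against the config above, assuming the proxy matches on litellm_params.model as in the linked code (endpoint and key are placeholders):

import openai

# Point the client at the LiteLLM proxy, not at OpenAI/Azure directly
client = openai.OpenAI(api_key="sk-...", base_url="http://localhost:4000")

response = client.chat.completions.create(
    # "azure/my-gpt4-deployment" pins the request to the Azure entry in the config;
    # "openai/gpt-4" would pin it to the OpenAI entry.
    model="azure/my-gpt4-deployment",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response)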

krrishdholakia commented Apr 26 '24