
[Bug]: 404 when trying to connect to oobabooga/text-generation-webui

Open · jrruethe opened this issue on Jan 11, 2024

What happened?

I'm having difficulty correctly configuring LiteLLM. I am getting the following error:

litellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}

I have oobabooga/text-generation-webui running on 192.168.1.4 with the openai-compatible API on port 5000. This works:

$ curl -s http://192.168.1.4:5000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }' | jq .
{
  "id": "chatcmpl-1704940905826089216",
  "object": "chat.completions",
  "created": 1704940905,
  "model": "turboderp_Mixtral-8x7B-instruct-exl2",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello! It's nice to meet you. Is there something specific you would like to talk about or any questions you have in mind? I'm here to help with any information you might need about artificial intelligence, data science, or related topics."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 52,
    "total_tokens": 62
  }
}

However, this same call to LiteLLM fails. I have it running on a server located at llm.jrruethe.info (this is internal-only):

$ curl -s https://llm.jrruethe.info/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }' | jq .
{
  "detail": "OpenAIException - Error code: 404 - {'detail': 'Not Found'}\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.9/site-packages/litellm/main.py\", line 210, in acompletion\n    response = await init_response\n  File \"/usr/local/lib/python3.9/site-packages/litellm/llms/openai.py\", line 405, in acompletion\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/litellm/llms/openai.py\", line 390, in acompletion\n    response = await openai_aclient.chat.completions.create(\n  File \"/usr/local/lib/python3.9/site-packages/openai/resources/chat/completions.py\", line 1291, in create\n    return await self._post(\n  File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1578, in post\n    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n  File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1339, in request\n    return await self._request(\n  File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1429, in _request\n    raise self._make_status_error_from_response(err.response) from None\nopenai.NotFoundError: Error code: 404 - {'detail': 'Not Found'}\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.9/site-packages/litellm/proxy/proxy_server.py\", line 1452, in chat_completion\n    response = await llm_router.acompletion(**data)\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 320, in acompletion\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 316, in acompletion\n    response = await self.async_function_with_fallbacks(**kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 871, in async_function_with_fallbacks\n    raise original_exception\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 799, in async_function_with_fallbacks\n    response = await self.async_function_with_retries(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 930, in async_function_with_retries\n    raise original_exception\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 888, in async_function_with_retries\n    response = await original_function(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 382, in _acompletion\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 361, in _acompletion\n    response = await litellm.acompletion(\n  File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 2366, in wrapper_async\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 2258, in wrapper_async\n    result = await original_function(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/litellm/main.py\", line 227, in acompletion\n    raise exception_type(\n  File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 6628, in exception_type\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 5586, in exception_type\n    raise NotFoundError(\nlitellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}\n"
}

Below is my LiteLLM config.yaml:

model_list:
- model_name: gpt-3.5-turbo
  litellm_params:
    model: openai/gpt-3.5-turbo
    custom_llm_provider: openai
    api_base: http://192.168.1.4:5000
    api_key: dummy

As far as I am aware, oobabooga/text-generation-webui doesn't actually care about the model name, so I'm just using gpt-3.5-turbo even though I currently have the Mixtral model loaded.

I'm running the docker image litellm/litellm:v1.17.0 with the command:

["litellm", "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "1", "--detailed_debug"]

(I've got my reverse proxy handling https and the port mapping)
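
This is roughly equivalent to the following docker run invocation (a sketch only: image, port, and command are the ones above; the bind-mount source path, and the assumption that this command replaces the image's default command, are illustrative):

# Sketch of an equivalent launch of the proxy container:
docker run -d \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  -p 8000:8000 \
  litellm/litellm:v1.17.0 \
  litellm --config /app/config.yaml --port 8000 --num_workers 1 --detailed_debug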

Here is what it shows on startup:

[2024-01-11 02:03:10 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-01-11 02:03:10 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2024-01-11 02:03:10 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-01-11 02:03:10 +0000] [7] [INFO] Booting worker with pid: 7
[2024-01-11 02:03:10 +0000] [7] [INFO] Started server process [7]
[2024-01-11 02:03:10 +0000] [7] [INFO] Waiting for application startup.
LiteLLM Proxy - DEBUG: Loaded config YAML (api_key and environment_variables are not shown):
{
  "model_list": [
    {
      "model_name": "gpt-3.5-turbo",
      "litellm_params": {
        "model": "openai/gpt-3.5-turbo",
        "custom_llm_provider": "openai",
        "api_base": "http://192.168.1.4:5000",
        "api_key": "dummy"
      }
    }
  ]
}
LiteLLM Router - DEBUG: Initializing OpenAI Client for openai/gpt-3.5-turbo, Api Base:http://192.168.1.4:5000, Api Key:dummy
LiteLLM Router - DEBUG: 
Initialized Model List [{'model_name': 'gpt-3.5-turbo', 'litellm_params': {'model': 'openai/gpt-3.5-turbo', 'custom_llm_provider': 'openai', 'api_base': 'http://192.168.1.4:5000', 'api_key': 'dummy'}, 'model_info': {'id': 'b727b6c3-9418-4614-b05e-2f014c3c9f38'}}]
LiteLLM Router - DEBUG: Intialized router with Routing strategy: simple-shuffle

LiteLLM Proxy - DEBUG: prisma client - None
[2024-01-11 02:03:10 +0000] [7] [INFO] Application startup complete.

And here is the output of a curl call to show that the config is set as expected:

curl -X 'GET' \
  'https://llm.jrruethe.info/v1/model/info' \
  -H 'accept: application/json'

{
  "data": [
    {
      "model_name": "gpt-3.5-turbo",
      "litellm_params": {
        "model": "openai/gpt-3.5-turbo",
        "custom_llm_provider": "openai",
        "api_base": "http://192.168.1.4:5000"
      },
      "model_info": {}
    }
  ]
}
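
A couple of extra checks that might help narrow this down, assuming the standard OpenAI-style /v1/models listing route is available on both sides (I'm not pasting the output here):

# What the backend itself reports:
curl -s http://192.168.1.4:5000/v1/models | jq .

# What the LiteLLM proxy reports:
curl -s https://llm.jrruethe.info/v1/models | jq .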

What am I doing wrong? I've looked through all the documentation I can find, but nothing is standing out.

Relevant log output

File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 316, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 871, in async_function_with_fallbacks
    raise original_exception
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 799, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 930, in async_function_with_retries
    raise original_exception
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 888, in async_function_with_retries
    response = await original_function(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 382, in _acompletion
    raise e
  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 361, in _acompletion
    response = await litellm.acompletion(
  File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2366, in wrapper_async
    raise e
  File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2258, in wrapper_async
    result = await original_function(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/litellm/main.py", line 227, in acompletion
    raise exception_type(
  File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 6628, in exception_type
    raise e
  File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 5586, in exception_type
    raise NotFoundError(
litellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}
LiteLLM Proxy - DEBUG: An error occurred: OpenAIException - Error code: 404 - {'detail': 'Not Found'}

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
LiteLLM Proxy - DEBUG: Results from router
LiteLLM Proxy - DEBUG: 
Router stats
LiteLLM Proxy - DEBUG: 
Total Calls made
LiteLLM Proxy - DEBUG: openai/gpt-3.5-turbo: 1
LiteLLM Proxy - DEBUG: 
Success Calls made
LiteLLM Proxy - DEBUG: 
Fail Calls made
LiteLLM Proxy - DEBUG: openai/gpt-3.5-turbo: 1

jrruethe · Jan 11 '24

Any idea of what I have configured incorrectly?

jrruethe · Jan 22 '24

hi @jrruethe

Can we hop on a call to help get you set up with litellm proxy? Sharing my calendly for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

ishaan-jaff · Feb 10 '24

@jrruethe and @ishaan-jaff

I am facing the same issue. Did you find any solution for this one?

shailja-imw · Feb 12 '24

I came back to this after some time away, and for some reason the following works for me now; I'm not exactly sure why it wasn't working before:

model_list:
- model_name: gpt-3.5-turbo
  litellm_params:
    model: openai/gpt-3.5-turbo
    api_base: http://192.168.1.4:5000/v1
    api_key: dummy
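
If I had to guess, the important difference is the /v1 suffix on api_base (the other change was dropping custom_llm_provider): with the openai provider, the route gets appended to whatever base is configured, so the old config would have sent requests to http://192.168.1.4:5000/chat/completions, a path the backend doesn't serve. A quick way to see the difference (payload copied from the working curl at the top; the status codes in the comments are what I'd expect, not captured output):

# Without the /v1 prefix, the request likely hits a route the backend 404s on:
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}' \
  http://192.168.1.4:5000/chat/completions

# With the /v1 prefix, it lands on the route that already worked via curl:
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}' \
  http://192.168.1.4:5000/v1/chat/completions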

jrruethe · Feb 15 '24

@jrruethe thanks for the update - would you be free for a quick call? I'd love to learn how we can make litellm proxy 10x better for your use case. My calendly is here for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

@shailja-imw I can debug this with you over a call. My calendly is here for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

ishaan-jaff · Feb 15 '24

@ishaan-jaff It is working on my side; I pulled the latest version and now it's working fine. Thanks

shailja-imw · Feb 19 '24