litellm
[Bug]: 404 when trying to connect to oobabooga/text-generation-webui
What happened?
I'm having difficulty correctly configuring LiteLLM. I am getting the following error:
litellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}
I have oobabooga/text-generation-webui running on 192.168.1.4 with the openai-compatible API on port 5000. This works:
$ curl -s http://192.168.1.4:5000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}' | jq .
{
  "id": "chatcmpl-1704940905826089216",
  "object": "chat.completions",
  "created": 1704940905,
  "model": "turboderp_Mixtral-8x7B-instruct-exl2",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello! It's nice to meet you. Is there something specific you would like to talk about or any questions you have in mind? I'm here to help with any information you might need about artificial intelligence, data science, or related topics."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 52,
    "total_tokens": 62
  }
}
However, this same call to LiteLLM fails. I have it running on a server located at llm.jrruethe.info (this is internal-only):
$ curl -s https://llm.jrruethe.info/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}' | jq .
{
"detail": "OpenAIException - Error code: 404 - {'detail': 'Not Found'}\n\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.9/site-packages/litellm/main.py\", line 210, in acompletion\n response = await init_response\n File \"/usr/local/lib/python3.9/site-packages/litellm/llms/openai.py\", line 405, in acompletion\n raise e\n File \"/usr/local/lib/python3.9/site-packages/litellm/llms/openai.py\", line 390, in acompletion\n response = await openai_aclient.chat.completions.create(\n File \"/usr/local/lib/python3.9/site-packages/openai/resources/chat/completions.py\", line 1291, in create\n return await self._post(\n File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1578, in post\n return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1339, in request\n return await self._request(\n File \"/usr/local/lib/python3.9/site-packages/openai/_base_client.py\", line 1429, in _request\n raise self._make_status_error_from_response(err.response) from None\nopenai.NotFoundError: Error code: 404 - {'detail': 'Not Found'}\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.9/site-packages/litellm/proxy/proxy_server.py\", line 1452, in chat_completion\n response = await llm_router.acompletion(**data)\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 320, in acompletion\n raise e\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 316, in acompletion\n response = await self.async_function_with_fallbacks(**kwargs)\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 871, in async_function_with_fallbacks\n raise original_exception\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 799, in async_function_with_fallbacks\n response = await self.async_function_with_retries(*args, **kwargs)\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 930, in async_function_with_retries\n raise original_exception\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 888, in async_function_with_retries\n response = await original_function(*args, **kwargs)\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 382, in _acompletion\n raise e\n File \"/usr/local/lib/python3.9/site-packages/litellm/router.py\", line 361, in _acompletion\n response = await litellm.acompletion(\n File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 2366, in wrapper_async\n raise e\n File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 2258, in wrapper_async\n result = await original_function(*args, **kwargs)\n File \"/usr/local/lib/python3.9/site-packages/litellm/main.py\", line 227, in acompletion\n raise exception_type(\n File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 6628, in exception_type\n raise e\n File \"/usr/local/lib/python3.9/site-packages/litellm/utils.py\", line 5586, in exception_type\n raise NotFoundError(\nlitellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}\n"
}
Below is my LiteLLM config.yaml:
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      custom_llm_provider: openai
      api_base: http://192.168.1.4:5000
      api_key: dummy
As far as I am aware, oobabooga/text-generation-webui doesn't actually care about the model name, so I'm just using gpt-3.5-turbo, even though I currently have the Mixtral model loaded.
I'm running the docker image litellm/litellm:v1.17.0 with the command:
["litellm", "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "1", "--detailed_debug"]
(I've got my reverse proxy handling https and the port mapping)
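For reference, that corresponds to a container invocation along these lines (a sketch only; the host path to the config file and the published port are assumptions and need to match your environment):

$ docker run -d \
    -v /path/to/config.yaml:/app/config.yaml \
    -p 8000:8000 \
    litellm/litellm:v1.17.0 \
    litellm --config /app/config.yaml --port 8000 --num_workers 1 --detailed_debug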
Here is what it shows on startup:
[2024-01-11 02:03:10 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-01-11 02:03:10 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2024-01-11 02:03:10 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-01-11 02:03:10 +0000] [7] [INFO] Booting worker with pid: 7
[2024-01-11 02:03:10 +0000] [7] [INFO] Started server process [7]
[2024-01-11 02:03:10 +0000] [7] [INFO] Waiting for application startup.
LiteLLM Proxy - DEBUG: Loaded config YAML (api_key and environment_variables are not shown):
{
"model_list": [
{
"model_name": "gpt-3.5-turbo",
"litellm_params": {
"model": "openai/gpt-3.5-turbo",
"custom_llm_provider": "openai",
"api_base": "http://192.168.1.4:5000",
"api_key": "dummy"
}
}
]
}
LiteLLM Router - DEBUG: Initializing OpenAI Client for openai/gpt-3.5-turbo, Api Base:http://192.168.1.4:5000, Api Key:dummy
LiteLLM Router - DEBUG:
Initialized Model List [{'model_name': 'gpt-3.5-turbo', 'litellm_params': {'model': 'openai/gpt-3.5-turbo', 'custom_llm_provider': 'openai', 'api_base': 'http://192.168.1.4:5000', 'api_key': 'dummy'}, 'model_info': {'id': 'b727b6c3-9418-4614-b05e-2f014c3c9f38'}}]
LiteLLM Router - DEBUG: Intialized router with Routing strategy: simple-shuffle
LiteLLM Proxy - DEBUG: prisma client - None
[2024-01-11 02:03:10 +0000] [7] [INFO] Application startup complete.
And here are the outputs of some curl calls to show that the config is set as expected:
curl -X 'GET' \
'https://llm.jrruethe.info/v1/model/info' \
-H 'accept: application/json'
{
  "data": [
    {
      "model_name": "gpt-3.5-turbo",
      "litellm_params": {
        "model": "openai/gpt-3.5-turbo",
        "custom_llm_provider": "openai",
        "api_base": "http://192.168.1.4:5000"
      },
      "model_info": {}
    }
  ]
}
What am I doing wrong? I've looked through all the documentation I can find, but nothing is standing out.
Relevant log output
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 316, in acompletion
response = await self.async_function_with_fallbacks(**kwargs)
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 871, in async_function_with_fallbacks
raise original_exception
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 799, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 930, in async_function_with_retries
raise original_exception
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 888, in async_function_with_retries
response = await original_function(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 382, in _acompletion
raise e
File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 361, in _acompletion
response = await litellm.acompletion(
File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2366, in wrapper_async
raise e
File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 2258, in wrapper_async
result = await original_function(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/litellm/main.py", line 227, in acompletion
raise exception_type(
File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 6628, in exception_type
raise e
File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 5586, in exception_type
raise NotFoundError(
litellm.exceptions.NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}
LiteLLM Proxy - DEBUG: An error occurred: OpenAIException - Error code: 404 - {'detail': 'Not Found'}
Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
LiteLLM Proxy - DEBUG: Results from router
LiteLLM Proxy - DEBUG:
Router stats
LiteLLM Proxy - DEBUG:
Total Calls made
LiteLLM Proxy - DEBUG: openai/gpt-3.5-turbo: 1
LiteLLM Proxy - DEBUG:
Success Calls made
LiteLLM Proxy - DEBUG:
Fail Calls made
LiteLLM Proxy - DEBUG: openai/gpt-3.5-turbo: 1
Twitter / LinkedIn details
No response
Any idea of what I have configured incorrectly?
Hi @jrruethe,
Can we hop on a call to help get you set up with the litellm proxy? Sharing my calendly for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
@jrruethe and @ishaan-jaff
I am facing the same issue. Did you find a solution for this one?
I came back to this after some time away, and for some reason the following works for me now; I'm not exactly sure why it wasn't working before:
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_base: http://192.168.1.4:5000/v1
      api_key: dummy
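The difference is the /v1 suffix on api_base. The OpenAI client that LiteLLM uses joins the configured base URL with the endpoint path, so without /v1 the proxy would have been requesting http://192.168.1.4:5000/chat/completions, a route text-generation-webui does not serve, which matches the 404 in the error. A quick way to confirm this from the proxy host (a sketch; the expected status codes are inferred from the 404 above and the working call at the top of the issue):

$ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.4:5000/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}'
404
$ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.4:5000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}'
200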
@jrruethe thanks for the update - would you be free for a quick call? I'd love to learn how we can make litellm proxy 10x better for your use case. My calendly is here for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
@shailja-imw I can debug this with you over a call. My calendly is here for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
@ishaan-jaff It is working on my side; I pulled the latest version and now it's working fine. Thanks