[Bug]: Unable to use vLLM-hosted model
What happened?
Hi, I followed the instructions here: https://docs.litellm.ai/docs/providers/vllm

My relevant config is:

```yaml
  - model_name: Mistral-7B-Instruct-v0.2
    litellm_params:
      model: vllm/mistralai/Mistral-7B-Instruct-v0.2
      api_base: http://Mistral-7B-Instruct-v0.2.mycloud.local:8000
      api_key: fake-key
```

Queries fail with "No module named 'vllm'".
Relevant log output

```
litellm.acompletion(model=vllm/mistralai/Mistral-7B-Instruct-v0.2) Exception VLLMException - No module named 'vllm'

  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 28, in validate_environment
    from vllm import LLM, SamplingParams  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'vllm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 51, in completion
    llm, SamplingParams = validate_environment(model=model)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 34, in validate_environment
    raise VLLMError(status_code=0, message=str(e))
litellm.llms.vllm.VLLMError: No module named 'vllm'
```
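For context on the root cause: the traceback shows litellm's `vllm/` route importing the `vllm` Python package, i.e. it runs the model in-process on the proxy host rather than calling the remote server. A minimal sketch of the difference, using the model name and api_base from the config above (the `openai/` prefix plus `/v1` suffix is what ends up working later in this thread):

```python
import litellm

# `vllm/...` = local inference: litellm imports the vllm package itself, so this
# fails with "No module named 'vllm'" unless vllm is installed on the proxy host.
# litellm.completion(
#     model="vllm/mistralai/Mistral-7B-Instruct-v0.2",
#     messages=[{"role": "user", "content": "hello"}],
# )

# A hosted vLLM server is reached through its OpenAI-compatible endpoint instead:
response = litellm.completion(
    model="openai/mistralai/Mistral-7B-Instruct-v0.2",
    api_base="http://Mistral-7B-Instruct-v0.2.mycloud.local:8000/v1",
    api_key="fake-key",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```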
Follow-up (not sure if it's a separate bug): I changed `model` in the config YAML from `vllm/mistralai/Mistral-7B-Instruct-v0.2` to `openai/mistralai/Mistral-7B-Instruct-v0.2` (by the way, the documentation is not clear on this). I no longer get the same exception, but there's another problem: litellm calls vLLM with a wrong (or unsupported?) URL, and vLLM returns an error: `INFO: 10.42.20.72:45464 - "POST /chat/completions HTTP/1.1" 404 Not Found`. As you can see in the vLLM code, `/chat/completions` is not supported: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
vLLM is OpenAI-compatible. Can you try this:

```yaml
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.2
      api_base: http://Mistral-7B-Instruct-v0.2.mycloud.local:8000
      api_key: fake-key
```
litellm calls vLLM (Mistral) with `/chat/completions`, which is not implemented:

```
INFO: 10.42.20.72:43148 - "POST /chat/completions HTTP/1.1" 404 Not Found
```

The vLLM code shows they never implemented `/chat/completions`, only `/v1/chat/completions` and `/v1/completions` (in other words, I'm not sure vLLM supports the OpenAI API paths without the `/v1` prefix). See: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
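A quick way to confirm the route mismatch against the server itself (host taken from the config above; `requests` is used here only as a probe, it is not part of litellm):

```python
import requests

base = "http://Mistral-7B-Instruct-v0.2.mycloud.local:8000"

# Un-prefixed path: not registered by vLLM's OpenAI-compatible server,
# hence the "POST /chat/completions HTTP/1.1" 404 seen in the logs.
print(requests.post(f"{base}/chat/completions", json={}).status_code)  # 404

# The /v1-prefixed routes are the ones that exist, e.g. the model listing:
print(requests.get(f"{base}/v1/models").status_code)  # 200
```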
Following up: you suggested in a chat to add `/v1` to the base URL and use `openai` as the provider (even though I'm running vLLM). This worked, except that prompts are not translated: the following query produced an incorrect template for a vLLM-served Mistral model (which should have been converted):

```
-d '{'model': 'mistralai/Mistral-7B-Instruct-v0.2', 'messages': [{'role': 'system', 'content': 'Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed.'}, {'role': 'user', 'content': 'hello'}], 'temperature': 0.7, 'stream': True, 'user': 'default_user_id', 'extra_body': {}}'
```

and fails with the following error (in both the vLLM and litellm logs):

```
Exception OpenAIException - Error code: 400 - {'object': 'error', 'message': 'Conversation roles must alternate user/assistant/user/assistant/...', 'type': 'BadRequestError', 'param': None, 'code': 400}
```
Experiencing the same thing. litellm is not calling the `/v1` endpoint of vLLM, so I changed my api_base and appended `/v1` to it.
@psykhi Do you use the `openai/` prefix or `vllm/`? For me, `vllm/` didn't work, so I had to switch to `openai/`, but then it doesn't translate prompts correctly (for Gemma, for example, I'm getting 'System prompt not supported'). Curious if you solved it.
@yaronr Yeah, I used `openai/` as well. I'm not sure vLLM would accept messages with a system message for models that don't support it; I thought it would only translate them into [INST] tokens etc. So I'm not sure it's really a bug there, but I have limited experience.
I have another issue:

```
APIConnectionError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=facebook/opt-125m
Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
```

config.yaml:

```yaml
model_list:
  - model_name: facebook/opt-125m
    litellm_params:
      model: openai/facebook/opt-125m # The `text-completion-openai/` prefix will call openai.completions.create
      api_base: http://localhost:8000/
```
@yaronr We now check for `/v1` on OpenAI-compatible endpoints: if an api_base is given and `/v1` is missing, we append it.
https://github.com/BerriAI/litellm/commit/e05764bdb7dda49127dd4b1c2c4d02fa90463e71
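For anyone reading along, the described behavior amounts to something like the sketch below (an illustration only, not the code in the linked commit):

```python
def ensure_v1_suffix(api_base: str) -> str:
    """Append /v1 to an OpenAI-compatible api_base if it's missing."""
    base = api_base.rstrip("/")
    return base if base.endswith("/v1") else base + "/v1"

# e.g. the api_base from the config at the top of this issue:
print(ensure_v1_suffix("http://Mistral-7B-Instruct-v0.2.mycloud.local:8000"))
# -> http://Mistral-7B-Instruct-v0.2.mycloud.local:8000/v1
```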
Closing as this issue seems fixed. Please bump me if not.
@Jeffwan Can you share the curl request you're making to the proxy, and any relevant logs? That would help us track this in a separate issue.
@krrishdholakia Thank you. Can you please let me know how to handle prompt translation for models like Gemma ('System prompt not supported')?
Since vLLM has a /v1/chat/completions endpoint, I thought they would naturally support/handle this, but it looks like I'm wrong here.
For HF models, we check whether they support a system message; if not, we pass it in as a normal part of the prompt string (for a base model) or as a user message (for a chat/instruct model with a known prompt template): https://github.com/BerriAI/litellm/blob/4cccd470ab72cbc88d71204036229b39ccb8f44b/litellm/llms/prompt_templates/factory.py#L311
Maybe we can expose a flag like `supports_system_message: bool`; if it's set to false, we pass the system message in as a user message. Thoughts? @yaronr
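As a rough sketch of that proposal (the flag name and behavior here are the suggestion above, not existing litellm code):

```python
def map_system_message(messages: list[dict], supports_system_message: bool) -> list[dict]:
    """If the model's template rejects the system role, re-emit system messages as user turns."""
    if supports_system_message:
        return messages
    return [
        {"role": "user", "content": m["content"]} if m["role"] == "system" else m
        for m in messages
    ]
```

Note that for templates which also require strictly alternating roles (the Mistral error earlier in this thread), the system text would probably need to be merged into the first user message rather than emitted as a separate user turn.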
tracking the issue here - https://github.com/BerriAI/litellm/issues/3325
I figured this out. Since I used the OpenAI-compatible server in vLLM, it is exactly the same as the OpenAI provider. The `vllm/` provider is its own protocol.