[Bug]: Unable to use vLLM-hosted model
What happened?
Hi, I followed the instructions here: https://docs.litellm.ai/docs/providers/vllm

My relevant config is:

```yaml
  - model_name: Mistral-7B-Instruct-v0.2
    litellm_params:
      model: vllm/mistralai/Mistral-7B-Instruct-v0.2
      api_base: http://Mistral-7B-Instruct-v0.2.mycloud.local:8000
      api_key: fake-key
```

Queries fail with "No module named 'vllm'".
Relevant log output

```
litellm.acompletion(model=vllm/mistralai/Mistral-7B-Instruct-v0.2) Exception VLLMException - No module named 'vllm'

  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 28, in validate_environment
    from vllm import LLM, SamplingParams  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'vllm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 51, in completion
    llm, SamplingParams = validate_environment(model=model)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/vllm.py", line 34, in validate_environment
    raise VLLMError(status_code=0, message=str(e))
litellm.llms.vllm.VLLMError: No module named 'vllm'
```
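For context on the root cause: the traceback shows litellm's `vllm/` route importing the `vllm` Python package, i.e. it runs the model in-process on the proxy host rather than calling the remote server. A minimal sketch of the difference, using the model name and api_base from the config above (the `openai/` prefix plus `/v1` suffix is what ends up working later in this thread):

```python
import litellm

# `vllm/...` = local inference: litellm imports the vllm package itself, so this
# fails with "No module named 'vllm'" unless vllm is installed on the proxy host.
# litellm.completion(
#     model="vllm/mistralai/Mistral-7B-Instruct-v0.2",
#     messages=[{"role": "user", "content": "hello"}],
# )

# A hosted vLLM server is reached through its OpenAI-compatible endpoint instead:
response = litellm.completion(
    model="openai/mistralai/Mistral-7B-Instruct-v0.2",
    api_base="http://Mistral-7B-Instruct-v0.2.mycloud.local:8000/v1",
    api_key="fake-key",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```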
Follow-up (not sure if it's a separate bug): I changed `model` in the config YAML from `vllm/mistralai/Mistral-7B-Instruct-v0.2` to `openai/mistralai/Mistral-7B-Instruct-v0.2` (by the way, the documentation is not clear on this). I no longer get the same exception, but there's another problem: litellm calls vLLM with a wrong (or unsupported?) URL, and vLLM returns an error: `INFO: 10.42.20.72:45464 - "POST /chat/completions HTTP/1.1" 404 Not Found`. As you can see in the vLLM code, `/chat/completions` is not supported: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
vLLM is OpenAI-compatible. Can you try this:

```yaml
    litellm_params:
      model: openai/mistralai/Mistral-7B-Instruct-v0.2
      api_base: http://Mistral-7B-Instruct-v0.2.mycloud.local:8000
      api_key: fake-key
```
litellm calls vLLM (Mistral) with `/chat/completions`, which is not implemented:

```
INFO: 10.42.20.72:43148 - "POST /chat/completions HTTP/1.1" 404 Not Found
```

The vLLM code shows they never implemented `/chat/completions`, only `/v1/chat/completions` and `/v1/completions` (in other words, I'm not sure vLLM supports the OpenAI API paths without the `/v1` prefix). See: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
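A quick way to confirm the route mismatch against the server itself (host taken from the config above; `requests` is used here only as a probe, it is not part of litellm):

```python
import requests

base = "http://Mistral-7B-Instruct-v0.2.mycloud.local:8000"

# Un-prefixed path: not registered by vLLM's OpenAI-compatible server,
# hence the "POST /chat/completions HTTP/1.1" 404 seen in the logs.
print(requests.post(f"{base}/chat/completions", json={}).status_code)  # 404

# The /v1-prefixed routes are the ones that exist, e.g. the model listing:
print(requests.get(f"{base}/v1/models").status_code)  # 200
```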
Following up: you suggested in a chat to add `/v1` to the base URL and use `openai` as the provider (even though I'm running vLLM). This worked, except that prompts are not translated: the following query produced an incorrect template for a vLLM-served Mistral model (which should have been converted):

```
-d '{'model': 'mistralai/Mistral-7B-Instruct-v0.2', 'messages': [{'role': 'system', 'content': 'Given the following conversation, relevant context, and a follow up question, reply with an answer to the current question the user is asking. Return only your response to the question given the above information following the users instructions as needed.'}, {'role': 'user', 'content': 'hello'}], 'temperature': 0.7, 'stream': True, 'user': 'default_user_id', 'extra_body': {}}'
```

and fails with the following error (in both the vLLM and litellm logs):

```
Exception OpenAIException - Error code: 400 - {'object': 'error', 'message': 'Conversation roles must alternate user/assistant/user/assistant/...', 'type': 'BadRequestError', 'param': None, 'code': 400}
```
Experiencing the same thing. litellm is not calling the `/v1` endpoint of vLLM, so I changed my api_base and appended `/v1` to it.
@psykhi Do you use the `openai/` prefix or `vllm/`? For me, `vllm/` didn't work, so I had to switch to `openai/`, but then it doesn't translate prompts correctly (for Gemma, for example, I'm getting 'System prompt not supported'). Curious if you solved it.
@yaronr Yeah, I used `openai/` as well. I'm not sure vLLM would accept messages with a system message for models that don't support it; I thought it would only translate them into [INST] tokens etc. So I'm not sure it's really a bug there, but I have limited experience.
I have another issue:

```
APIConnectionError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=facebook/opt-125m
Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
```

config.yaml:

```yaml
model_list:
  - model_name: facebook/opt-125m
    litellm_params:
      model: openai/facebook/opt-125m # The `text-completion-openai/` prefix will call openai.completions.create
      api_base: http://localhost:8000/
```
@yaronr We now check for `/v1` on OpenAI-compatible endpoints: if an api_base is given and `/v1` is missing, we append it.
https://github.com/BerriAI/litellm/commit/e05764bdb7dda49127dd4b1c2c4d02fa90463e71
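For anyone reading along, the described behavior amounts to something like the sketch below (an illustration only, not the code in the linked commit):

```python
def ensure_v1_suffix(api_base: str) -> str:
    """Append /v1 to an OpenAI-compatible api_base if it's missing."""
    base = api_base.rstrip("/")
    return base if base.endswith("/v1") else base + "/v1"

# e.g. the api_base from the config at the top of this issue:
print(ensure_v1_suffix("http://Mistral-7B-Instruct-v0.2.mycloud.local:8000"))
# -> http://Mistral-7B-Instruct-v0.2.mycloud.local:8000/v1
```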
Closing as this issue seems fixed. Please bump me if not.
@Jeffwan Can you share the curl request you're making to the proxy, and any relevant logs? That would help us track this in a separate issue.
@krrishdholakia Thank you. Can you please let me know how to handle prompt translation for models like Gemma ('System prompt not supported')?
Since vLLM has a /v1/chat/completions endpoint, I thought they would naturally support/handle this, but it looks like I'm wrong here.
For HF models, we check whether they support a system message; if not, we pass it in as a normal part of the prompt string (for a base model) or as a user message (for a chat/instruct model with a known prompt template): https://github.com/BerriAI/litellm/blob/4cccd470ab72cbc88d71204036229b39ccb8f44b/litellm/llms/prompt_templates/factory.py#L311
Maybe we can expose a flag like `supports_system_message: bool`; if it's set to false, we pass the system message in as a user message. Thoughts? @yaronr
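As a rough sketch of that proposal (the flag name and behavior here are the suggestion above, not existing litellm code):

```python
def map_system_message(messages: list[dict], supports_system_message: bool) -> list[dict]:
    """If the model's template rejects the system role, re-emit system messages as user turns."""
    if supports_system_message:
        return messages
    return [
        {"role": "user", "content": m["content"]} if m["role"] == "system" else m
        for m in messages
    ]
```

Note that for templates which also require strictly alternating roles (the Mistral error earlier in this thread), the system text would probably need to be merged into the first user message rather than emitted as a separate user turn.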
tracking the issue here - https://github.com/BerriAI/litellm/issues/3325
I figured this out. Since I used the OpenAI-compatible server in vLLM, it is exactly the same as the OpenAI provider. The `vllm/` provider is its own protocol.