how to set tensor_parallel_size for vllm backend

Open Jasonsey opened this issue 7 months ago • 3 comments

What happened?

My code is here:

import dspy


lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)


qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)

My question is how to set tensor_parallel_size for the vLLM backend. This code is not working for this param.

Steps to reproduce

import dspy


lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)


qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)

DSPy version

2.6.17

Jasonsey, Apr 11 '25 06:04

Hi @Jasonsey , I believe you need to add the hosted_vllm prefix to your model name or pass in vllm as a provider arg. Feel free to reference the LiteLLM vLLM guide!

arnavsinghvi11, Apr 11 '25 15:04
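A minimal sketch of the suggestion above, assuming a vLLM OpenAI-compatible server is already running locally. The host, port, placeholder API key, and the served model name `qwen2.5-vl-72b` are all hypothetical; the name must match whatever the server actually reports. Note that nothing here carries tensor_parallel_size: these arguments only tell DSPy where the server is.

```python
def hosted_vllm_lm_kwargs(served_model, host="localhost", port=8000):
    """Build keyword arguments for dspy.LM that point at a running vLLM server.

    The "hosted_vllm/" prefix routes the request through LiteLLM's vLLM provider.
    host, port, and the "EMPTY" key are assumptions for a default local server.
    """
    return {
        "model": f"hosted_vllm/{served_model}",
        "api_base": f"http://{host}:{port}/v1",
        "api_key": "EMPTY",  # placeholder; a local server usually needs no real key
    }


def make_lm(served_model):
    """Create and register the LM (requires dspy and a reachable server)."""
    import dspy  # imported lazily so the helper above works without dspy installed

    lm = dspy.LM(**hosted_vllm_lm_kwargs(served_model))
    dspy.configure(lm=lm)
    return lm


# "qwen2.5-vl-72b" is a hypothetical served model name.
kwargs = hosted_vllm_lm_kwargs("qwen2.5-vl-72b")
print(kwargs["model"])     # hosted_vllm/qwen2.5-vl-72b
print(kwargs["api_base"])  # http://localhost:8000/v1
```

With the server up, `make_lm("qwen2.5-vl-72b")` followed by `dspy.Predict("question: str -> answer: str")` would send requests to it.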

> Hi @Jasonsey , I believe you need to add the hosted_vllm prefix to your model name or pass in vllm as a provider arg. Feel free to reference the LiteLLM vLLM guide!

You are right, but I can't find where to set tensor_parallel_size in that doc. Do you have any idea?

Jasonsey, Apr 11 '25 15:04

If this is an LM arg, you can set it in dspy.LM(), or you can configure it in the model you launch with vLLM before querying.

arnavsinghvi11, Apr 11 '25 16:04

@Jasonsey Is that a flag to pass when launching vLLM? Or sent from the client in each request?

okhat, Apr 15 '25 17:04

> @Jasonsey Is that a flag to pass when launching vLLM? Or sent from the client in each request?

Here is how the tensor_parallel_size param is set when using vLLM directly:

from vllm import LLM
llm = LLM("facebook/opt-13b", tensor_parallel_size=4)
output = llm.generate("San Francisco is a")

Jasonsey, Apr 16 '25 00:04

Yes. Please pass this when launching the vLLM server. Not related to DSPy.

okhat, Apr 16 '25 03:04
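A minimal sketch of what "pass this when launching the vLLM server" can look like, assuming the `vllm` CLI is installed and 8 GPUs are available. The helper `vllm_serve_command` and the port are hypothetical choices for illustration; the model path is the one from the original post. The point is that tensor parallelism is fixed when the server process starts, so it is not something a client sends per request.

```python
import shlex


def vllm_serve_command(model_path, tensor_parallel_size, port=8000):
    """Build the argv for `vllm serve` with tensor parallelism set at launch."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be a positive integer")
    return [
        "vllm", "serve", model_path,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--port", str(port),
    ]


cmd = vllm_serve_command(
    "/home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct",
    tensor_parallel_size=8,
)
# Print the shell command; run it in a terminal, or pass `cmd` to
# subprocess.Popen to start the server from Python.
print(shlex.join(cmd))
```

Once the server is running, DSPy only needs its address and model name; the sharding across GPUs stays entirely on the server side.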

> Yes. Please pass this when launching the vLLM server. Not related to DSPy.

@okhat The function call chain is dspy -> litellm -> vllm, and dspy is the frontend code. So how can I pass the tensor_parallel_size param from dspy through to vllm?

Jasonsey, Apr 17 '25 12:04