how to set tensor_parallel_size for vllm backend
What happened?
My code is:

```python
import dspy

lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)

qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)
```

My question is: how do I set `tensor_parallel_size` for the vLLM backend? This code is not working for that param.
Steps to reproduce
```python
import dspy

lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)

qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)
```
DSPy version
2.6.17
Hi @Jasonsey, I believe you need to add the `hosted_vllm` prefix to your model name or pass in `vllm` as a provider arg. Feel free to reference the LiteLLM vLLM guide!
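For illustration, a minimal sketch of that suggestion, assuming a vLLM server is already up and serving the model (the served model name, port, and `api_base` below are assumptions):

```python
import dspy

# Use the hosted_vllm/ prefix so LiteLLM routes the request to a vLLM
# OpenAI-compatible server instead of treating the string as a local path.
lm = dspy.LM(
    "hosted_vllm/Qwen/Qwen2.5-VL-72B-Instruct",  # name as served; assumed
    api_base="http://localhost:8000/v1",         # default vLLM port; assumed
    api_key="local",                             # vLLM ignores the key unless configured
)
dspy.configure(lm=lm)
```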
You are right, but I can't find where to set `tensor_parallel_size` in that doc. Do you have any idea?
If this is an LM arg, you can set it in `dspy.LM()`, or you can configure it in the model you launch with vLLM before querying.
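To make "set it in `dspy.LM()`" concrete, here is a hedged sketch of LM-level args (these are standard sampling kwargs that DSPy forwards through LiteLLM with each request; `tensor_parallel_size` is not one of them, since it is an engine-level setting):

```python
import dspy

# Sampling args like temperature and max_tokens travel with each request;
# engine-level settings like tensor_parallel_size do not.
lm = dspy.LM(
    "hosted_vllm/Qwen/Qwen2.5-VL-72B-Instruct",  # assumed served model name
    api_base="http://localhost:8000/v1",         # assumed server address
    temperature=0.7,
    max_tokens=512,
)
```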
@Jasonsey Is that a flag to pass when launching vLLM? Or sent from the client in each request?
Here is how the `tensor_parallel_size` param is used with vLLM directly:

```python
from vllm import LLM

# tensor_parallel_size shards the model across 4 GPUs when the engine is created
llm = LLM("facebook/opt-13b", tensor_parallel_size=4)
output = llm.generate("San Francisco is a")
```
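Note that this snippet uses vLLM's offline `LLM` engine in-process, which is where `tensor_parallel_size` lives; DSPy never constructs that engine itself, it only sends HTTP requests to an already-running server.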
Yes. Please pass this when launching the vLLM server. Not related to DSPy.
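As a hedged sketch of the full flow (the port and served model name are assumptions): parallelism is fixed once at server startup, so the DSPy client needs no `tensor_parallel_size` at all.

```python
# Step 1 (shell, once): start vLLM with tensor parallelism, e.g.
#   vllm serve /home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct \
#       --tensor-parallel-size 8
#
# Step 2 (Python): query the server through DSPy; sharding was already
# decided at launch, so nothing parallelism-related appears client-side.
import dspy

lm = dspy.LM(
    "hosted_vllm/Qwen/Qwen2.5-VL-72B-Instruct",  # assumed served model name
    api_base="http://localhost:8000/v1",         # assumed server address
)
dspy.configure(lm=lm)

qa = dspy.Predict("question: str -> answer: str")
print(qa(question="who are you?"))
```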
@okhat You know the function call chain is dspy -> litellm -> vllm, and dspy is the frontend code. So how can I pass this `tensor_parallel_size` param from dspy through to vllm?