sglang
After enabling tensor parallelism (tp-size=2), there is no response
My command is:
CUDA_VISIBLE_DEVICES="2,4" python -m sglang.launch_server --model-path ./Yi-34B-Chat --trust-remote-code --port 30000 --tp-size 2
When I run the demo code, nothing is returned:
from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of the United States?",
    question_2="List two local attractions.",
)

for m in state.messages():
    print(m["role"], ":", m["content"])
But when I remove "--tp-size 2" from the command, so that the model runs on a single GPU, it works well.
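To narrow down whether "nothing returned" means the server hangs or replies with an empty body, a small probe with an explicit timeout may help. This is only a sketch using the Python standard library. The `/generate` path and the payload shape (`text`, `sampling_params`, `max_new_tokens`) are my assumptions about the server's HTTP API and should be checked against the installed sglang version; `probe` is a hypothetical helper name, not part of sglang.

```python
import json
import socket
import urllib.error
import urllib.request


def probe(url, payload=None, timeout=10.0):
    """Send one request and classify the outcome.

    Returns ("ok", body), ("timeout", None), or ("unreachable", reason).
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return "ok", resp.read().decode("utf-8", errors="replace")
    except (socket.timeout, TimeoutError):
        # The server accepted the connection but never finished replying.
        return "timeout", None
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout", None
        # Connection refused, HTTP error status, DNS failure, and so on.
        return "unreachable", str(exc.reason)


if __name__ == "__main__":
    # Assumed endpoint and payload shape; adjust to the server actually running.
    status, detail = probe(
        "http://localhost:30000/generate",
        {"text": "The capital of the United States is",
         "sampling_params": {"max_new_tokens": 16}},
        timeout=30.0,
    )
    print(status, detail)
```

A "timeout" result with --tp-size 2 and an "ok" result without it would suggest the request is stuck inside the server rather than lost on the client side.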
I'm also running into this.
me too
me too.