sglang: After enabling tensor parallelism (tp-size=2), there is no response

Status: Open. wushixong opened this issue 1 year ago • 3 comments

My command is:

CUDA_VISIBLE_DEVICES="2,4" python -m sglang.launch_server --model-path ./Yi-34B-Chat --trust-remote-code --port 30000 --tp-size 2

When I run the demo code below, nothing is returned.

from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

# A two-turn chat program: each gen() call asks the server for up to 256 new tokens.
@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

# Send all generation requests to the server started with the command above.
set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of the United States?",
    question_2="List two local attractions.",
)

# Print the whole conversation, one message per line.
for m in state.messages():
    print(m["role"], ":", m["content"])

But when I remove "--tp-size 2" from the command, so that the model runs on a single GPU, it works well.
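A minimal diagnostic sketch, not part of the original report: it sends one short prompt directly to the server over HTTP with a timeout, bypassing the frontend program above, so a hang with tp-size 2 shows up as a timeout error instead of a silent wait. It assumes the server exposes sglang's native POST /generate endpoint accepting a JSON body with "text" and "sampling_params" (including "max_new_tokens"); verify this against the sglang version in use. The helpers build_generate_payload and probe_generate are hypothetical names introduced only for this example.

```python
import json
import urllib.request
from typing import Any, Tuple


def build_generate_payload(prompt: str, max_new_tokens: int = 32) -> bytes:
    """Encode a request body for the /generate endpoint.

    Assumption: the field names below match sglang's native API for the
    installed version; adjust them if the server rejects the request.
    """
    body = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return json.dumps(body).encode("utf-8")


def probe_generate(base_url: str, prompt: str = "Hello", timeout: float = 60.0) -> Tuple[bool, Any]:
    """POST one short prompt and return (ok, detail) instead of blocking forever."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=build_generate_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # The timeout applies to each blocking socket operation (connect, read),
        # so a server that accepts the request but never answers raises here.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, connection refused and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return False, repr(exc)
```

For example, running probe_generate("http://localhost:30000") once against the tp-size 2 server and once against the single-GPU server would indicate whether the hang occurs in the server runtime itself or only when going through the frontend program.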

Feb 06 '24 12:02 wushixong

I'm also running into this.

Feb 09 '24 18:02 Reichenbachian

Me too.

Mar 04 '24 17:03 felifri

Me too.

Mar 19 '24 04:03 lss15151161
