SGLang hangs at around 4% during the CUDA graph loading process

Hi, I'm trying to run deepseek-r1 on two H20 server nodes, but it hangs at around 4% during the CUDA graph loading process, with no error. How can I fix this? If I add the --disable-cuda-graph flag, the problem doesn't occur.

Feb 13 '25 10:02 BoBo0037

Thanks for raising this issue. It is similar to issue #3538.

Feb 13 '25 15:02 minleminzui

It looks like you are using the official Docker image?

Feb 14 '25 03:02 LJL36

Yes, I'm using the SGLang Docker image.

Feb 14 '25 03:02 BoBo0037

Try reducing the CUDA graph max batch size with --cuda-graph-max-bs=32.

Feb 14 '25 03:02 zhyncs

I'm trying to run deepseek-r1 with 48A100, and it hangs at around 17% during the CUDA graph loading process. What should I do?

Feb 14 '25 03:02 zhaotyer

Hi, with --cuda-graph-max-bs=32 it hangs at 14%, and with --cuda-graph-max-bs=16 it hangs at 20%.
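
A rough sketch of why the reported percentage shifts when --cuda-graph-max-bs changes. It assumes (1) the progress bar advances once per captured batch size and (2) a hypothetical candidate list of 1, 2, 4 and then multiples of 8 up to 160; the real list depends on the SGLang version and server arguments. Under those assumptions, 4% (if the default cap were 160), 14% and 20% would all correspond to a stall right after the first graph is captured, which would fit the observation that lowering the cap moves the hang rather than removing it.

```python
# HYPOTHETICAL candidate batch sizes for CUDA graph capture. This list is an
# assumption for illustration; check your SGLang version for the real one.
ASSUMED_CAPTURE_BS = [1, 2, 4] + [i * 8 for i in range(1, 21)]  # 1, 2, 4, 8, ..., 160


def captured_batch_sizes(max_bs: int) -> list[int]:
    """Batch sizes that would get a CUDA graph when capped at max_bs."""
    return [bs for bs in ASSUMED_CAPTURE_BS if bs <= max_bs]


def progress_percent(graphs_done: int, max_bs: int) -> int:
    """Progress (rounded to a whole percent) after graphs_done captures."""
    total = len(captured_batch_sizes(max_bs))
    return round(100 * graphs_done / total)


if __name__ == "__main__":
    # Where would a stall after the first captured graph show up for each cap?
    for max_bs in (160, 32, 16):
        n = len(captured_batch_sizes(max_bs))
        print(f"max_bs={max_bs}: {n} graphs, 1 done -> {progress_percent(1, max_bs)}%")
```

With this assumed list, the script prints 4%, 14% and 20% for caps of 160, 32 and 16.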

Feb 14 '25 04:02 BoBo0037

Me too.

Feb 14 '25 04:02 zhaotyer

Update NCCL to 2.24, which fixed hangs when running with different CPU architectures: https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-24-3.html#rel_2-24-3
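
A minimal sketch for checking whether the NCCL version PyTorch reports is at least 2.24, assuming PyTorch is installed in the same environment (for example, inside the SGLang container) and that you run it on every node. torch.cuda.nccl.version() returns a tuple such as (2, 21, 5) in recent PyTorch releases; the integer decoding below is an assumption for older releases, and the helper name nccl_at_least is hypothetical.

```python
from typing import Tuple, Union


def nccl_at_least(version: Union[Tuple[int, ...], int], minimum: Tuple[int, ...] = (2, 24)) -> bool:
    """Return True if an NCCL version is >= minimum (hypothetical helper)."""
    if isinstance(version, int):
        # Assumed integer encoding: major * 10000 + minor * 100 + patch.
        version = (version // 10000, (version // 100) % 100, version % 100)
    return tuple(version[: len(minimum)]) >= tuple(minimum)


def main() -> None:
    try:
        import torch

        version = torch.cuda.nccl.version()
    except Exception as exc:  # no PyTorch, or a build without NCCL support
        print(f"Could not query NCCL version through PyTorch: {exc}")
        return
    status = "OK" if nccl_at_least(version) else "older than 2.24, consider upgrading"
    print(f"NCCL {version}: {status}")


if __name__ == "__main__":
    main()
```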

Feb 17 '25 09:02 desertchen

Thanks, it solved my problem!

Feb 19 '25 14:02 LJL36