Hang at around 4% during the CUDA graph loading process
Hi, I am trying to run deepseek-r1 on two H20 server nodes, but it hangs at around 4% during the CUDA graph loading process, with no error. How can I fix this issue? If I add the --disable-cuda-graph flag, the problem doesn't occur.
Thanks for raising this issue. It is similar to issue #3538.
It looks like you are using the official Docker image?
Yes, I'm using the sglang docker.
Try reducing the CUDA graph max batch size, e.g. --cuda-graph-max-bs=32
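For anyone trying this, here is a minimal sketch of how the two CUDA graph options mentioned in this thread could be added to a launch command. It assumes the server is started through `python3 -m sglang.launch_server` with `--model-path`; the helper name `build_launch_command` and the model path in the example are placeholders, and the multi-node arguments are deliberately left out, so keep whatever you already pass for those.

```python
import shlex


def build_launch_command(model_path, cuda_graph_max_bs=None, disable_cuda_graph=False):
    """Build an argv list with the CUDA graph options discussed in this thread.

    Hypothetical helper for illustration only. Multi-node arguments are
    omitted; append the ones you already use.
    """
    cmd = ["python3", "-m", "sglang.launch_server", "--model-path", model_path]
    if disable_cuda_graph:
        # Workaround from the original report: skip CUDA graph capture entirely.
        cmd.append("--disable-cuda-graph")
    elif cuda_graph_max_bs is not None:
        # Suggested mitigation: cap the largest batch size captured as a CUDA graph.
        cmd.append(f"--cuda-graph-max-bs={cuda_graph_max_bs}")
    return cmd


if __name__ == "__main__":
    # The model path below is a placeholder, not taken from the thread.
    print(shlex.join(build_launch_command("deepseek-ai/DeepSeek-R1", cuda_graph_max_bs=32)))
```

Printing the command instead of running it makes it easy to compare the exact arguments across nodes before launching.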
I am trying to run deepseek-r1 with 48A100, and it hangs at around 17% during the CUDA graph loading process. What should I do?
Hi, if I use --cuda-graph-max-bs=32, it hangs at 14% ...
If I use --cuda-graph-max-bs=16, it hangs at 20% ...
me too
Update NCCL to 2.24, which fixed hangs when running with different CPU architectures. https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-24-3.html#rel_2-24-3
Thanks, it solves my problem!
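To check whether a node already has the NCCL version suggested above, something like the following sketch may help. It assumes PyTorch is installed and that `torch.cuda.nccl.version()` returns a `(major, minor, patch)` tuple, as it does in recent PyTorch releases; the helper names are made up for illustration. Note that this reports the NCCL that PyTorch itself loads, which may differ from a system-wide install.

```python
def parse_version(text):
    """Turn a string such as '2.24.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def nccl_is_at_least(found, minimum=(2, 24)):
    """Return True if the found NCCL version is at or above the minimum."""
    return tuple(found) >= tuple(minimum)


def detect_torch_nccl_version():
    """Best-effort lookup of the NCCL version used by PyTorch.

    Returns None if PyTorch is missing, was built without NCCL,
    or reports the version in an unexpected format.
    """
    try:
        import torch
        return tuple(torch.cuda.nccl.version())
    except Exception:
        return None


if __name__ == "__main__":
    version = detect_torch_nccl_version()
    if version is None:
        print("Could not detect the NCCL version through PyTorch.")
    elif nccl_is_at_least(version):
        print("NCCL", version, "is at or above 2.24.")
    else:
        print("NCCL", version, "is older than 2.24; consider upgrading.")
```

Run it on every node, since a version mismatch between nodes is worth ruling out as well.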
