LJL36 comments


Program crashes during the data loading stage

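> > My machine is a single node with 8 GPUs and 45G of GPU memory, so in general it should not run out of memory? Could it be that Docker was not given all of the machine's memory?
>
> Hello, I also have 375G of memory here, and loading the data reports the same error as yours, even though I am only using the 0.5M dataset. Have you solved this on your side?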

[Bug] DeepSeek R1 serve crash occasionally on 2*H100

> > I encountered the same issue on Deepseek-V3 2*8 H20. Is it fixed in sglang==0.4.2.post4? [@zhyncs](https://github.com/zhyncs)
>
> I have tried sglang==0.4.2.post4, but it did not help.

[Bug]NCCL error if enable the cuda graph

> > Thank you for raising this issue. [@ispobock](https://github.com/ispobock) [@zhyncs](https://github.com/zhyncs) could you help look at this issue? Thanks.
>
> The problem has been solved; refer to https://github.com/sgl-project/sglang/issues/3547#issuecomment-2662517700

Hang at around 4% during the CUDA graph loading process

> Hi, I am trying to run deepseek-r1 on two H20 server nodes, but it hangs at **around 4%** during the **CUDA graph loading process**, without any error. How can I fix...

Hang at around 4% during the CUDA graph loading process

> > Update NCCL to 2.24, which fixed hangs when running with different CPU architectures. https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-24-3.html#rel_2-24-3
>
> Thanks, it solves my problem!
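
For anyone hitting the same hang, a quick way to see whether the NCCL build in use predates the release linked above is to compare version tuples. This is only a rough sketch: `needs_nccl_upgrade`, `installed_nccl_version`, and `MIN_NCCL` are hypothetical names, and the 2.24.3 threshold is assumed from the linked release notes. It also assumes PyTorch is installed and that `torch.cuda.nccl.version()` reports a tuple, which may not hold for older PyTorch builds.

```python
from typing import Optional, Sequence, Tuple

# Assumed threshold: the 2.24.3 release linked in the comment above.
MIN_NCCL: Tuple[int, int, int] = (2, 24, 3)


def needs_nccl_upgrade(installed: Sequence[int],
                       minimum: Sequence[int] = MIN_NCCL) -> bool:
    """Return True if the installed (major, minor, patch) is older than minimum."""
    return tuple(installed[:3]) < tuple(minimum[:3])


def installed_nccl_version() -> Optional[Tuple[int, ...]]:
    """Best-effort lookup of the NCCL version bundled with PyTorch.

    Returns None if PyTorch is missing or the version is not reported
    as a tuple.
    """
    try:
        import torch
        return tuple(torch.cuda.nccl.version())[:3]
    except Exception:
        return None


if __name__ == "__main__":
    version = installed_nccl_version()
    if version is None:
        print("Could not determine the NCCL version from PyTorch.")
    elif needs_nccl_upgrade(version):
        print(f"NCCL {version} is older than {MIN_NCCL}; an upgrade may help.")
    else:
        print(f"NCCL {version} is at or above {MIN_NCCL}.")
```

In a multi-node setup this would need to be run on every node, since each node may ship a different NCCL build.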