Wei Du

Results 2 issues of Wei Du

**Describe the bug** Checkpoint saving currently fails for Qwen3-30B-A3B when using a larger number of GPUs under certain configurations. For example: 64 nodes (512 GPUs): TP=4, CP=8, PP=1,...

bug
research

Hi authors, I’m currently a research scientist at NVIDIA working on mathematical reasoning. I came across your repository, and I really appreciate the work you’ve done! We’re also working on...