Issues by hemengfei (1 result)
[BUG] The same program runs fine with v0.17.5 but fails with v0.17.6 under the zero2 configuration
[rank2]:     self.train_model.backward(loss)
[rank2]:     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
[rank2]:   File "/mnt/data/anaconda3/envs/optimizer/lib/python3.13/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank2]:     ret_val = func(*args, **kwargs)
[rank2]:   File "/mnt/data/anaconda3/envs/optimizer/lib/python3.13/site-packages/deepspeed/runtime/engine.py", line 2324, in backward
[rank2]:     self._backward_epilogue()
[rank2]:     ~~~~~~~~~~~~~~~~~~~~~~~^^
[rank2]:   File "/mnt/data/anaconda3/envs/optimizer/lib/python3.13/site-packages/deepspeed/runtime/engine.py",...
Labels: bug, training