Sungha Choi
Hi @M3Dade, have you resolved the "wrong checksum" issue? Best,
Hi @bf-yang, did you solve the problem you mentioned? I'm currently experiencing the same problem. Thank you!
I encountered the same issue. Is there any solution?

> ... /deepspeed/runtime/bf16_optimizer.py", line 312, in step
> [rank0]: assert all_groups_norm > 0.
> [rank0]: AssertionError

deepspeed 0.15.0, transformers 4.44.2
> I encountered the same issue. Is there any solution?
>
> > ... /deepspeed/runtime/bf16_optimizer.py", line 312, in step
> > [rank0]: assert all_groups_norm > 0.
> > [rank0]: AssertionError...
Hi @vacancy, thanks a lot for your reply :) I have tested sync batch norm on a deeplab-resnet-based segmentation task. When I applied sync batch norm, it consumed about 30-40%...