Fix expert grad scaling problem with ZeRO optimizer

Open wyooyw opened this issue 5 months ago • 3 comments

Fixes #6545.

Changes:

  • Expert gradient averaging: divide by dp_world_size instead of edp_world_size.
  • Unit test: verify that a model produces the same expert gradients under different dp/ep configurations.
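
To illustrate why the divisor matters, here is a minimal sketch in plain Python. It is a toy model, not DeepSpeed's code. It assumes that each expert replica already accumulates the gradients from the tokens of all `ep_size` ranks in its expert-parallel group during backward, and that the replicas are then sum-reduced across the expert-data-parallel group. The function name `reduced_expert_grad` and the scalar "gradients" are hypothetical and exist only for this example.

```python
def reduced_expert_grad(per_rank_grads, ep_size, divisor):
    """Toy model of expert gradient averaging (not DeepSpeed's implementation).

    per_rank_grads[r] is the gradient contribution of data-parallel rank r's
    tokens to one expert. divisor is "dp" or "edp".
    """
    dp_world_size = len(per_rank_grads)
    assert dp_world_size % ep_size == 0
    edp_world_size = dp_world_size // ep_size

    # Assumption: tokens from the ep_size ranks of one expert-parallel group
    # are routed to the same replica, so its local gradient is their sum.
    replica_grads = [
        sum(per_rank_grads[start:start + ep_size])
        for start in range(0, dp_world_size, ep_size)
    ]
    assert len(replica_grads) == edp_world_size

    # Sum-reduce across the expert-data-parallel group (one replica per group).
    total = sum(replica_grads)

    return total / (dp_world_size if divisor == "dp" else edp_world_size)


per_rank_grads = [1.0, 2.0, 3.0, 6.0]  # dp_world_size = 4

# Dividing by dp_world_size gives the same result for every ep_size.
by_dp = [reduced_expert_grad(per_rank_grads, ep, "dp") for ep in (1, 2, 4)]

# Dividing by edp_world_size scales the result by ep_size.
by_edp = [reduced_expert_grad(per_rank_grads, ep, "edp") for ep in (1, 2, 4)]

print(by_dp)   # [3.0, 3.0, 3.0]
print(by_edp)  # [3.0, 6.0, 12.0]
```

Under these assumptions, dividing by `edp_world_size` inflates the expert gradient by a factor of `ep_size`, which is the kind of dp/ep mismatch the unit test is meant to catch.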

wyooyw, Sep 17 '24 15:09