Improve overflow handling in ZeRO

Open tjruwase opened this issue 11 months ago • 3 comments

Fix #5241: Improve overflow handling

  • [x] ZeRO 1
  • [x] ZeRO 2
  • [ ] ZeRO 3
  • [ ] BF16Optimizer

Enable pydantic configuration for mixed precision

  • [x] bf16
  • [x] fp16

tjruwase avatar Jan 28 '25 21:01 tjruwase

@delock, @inkcherry, can you please help investigate the failing xpu-max1100 CI? Thanks!

tjruwase avatar Jan 30 '25 11:01 tjruwase

@delock, @inkcherry, can you please help investigate the failing xpu-max1100 CI? Thanks!

@tjruwase thanks! Our engineer is looking into it.

delock avatar Feb 05 '25 01:02 delock

Any ETA on this for merge?

sayakpaul avatar May 08 '25 10:05 sayakpaul

Any ETA on this for merge?

Since CI now looks fine, this should be merged by 06/13/25. Thanks for your patience.

tjruwase avatar Jun 06 '25 13:06 tjruwase