Unit tests for MiCS

Open zarzen opened this issue 1 year ago • 2 comments

In response to the ask from https://github.com/microsoft/DeepSpeed/pull/2964#issuecomment-1832161865, I added three more unit tests related to MiCS.

There are two known issues:

  • Testing on Torch 2.1.0 triggers _IllegalWorker in the coalesced all-gather. I made changes to ignore this condition, but I don't yet know the cause.
  • The MiCS implementation is not working with offloading, so the failure in TestZeroPartialOffloadConfigSweep is expected.
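To make the offloading limitation concrete, here is a minimal sketch of a guard that a test could use to detect the unsupported combination. The helper name `uses_mics_with_offload` is hypothetical and not part of DeepSpeed. The keys `mics_shard_size`, `offload_optimizer`, and `offload_param` are DeepSpeed ZeRO config keys. Treating a positive `mics_shard_size` as "MiCS enabled" is an assumption made for this example.

```python
def uses_mics_with_offload(ds_config: dict) -> bool:
    """Hypothetical helper: True if a config enables both MiCS and ZeRO offloading."""
    zero = ds_config.get("zero_optimization", {})
    # Assumption: a positive shard size means MiCS is enabled.
    mics_enabled = zero.get("mics_shard_size", -1) > 0
    # Offloading counts as enabled when either offload section names a real device.
    offload_enabled = any(
        zero.get(key, {}).get("device", "none") != "none"
        for key in ("offload_optimizer", "offload_param")
    )
    return mics_enabled and offload_enabled


# Example configs (values are illustrative only).
mics_only = {"zero_optimization": {"stage": 3, "mics_shard_size": 2}}
mics_offload = {
    "zero_optimization": {
        "stage": 3,
        "mics_shard_size": 2,
        "offload_optimizer": {"device": "cpu"},
    }
}
```

A test sweep could call such a helper and skip, or mark as an expected failure, any configuration for which it returns `True`.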

zarzen avatar Dec 09 '23 06:12 zarzen

@zarzen, thanks!

tjruwase avatar Dec 14 '23 23:12 tjruwase

@mrwyattii thanks for the suggestions; I updated the implementation accordingly. Also, since the MiCS implementation is currently not compatible with offloading, I removed the unit test for test_zero_offloadpp.py.

zarzen avatar Dec 27 '23 22:12 zarzen