DeepSpeed
Unit tests for MiCS
In response to the ask from https://github.com/microsoft/DeepSpeed/pull/2964#issuecomment-1832161865, I added three more unit tests related to MiCS.
There are two known issues:
- Testing on Torch 2.1.0 triggers `_IllegalWorker` in the coalesced all-gather. I made changes to ignore this condition, but I don't yet know the root cause.
- The MiCS implementation does not work with offloading, so the failure in `TestZeroPartialOffloadConfigSweep` is expected.
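
To make the second issue concrete, here is a minimal sketch of a check that flags a config combining MiCS with offloading. It assumes MiCS is enabled by a positive `mics_shard_size` under `zero_optimization` and that offloading is requested through `offload_optimizer` / `offload_param` with a `device` other than `"none"`. The helper name `mics_conflicts_with_offload` is hypothetical and not part of DeepSpeed or this PR.

```python
from typing import Any, Dict


def mics_conflicts_with_offload(ds_config: Dict[str, Any]) -> bool:
    """Hypothetical helper: True when a config enables MiCS together with offloading."""
    zero = ds_config.get("zero_optimization", {})

    # Assumption: a positive mics_shard_size turns MiCS on; -1 or absent leaves it off.
    mics_enabled = zero.get("mics_shard_size", -1) > 0

    # Assumption: offloading is active when either section names a device other than "none".
    offload_enabled = False
    for key in ("offload_optimizer", "offload_param"):
        device = zero.get(key, {}).get("device", "none")
        if device != "none":
            offload_enabled = True

    return mics_enabled and offload_enabled


# Example config that combines MiCS with optimizer offloading.
example_config = {
    "zero_optimization": {
        "stage": 3,
        "mics_shard_size": 2,
        "offload_optimizer": {"device": "cpu"},
    }
}
conflict = mics_conflicts_with_offload(example_config)
```

A test sweep could use a check like this to skip or mark as expected-fail any parameter combination that mixes the two features.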
@zarzen, thanks!
@mrwyattii thanks for the suggestions; I updated the implementation accordingly. Also, because the MiCS implementation is currently not compatible with offloading, I removed the unit test for test_zero_offloadpp.py.