DeepSpeed
[DRAFT] Tentative implementation of MiCS
This PR drafts a tentative implementation of MiCS (https://arxiv.org/abs/2205.00119), which was first discussed in #2801.
To try out the implementation, you can run it on a toy model with this test script: https://github.com/microsoft/DeepSpeed/blob/fdb8706a5f0b7564fe92d20cbeea460c2b569983/tests/small_model_debugging/test_mics_config.py
You can play with the shard configuration here: https://github.com/microsoft/DeepSpeed/blob/fdb8706a5f0b7564fe92d20cbeea460c2b569983/tests/small_model_debugging/test_mics_config.py#L84
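For intuition about what the shard configuration controls, here is a minimal sketch of the grouping idea from the MiCS paper: model states are partitioned within a small group of ranks and replicated across groups. This is not code from this PR. The helper name `mics_groups` is hypothetical, and grouping consecutive ranks into a shard group is an assumption made for illustration.

```python
def mics_groups(world_size, shard_size):
    """Illustrative only (hypothetical helper, not the PR's implementation).

    Splits ranks 0..world_size-1 into shard groups of `shard_size`
    consecutive ranks (an assumption), plus the replication groups made of
    ranks that hold the same shard index in different shard groups.
    """
    if shard_size <= 0 or world_size % shard_size != 0:
        raise ValueError("world_size must be a positive multiple of shard_size")

    # Each shard group holds one full copy of the model states, partitioned.
    shard_groups = [
        list(range(start, start + shard_size))
        for start in range(0, world_size, shard_size)
    ]
    # Ranks with the same position inside their shard group hold the same shard.
    replicate_groups = [
        list(range(offset, world_size, shard_size))
        for offset in range(shard_size)
    ]
    return shard_groups, replicate_groups


if __name__ == "__main__":
    shards, replicas = mics_groups(world_size=8, shard_size=4)
    print("shard groups:", shards)
    print("replication groups:", replicas)
```

With 8 ranks and a shard size of 4, this gives two shard groups of four ranks each and four replication groups of two ranks each. A smaller shard size keeps parameter gathering within fewer ranks, at the cost of more memory per rank.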
Suggestions and comments are welcome, thanks!