
[REQUEST] MiCS vs Zero++ hpZ for Hybrid FSDP

Open jeromeku opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe.

I'm interested in hybrid FSDP, where the model is replicated across nodes and sharded within each node.

My understanding is that this can be achieved through MiCS and/or ZeRO++ hpZ.

Describe the solution you'd like

Better documentation, examples, or tutorials on how these solutions differ and how best to compose these features with ZeRO-3 for a given network topology.

jeromeku avatar Aug 31 '24 11:08 jeromeku

@jeromeku, you can start here: https://www.deepspeed.ai/tutorials/zeropp/

tjruwase avatar Sep 04 '24 16:09 tjruwase

@tjruwase

Is it possible to partition parameters using the secondary partition for both the forward and backward passes? That is, to shard only intra-node for both forward and backward, instead of only for backward?

Can this be accomplished given hpZ, and if so, what would be the appropriate config?

Thanks!

jeromeku avatar Sep 09 '24 07:09 jeromeku

> Can this be accomplished given hpZ, and if so, what would be the appropriate config?

No, this is not possible in hpZ.

tjruwase avatar Sep 09 '24 12:09 tjruwase
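For readers comparing the two features discussed in this thread, the following is a minimal sketch of how each is typically switched on in a DeepSpeed config. The keys `zero_hpz_partition_size`, `mics_shard_size`, and `mics_hierarchical_params_gather` are taken from the DeepSpeed documentation; the helper names, the batch size, the `bf16` setting, and the assumption of 8 GPUs per node are illustrative only and are not from the thread. Check both against the DeepSpeed version you run.

```python
import json

# Assumption for illustration: 8 GPUs per node.
GPUS_PER_NODE = 8


def hpz_config(gpus_per_node):
    """Hypothetical helper: ZeRO-3 with a ZeRO++ hpZ secondary partition.

    The primary partition still spans all ranks; the secondary weight
    partition is confined to groups of `gpus_per_node` ranks.
    """
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "bf16": {"enabled": True},            # placeholder value
        "zero_optimization": {
            "stage": 3,
            "zero_hpz_partition_size": gpus_per_node,
        },
    }


def mics_config(gpus_per_node):
    """Hypothetical helper: ZeRO-3 with MiCS shard groups.

    Model states are sharded inside each group of `gpus_per_node` ranks
    and replicated across groups. Per the DeepSpeed docs, MiCS also
    expects the model to be constructed under deepspeed.zero.MiCS_Init.
    """
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "bf16": {"enabled": True},            # placeholder value
        "zero_optimization": {
            "stage": 3,
            "mics_shard_size": gpus_per_node,
            "mics_hierarchical_params_gather": False,
        },
    }


if __name__ == "__main__":
    print(json.dumps(hpz_config(GPUS_PER_NODE), indent=2))
    print(json.dumps(mics_config(GPUS_PER_NODE), indent=2))
```

Either dictionary can be written out as a JSON file and passed to DeepSpeed as the config. Setting the group size to the number of GPUs per node is the usual way to keep the sharded traffic on the intra-node interconnect, though other sizes are possible.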

@tjruwase Are there any benchmarks comparing ZeRO++ hpZ with MiCS? Are there specific use cases for one over the other given the different partitioning schemes employed by hpZ vs MiCS?

jeromeku avatar Sep 18 '24 15:09 jeromeku

@jeromeku, please see the attached performance comparison of hpZ versus MiCS. Generally, hpZ is more memory efficient because, unlike MiCS, it does not replicate the entire model state. However, MiCS might be competitive in scenarios where memory is not a bottleneck.

[Image: Screenshot 2024-09-18 at 2 10 32 PM]

samadejacobs avatar Sep 18 '24 21:09 samadejacobs
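The memory argument in the last reply can be made concrete with a rough back-of-the-envelope estimate. The sketch below is not from the thread. It assumes the common mixed-precision Adam accounting of 16 bytes of model state per parameter (2 for 16-bit weights, 2 for 16-bit gradients, 12 for fp32 optimizer state), and it assumes that hpZ adds only a secondary 16-bit weight copy sharded within the hpZ group. Activations, communication buffers, and fragmentation are ignored, and the function names and example sizes are hypothetical.

```python
BYTES_MODEL_STATE = 16  # assumed: 2 (weights) + 2 (grads) + 12 (Adam, fp32)
BYTES_FP16_WEIGHT = 2   # assumed size of the hpZ secondary weight copy


def zero3_bytes_per_gpu(num_params, world_size):
    """Plain ZeRO-3: all model state sharded across every rank."""
    return BYTES_MODEL_STATE * num_params / world_size


def mics_bytes_per_gpu(num_params, shard_size):
    """MiCS: all model state sharded within a group, replicated across groups."""
    return BYTES_MODEL_STATE * num_params / shard_size


def hpz_bytes_per_gpu(num_params, world_size, hpz_size):
    """hpZ: ZeRO-3 primary shards plus a secondary weight copy per hpZ group."""
    primary = BYTES_MODEL_STATE * num_params / world_size
    secondary = BYTES_FP16_WEIGHT * num_params / hpz_size
    return primary + secondary


if __name__ == "__main__":
    # Hypothetical setup: 10B parameters, 8 nodes x 8 GPUs.
    params, world, per_node = 10_000_000_000, 64, 8
    for name, value in [
        ("ZeRO-3", zero3_bytes_per_gpu(params, world)),
        ("hpZ", hpz_bytes_per_gpu(params, world, per_node)),
        ("MiCS", mics_bytes_per_gpu(params, per_node)),
    ]:
        print(f"{name}: {value / 1e9:.1f} GB of model state per GPU")
```

Under these assumptions, the per-GPU model state for MiCS depends only on the shard group size, while the primary shards in hpZ keep shrinking as the job scales out, which is consistent with the claim that hpZ is the more memory-efficient of the two.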