
[DOC]: Questions about the old version of ZeRO

Open chiakicage opened this issue 2 years ago • 1 comment

📚 The doc issue

The old documentation had two ZeRO pages (I believe one for ZeRO and one for ZeRO with chunk-based memory management), but the former has since been deleted. The two versions of ZeRO are used differently, as mentioned in #2975.

Now I have some questions about the old version.

First, can I use ZeRO-1? As I understand it, setting shard_param to True in ZeroInitContext gives ZeRO-3; otherwise it is ZeRO-2, as this comment says. So how do I get ZeRO-1?
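For readers comparing the stages, here is a rough sketch of what each ZeRO stage partitions, using the model-state accounting from the ZeRO paper for mixed-precision Adam (2 bytes per fp16 parameter, 2 bytes per fp16 gradient, 12 bytes per parameter of optimizer state). The function name and its arguments are hypothetical and are not part of ColossalAI's API. It ignores activations, buffers, and fragmentation, and it says nothing about which stages the old ColossalAI implementation actually exposes.

```python
def zero_memory_per_gpu_gb(num_params, num_gpus, stage, k=12):
    """Approximate model-state memory per GPU, in GB (1e9 bytes).

    Hypothetical helper for illustration only.
    stage 0: plain data parallelism, nothing is partitioned
    stage 1: optimizer states are partitioned across GPUs
    stage 2: optimizer states and gradients are partitioned
    stage 3: optimizer states, gradients, and parameters are partitioned
    k is the optimizer-state multiplier (12 bytes/param for mixed-precision Adam).
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2 * num_params   # fp16 parameters, in bytes
    grads = 2 * num_params    # fp16 gradients, in bytes
    optim = k * num_params    # fp32 master weights, momentum, variance

    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus

    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 GPUs.
    for s in range(4):
        print(f"stage {s}: {zero_memory_per_gpu_gb(7.5e9, 64, s):.1f} GB")
```

Under this accounting, ZeRO-1 differs from ZeRO-2 only in that gradients stay replicated on every GPU.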

I also want to confirm: if I set tensor_placement_policy to cuda in ShardedModelV2, nothing will be offloaded to the CPU, right?

I noticed that the code of the old version (shard_model_v2 and shard_optim_v2) has not been updated for a long time, and there are new implementations for ZeRO-3 and ZeRO-1/2. I wonder if the old version of ZeRO will be deprecated in the future?

I just tried the new version of ZeRO, but it is significantly slower than the old version (almost 1.5x the time per iteration) when training the same model on the same machine. Can the new version perform better with a different configuration?
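For anyone trying to reproduce a comparison like this, a minimal timing sketch follows. It assumes each training iteration can be wrapped in a zero-argument callable; `mean_iteration_time`, `step`, and `sync` are hypothetical names used only for illustration. On a CUDA setup with PyTorch, passing `torch.cuda.synchronize` as `sync` would be one way to make sure queued GPU work is included in the measurement.

```python
import time


def mean_iteration_time(step, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock seconds per call of `step`.

    step:   zero-argument callable that runs one training iteration
    warmup: untimed iterations run first (allocator and cache warm-up)
    iters:  number of timed iterations
    sync:   optional zero-argument callable that blocks until pending
            device work has finished (hypothetical hook)
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")

    # Untimed warm-up iterations.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # wait for outstanding work before stopping the clock
    return (time.perf_counter() - start) / iters
```

Running the same helper with the same model, batch size, and number of GPUs for both ZeRO versions gives numbers that can be compared directly.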

chiakicage, Mar 05 '23 14:03

Hi, the old version of ZeRO will be deprecated in the future. According to our benchmark results, the new version of ZeRO performs better than the old one. Could you tell us your benchmark configuration?

ver217, Mar 06 '23 07:03