Megatron-LM
[ENHANCEMENT] Support ZeRO-2 distributed optimizer
Is your feature request related to a problem? Please describe.
As far as I know, the current distributed optimizer in Megatron-LM implements ZeRO-1, but ZeRO-1 does not save enough GPU memory. When I train a model with the --use-distributed-optimizer flag, per-GPU memory usage only drops from 49 GB to 47 GB. I ran the same test on Megatron-DeepSpeed with ZeRO-2: even with the micro-batch size (MBS) doubled, GPU memory usage was only 36 GB. I believe this larger memory reduction comes from using ZeRO-2.
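For context, here is a rough back-of-envelope sketch (illustrative only, not Megatron-LM code; the 7B parameter count and data-parallel size of 8 below are made-up assumptions, and activation memory is ignored) of how ZeRO-1 and ZeRO-2 shard model states under mixed-precision Adam:

```python
# Back-of-envelope accounting of per-GPU model-state memory (params, grads,
# optimizer states) under mixed-precision Adam. Illustrative assumptions only.

def model_state_bytes_per_gpu(num_params: float, dp_size: int, zero_stage: int) -> float:
    """Approximate per-GPU bytes for parameters, gradients, and optimizer states."""
    params = 2 * num_params   # fp16/bf16 parameters
    grads = 2 * num_params    # fp16/bf16 gradients
    optim = 12 * num_params   # fp32 master params + Adam momentum + variance

    if zero_stage >= 1:       # ZeRO-1: shard optimizer states across DP ranks
        optim /= dp_size      # (roughly what --use-distributed-optimizer does)
    if zero_stage >= 2:       # ZeRO-2: additionally shard gradients
        grads /= dp_size
    return params + grads + optim

if __name__ == "__main__":
    n, dp = 7e9, 8            # assumed: 7B-parameter model, data-parallel size 8
    for stage in (0, 1, 2):
        gb = model_state_bytes_per_gpu(n, dp, stage) / 2**30
        print(f"ZeRO-{stage}: ~{gb:.1f} GB of model states per GPU")
```

Activations and other buffers are not counted here and often dominate, which would be consistent with only a small observed drop when enabling ZeRO-1 alone.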
Describe the solution you'd like
I hope Megatron-LM can also implement ZeRO-2.
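The key step ZeRO-2 adds on top of the existing distributed optimizer is that gradients are reduce-scattered into per-rank shards instead of being materialized in full on every rank. A minimal, hypothetical PyTorch sketch of that step (not Megatron-LM's implementation; the function name and flat-buffer layout are assumptions):

```python
# Hypothetical sketch of a ZeRO-2 gradient step: reduce-scatter a flat gradient
# buffer so each data-parallel rank keeps only its own shard. Illustrative only.
import torch
import torch.distributed as dist

def reduce_scatter_grads(flat_grads: torch.Tensor, dp_group) -> torch.Tensor:
    """Average a flat gradient buffer across the DP group, keeping one shard per rank."""
    world_size = dist.get_world_size(group=dp_group)
    assert flat_grads.numel() % world_size == 0, "pad the buffer to a multiple of DP size"
    shard = torch.empty(flat_grads.numel() // world_size,
                        dtype=flat_grads.dtype, device=flat_grads.device)
    # Each rank receives only the reduced sum of its slice (ZeRO-2), instead of
    # an all-reduce that keeps full gradients on every rank (ZeRO-1 / DDP).
    dist.reduce_scatter_tensor(shard, flat_grads, op=dist.ReduceOp.SUM, group=dp_group)
    shard /= world_size
    return shard  # the local optimizer shard is then updated from this slice
```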
Describe alternatives you've considered
https://github.com/microsoft/Megatron-DeepSpeed
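On the DeepSpeed side, ZeRO-2 is typically enabled through the config; a minimal sketch, with placeholder batch-size values:

```python
# Minimal DeepSpeed config sketch enabling ZeRO stage 2; batch sizes are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,              # 1 = shard optimizer states, 2 = also shard gradients
        "overlap_comm": True,    # overlap gradient communication with backward
        "reduce_scatter": True,
    },
}
# passed to deepspeed.initialize(model=model, optimizer=optimizer, config=ds_config)
```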
@lmcafee-nvidia Hi, could you please help me with this?
Marking as stale. No activity in 60 days.
Any progress?
Marking as stale. No activity in 60 days.
@shanmugamr1992 Can you please let us know how to enable ZeRO 1/2/3? Thanks
@shanmugamr1992 Please let me know how I can enable the ZeRO 1/2/3 feature? Raised #1156
We do not currently support ZeRO 2/3, but it is possible that we will support it in the future.