ColossalAI
[BUG]: ZeRO not Working with SGD Optimizer
🐛 Describe the bug
ZeRO keeps reporting overflow when used together with momentum SGD in the ResNet example. The same code works fine with all of the AMP modes.
Environment
No response
ZeRO is typically used with Adam or second-order optimizers. Generally, a DNN trained with SGD does not run into memory shortages. We can throw an error if the user combines SGD with ZeRO.
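To put the memory argument in numbers, here is a rough sketch that counts only the fp32 optimizer state each optimizer keeps per parameter (none for plain SGD, one momentum buffer for momentum SGD, two moment buffers for Adam). The function names and the 25-million-parameter model are hypothetical and used only for illustration; this does not call ColossalAI and ignores parameters, gradients, and activations.

```python
# Number of extra fp32 tensors each optimizer keeps per parameter
# (standard formulations; not tied to any particular library).
STATE_TENSORS = {"sgd": 0, "sgd_momentum": 1, "adam": 2}


def optimizer_state_bytes(num_params, optimizer, bytes_per_value=4):
    """Total optimizer-state memory in bytes, without any sharding."""
    return num_params * STATE_TENSORS[optimizer] * bytes_per_value


def sharded_state_bytes(num_params, optimizer, world_size, bytes_per_value=4):
    """Per-rank optimizer-state memory if the state is split evenly across ranks."""
    total = optimizer_state_bytes(num_params, optimizer, bytes_per_value)
    return -(-total // world_size)  # ceiling division


if __name__ == "__main__":
    num_params = 25_000_000  # hypothetical model size, for illustration only
    for name in STATE_TENSORS:
        full = optimizer_state_bytes(num_params, name)
        per_rank = sharded_state_bytes(num_params, name, world_size=4)
        print(f"{name}: {full / 1e6:.0f} MB total, {per_rank / 1e6:.0f} MB per rank on 4 ranks")
```

Under these assumptions, sharding the optimizer state saves half as much for momentum SGD as for Adam, and nothing at all for plain SGD, which is the sense in which ZeRO matters less here.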
It is understood that ZeRO is not needed for SGD from the memory perspective, but this overflow might suggest a bug in the current implementation.
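For readers unfamiliar with what an "overflow" report usually means in mixed-precision training, below is a minimal sketch of generic dynamic loss scaling: when any gradient is inf or NaN, the step is skipped and the scale is reduced. The class name, defaults, and the use of plain Python lists are assumptions made for illustration; this is not ColossalAI's implementation and does not identify the cause of this bug.

```python
import math


class DynamicLossScaler:
    """Illustrative dynamic loss scaler (hypothetical, library-independent)."""

    def __init__(self, init_scale=2.0 ** 16, backoff=0.5, growth=2.0,
                 growth_interval=1000, min_scale=1.0):
        self.scale = init_scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self.min_scale = min_scale
        self.good_steps = 0  # consecutive steps without overflow

    @staticmethod
    def has_overflow(grads):
        """True if any gradient value is inf or NaN."""
        return any(not math.isfinite(g) for g in grads)

    def update(self, grads):
        """Adjust the scale; return True if the optimizer step should run."""
        if self.has_overflow(grads):
            # Overflow: shrink the scale and skip this optimizer step.
            self.scale = max(self.scale * self.backoff, self.min_scale)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps % self.growth_interval == 0:
            # A long run of clean steps: try a larger scale again.
            self.scale *= self.growth
        return True
```

With a scheme like this, an occasional skipped step is expected. Overflow on every step, as described in this issue, means the scale keeps shrinking and the optimizer never updates, which is why it points to a bug rather than normal behaviour.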
I see. We will check it later.
The codebase has changed a lot since this was reported. This issue was closed due to inactivity. Thanks.