[BUG] Improper coupling of parameter list between DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3
**Describe the bug**
DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 share the same parameter list, which makes it easy for the two to diverge.
**Details**
In https://github.com/deepspeedai/DeepSpeed/blob/b7cd78f096016ae67a11ef6292eba28e0452b4e7/deepspeed/runtime/engine.py#L1846 , the initializers of DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 share the same parameter list. This adds maintenance overhead whenever either parameter list needs to change. There are two observations:
- There is already a mismatch (i.e. param_names), and this will break SuperOffload.
- cpuadam_cores_perc was added to DeepSpeedZeroOptimizer_Stage3 as a parameter but is not used.
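To make the first observation concrete, here is a minimal sketch of how a shared keyword list fails once the two constructors drift apart. The classes below are hypothetical stand-ins, not the real DeepSpeed optimizers, and which side is missing `param_names` is assumed purely for illustration.

```python
import inspect


class BaseStage3Optimizer:
    """Hypothetical stand-in for the base Stage 3 optimizer."""

    def __init__(self, module, param_names=None, cpuadam_cores_perc=0.8):
        self.module = module
        self.param_names = param_names
        # cpuadam_cores_perc is accepted here but never stored or used.


class OffloadStage3Optimizer:
    """Hypothetical stand-in for the offload variant (assumed to lack param_names)."""

    def __init__(self, module, cpuadam_cores_perc=0.8):
        self.module = module
        self.cpuadam_cores_perc = cpuadam_cores_perc


def init_params(cls):
    """Return the constructor's parameter names, excluding self."""
    return set(inspect.signature(cls.__init__).parameters) - {"self"}


def signature_mismatch(cls_a, cls_b):
    """Return (names only accepted by cls_a, names only accepted by cls_b)."""
    a, b = init_params(cls_a), init_params(cls_b)
    return a - b, b - a


only_base, only_offload = signature_mismatch(BaseStage3Optimizer, OffloadStage3Optimizer)

# One keyword list reused for both constructors, as at a shared call site.
shared_kwargs = dict(module="model", param_names={"w": "layer.w"}, cpuadam_cores_perc=0.8)

BaseStage3Optimizer(**shared_kwargs)  # accepted
try:
    OffloadStage3Optimizer(**shared_kwargs)
    shared_call_error = None
except TypeError as exc:
    # The shared list passes a keyword this constructor does not accept.
    shared_call_error = str(exc)
```

A signature comparison like `signature_mismatch` could also serve as a unit test that flags drift for as long as the call site stays shared.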
**Suggestion**
Separate the calls to DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3.
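A rough sketch of what separate calls could look like, with each branch passing only the arguments its own constructor accepts. The classes, the `build_stage3_optimizer` helper, and the `super_offload` config key are all hypothetical placeholders, not the actual code in engine.py.

```python
class BaseStage3Optimizer:
    """Hypothetical stand-in for the base Stage 3 optimizer."""

    def __init__(self, module, param_names=None):
        self.module = module
        self.param_names = param_names


class OffloadStage3Optimizer(BaseStage3Optimizer):
    """Hypothetical stand-in for the offload variant with one extra option."""

    def __init__(self, module, param_names=None, cpuadam_cores_perc=0.8):
        super().__init__(module, param_names=param_names)
        self.cpuadam_cores_perc = cpuadam_cores_perc


def build_stage3_optimizer(module, config):
    """Pick the optimizer class and call it with its own argument list."""
    if config.get("super_offload", False):  # hypothetical config key
        # Offload-only options are passed in this branch and nowhere else.
        return OffloadStage3Optimizer(
            module,
            param_names=config.get("param_names"),
            cpuadam_cores_perc=config.get("cpuadam_cores_perc", 0.8),
        )
    # The base optimizer never sees offload-only options.
    return BaseStage3Optimizer(module, param_names=config.get("param_names"))


base_opt = build_stage3_optimizer("model", {"param_names": {"w": "layer.w"}})
offload_opt = build_stage3_optimizer(
    "model", {"super_offload": True, "cpuadam_cores_perc": 0.5}
)
```

With this layout, adding or removing a parameter on one optimizer only touches that optimizer's branch.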
@delock Thank you for the suggestions.
In the SuperOffload PR, I tried to align the style with ZenFlow to maintain consistency. In ZenFlow, DeepSpeedZeroOptimizer and ZenFlowZeroOptimizer share the same parameter list in a similar manner, as seen here: https://github.com/deepspeedai/DeepSpeed/blob/b7cd78f096016ae67a11ef6292eba28e0452b4e7/deepspeed/runtime/engine.py#L1771
That said, I’m totally fine with making separate calls for both SuperOffload and ZenFlow. I can prepare a PR to implement that.