fix load_optimizer_states for MoE (#2737)
When `load_optimizer_states=False` is passed to `load_checkpoint` for an MoE model, do not attempt to load the optimizer state files. This currently fails because DeepSpeed still tries to load them, even though they are not used afterwards.
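For illustration, here is a minimal sketch of the shape of the fix: gate the optimizer state file reads on the flag instead of performing them unconditionally. This is not the actual DeepSpeed code; `load_moe_checkpoint`, `moe_optimizer_state_paths`, and the glob pattern are hypothetical stand-ins.

```python
import glob
import os

import torch


def moe_optimizer_state_paths(ckpt_dir):
    # Hypothetical helper: the real file naming scheme differs; this just
    # stands in for "locate the per-expert optimizer state files".
    return sorted(glob.glob(os.path.join(ckpt_dir, "*expert*optim*states.pt")))


def load_moe_checkpoint(engine, ckpt_dir, load_optimizer_states=True):
    # ... model/expert parameter states are loaded here regardless of the flag ...
    if load_optimizer_states:
        # Before the fix, this read effectively happened unconditionally,
        # so the files were loaded (and could fail) even though the caller
        # asked for them to be skipped.
        for path in moe_optimizer_state_paths(ckpt_dir):
            state = torch.load(path, map_location="cpu")
            engine.optimizer.load_state_dict(state)
```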
Added parameterized unit tests for the various cases; a toy sketch of the parameterization pattern appears after the test output below.
Verified via:

```
pytest tests/unit/checkpoint/test_moe_checkpoint.py -k 'test_checkpoint_moe_and_zero'
= 6 passed, 1 deselected, 102 warnings in 156.67s (0:02:36) =
```
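The sketch below is not the DeepSpeed test itself; it is a self-contained stand-in using plain PyTorch that shows the parameterization pattern, round-tripping a checkpoint with and without optimizer states and asserting parameter correctness. The helper `_build` and the checkpoint layout are made up for illustration.

```python
import pytest
import torch


def _build():
    # Tiny model/optimizer pair standing in for a DeepSpeed MoE engine.
    model = torch.nn.Linear(4, 4)
    opt = torch.optim.Adam(model.parameters())
    return model, opt


@pytest.mark.parametrize("load_optimizer_states", [True, False])
def test_checkpoint_roundtrip(tmp_path, load_optimizer_states):
    model, opt = _build()
    # Take one training step so the optimizer actually has state to save.
    model(torch.randn(2, 4)).sum().backward()
    opt.step()

    ckpt = tmp_path / "ckpt.pt"
    torch.save({"model": model.state_dict(),
                "optimizer": opt.state_dict()}, ckpt)

    model2, opt2 = _build()
    state = torch.load(ckpt)
    model2.load_state_dict(state["model"])
    if load_optimizer_states:
        opt2.load_state_dict(state["optimizer"])

    # Model parameters must match after reload in both cases.
    for p, q in zip(model.parameters(), model2.parameters()):
        assert torch.equal(p, q)
```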
Hi @tjruwase, I almost got this to work, but for some reason, when I suppress loading optimizer states for Stage 3, tensor correctness fails for the model parameters in the unit test. Do you have an idea why? Is there something besides optimizer states in the ZeRO Stage 3 optimizer state dictionary, or does loading the Stage 3 optimizer have a side effect on the model parameters? Surprisingly, it works just fine for stages 0, 1, and 2.