DeepSpeed
Fix memory leak in zero2 contiguous gradients
Skip `extra_large_param_to_reduce` when `contiguous_gradients` is False. Nothing reads it in that mode, but it still holds a reference to the param for the lifetime of the application.
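The following is a minimal Python sketch of the leak pattern, not DeepSpeed's actual code. It assumes, as the description implies, that the attribute is only consumed and cleared on the contiguous-gradients path. `Param`, `ReducerSketch`, `on_grad_ready`, `reduce_bucket`, `leaked` and the `guarded` flag are hypothetical; only `extra_large_param_to_reduce` and `contiguous_gradients` come from the PR.

```python
import gc
import weakref


class Param:
    """Hypothetical stand-in for a large model parameter."""

    def __init__(self, numel: int) -> None:
        self.numel = numel


class ReducerSketch:
    """Hypothetical, simplified model of the stage-2 reduce path."""

    def __init__(self, contiguous_gradients: bool, bucket_size: int, guarded: bool) -> None:
        self.contiguous_gradients = contiguous_gradients
        self.bucket_size = bucket_size
        self.guarded = guarded  # True models the fix, False models the leak
        self.extra_large_param_to_reduce = None

    def on_grad_ready(self, param: Param) -> None:
        # A param larger than the bucket is tracked on its own.
        if param.numel > self.bucket_size:
            if self.contiguous_gradients or not self.guarded:
                self.extra_large_param_to_reduce = param

    def reduce_bucket(self) -> None:
        # Assumption: only the contiguous path consumes and clears the attribute,
        # so with contiguous_gradients=False an unguarded reference is never dropped.
        if self.contiguous_gradients and self.extra_large_param_to_reduce is not None:
            self.extra_large_param_to_reduce = None


def leaked(guarded: bool) -> bool:
    """Return True if the param survives after the caller drops its own reference."""
    reducer = ReducerSketch(contiguous_gradients=False, bucket_size=10, guarded=guarded)
    param = Param(numel=100)
    ref = weakref.ref(param)
    reducer.on_grad_ready(param)
    reducer.reduce_bucket()
    del param
    gc.collect()
    return ref() is not None


print(leaked(guarded=False))  # reference kept alive by the reducer
print(leaked(guarded=True))   # param is freed once the caller drops it
```

The weak reference observes the param without extending its lifetime, so `ref()` returns an object only while something else (here, the reducer attribute) still points at it.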
@tjruwase Not sure why the amd-mi200 test failed.
@hablb - the MI200 test isn't fully functional yet; that's why it's listed as optional. You can complete this PR without it passing.