DeepSpeed Fix memory leak in zero2 contiguous gradients

Opened by BacharL.

extra_large_param_to_reduce is never used when contiguous_gradients is False, yet it still holds a reference to the param, which keeps the param alive for the lifetime of the application.

Apr 19 '23 08:04 BacharL
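The leak described above can be illustrated with a minimal, self-contained Python sketch. It is not DeepSpeed's actual code: Param, LeakyReducer, FixedReducer, add_param, reduce and param_survives are hypothetical names, and the reduce_bucket_size threshold is an assumption about what makes a parameter "extra large". Only extra_large_param_to_reduce and contiguous_gradients come from the PR description. A weakref is used to check whether the parameter outlives its last external reference.

```python
import gc
import weakref


class Param:
    """Hypothetical stand-in for a parameter tensor; only numel() matters here."""

    def __init__(self, numel):
        self._numel = numel

    def numel(self):
        return self._numel


class LeakyReducer:
    """Hypothetical reducer mimicking the pattern described in the PR."""

    def __init__(self, reduce_bucket_size, contiguous_gradients):
        self.reduce_bucket_size = reduce_bucket_size  # assumed size threshold
        self.contiguous_gradients = contiguous_gradients
        self.extra_large_param_to_reduce = None

    def add_param(self, param):
        # Bug: the reference is stored regardless of contiguous_gradients...
        if param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param

    def reduce(self):
        # ...but it is only consumed and cleared on the contiguous path.
        if self.contiguous_gradients and self.extra_large_param_to_reduce is not None:
            self.extra_large_param_to_reduce = None


class FixedReducer(LeakyReducer):
    def add_param(self, param):
        # Fix (one possible approach): only store the reference when the
        # contiguous path, which is the one that clears it, is enabled.
        if self.contiguous_gradients and param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param


def param_survives(reducer_cls):
    """Return True if a large param is still alive after reduce() with contiguous_gradients=False."""
    reducer = reducer_cls(reduce_bucket_size=10, contiguous_gradients=False)
    param = Param(numel=100)
    ref = weakref.ref(param)
    reducer.add_param(param)
    reducer.reduce()
    del param  # drop the caller's only strong reference
    gc.collect()
    return ref() is not None


if __name__ == "__main__":
    print("leaky:", param_survives(LeakyReducer))  # leaky: True
    print("fixed:", param_survives(FixedReducer))  # fixed: False
```

In this sketch the fix simply stops storing the reference when nothing will ever clear it; clearing the attribute unconditionally in reduce() would be an alternative way to break the same reference.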

@tjruwase Not sure why the amd-mi200 test failed.

Apr 23 '23 07:04 BacharL

@hablb The MI200 test isn't fully functional yet, which is why it's listed as optional. You can complete this PR without it passing.

Apr 24 '23 13:04 loadams