DeepSpeedExamples: Enable overlap_comm for better performance

Open · li-plus opened this issue 1 year ago · 0 comments

Enable `overlap_comm` so that the gradient all-reduce overlaps with the backward computation. With my settings, this gives a 1.05x end-to-end speedup in SFT training. See also https://github.com/microsoft/DeepSpeed/pull/4887.
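The sketch below shows how the option could be switched on in a DeepSpeed config, built as a Python dict and assuming ZeRO stage 2. `make_zero_config` is a hypothetical helper, and the batch-size and precision values are placeholders. They are not the settings behind the 1.05x figure above.

```python
import json


def make_zero_config(stage: int = 2, overlap_comm: bool = True) -> dict:
    """Build a minimal DeepSpeed config dict with ZeRO communication overlap.

    Hypothetical helper for illustration only; the batch size and bf16
    setting are placeholders, not the values used in this issue.
    """
    return {
        "train_micro_batch_size_per_gpu": 8,  # placeholder value
        "gradient_accumulation_steps": 1,
        "bf16": {"enabled": True},  # placeholder precision choice
        "zero_optimization": {
            "stage": stage,
            # Overlap gradient reduction with the backward pass.
            "overlap_comm": overlap_comm,
            # Copy gradients into a contiguous buffer as they are produced.
            "contiguous_gradients": True,
        },
    }


if __name__ == "__main__":
    config = make_zero_config()
    print(json.dumps(config, indent=2))
```

You can pass the resulting dict as the `config` argument of `deepspeed.initialize(model=..., model_parameters=..., config=config)`. Alternatively, dump it to a JSON file and reference that file from the launcher.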

Jan 08 '24 09:01 li-plus