DeepSpeed
Resolve hard dependency on MoE for contiguous_gradients in Stage 1
Motivation for This PR: In engine.py, contiguous_gradients for Stage 1 depends on MoE, which means that even with contiguous_gradients enabled, Stage 1 still defaults to buffered allreduce during reduce_ipg_grads. If MoE is disabled (no experts), then even with contiguous_gradients set, Stage 1 takes the allreduce pathway; this is a hard dependency on MoE. This PR keeps the user setting in effect regardless of MoE: if contiguous_gradients is set, it stays set in DeepSpeed. Without MoE, setting contiguous_gradients=True now lets Stage 1 use the reduce collective instead of allreduce (as it should). With MoE experts enabled it also goes through the reduce path, so existing functionality is not broken. Related issues: #622 #264 #1300
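For context, here is a minimal sketch of the decoupling this PR describes. The flag names (contiguous_gradients, has_moe_layers) follow DeepSpeed's config vocabulary, but the function below is hypothetical and is not the actual engine.py code:

```python
# Hypothetical illustration of the gating change; not the actual engine.py code.

def resolve_contiguous_gradients(zero_stage: int,
                                 contiguous_gradients: bool,
                                 has_moe_layers: bool) -> bool:
    """Decide whether ZeRO Stage 1 keeps contiguous_gradients enabled."""
    # Before: the flag was effectively honored only when MoE layers were
    # present, so a non-MoE Stage 1 run silently fell back to buffered
    # allreduce during reduce_ipg_grads:
    # return contiguous_gradients and has_moe_layers and zero_stage == 1

    # After: the user setting is respected regardless of MoE, so Stage 1
    # can take the reduce collective path whenever the flag is enabled.
    return contiguous_gradients


if __name__ == "__main__":
    # Non-MoE Stage 1 run with contiguous_gradients enabled in the config.
    print(resolve_contiguous_gradients(zero_stage=1,
                                       contiguous_gradients=True,
                                       has_moe_layers=False))  # True
```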
@tjruwase, requesting review.
PS: CLA is signed.
@tjruwase could you please review? contiguous_gradients should not be hard-bound to MoE in the engine, as that would mean enabling MoE layers every time just to get contiguous gradients. This also affects performance, hence seeking your review.
@abhilash1910 -- Thanks for the PR! Have you tested performance and found issues? It would be helpful to add some more details if you can.
@RezaYazdaniAminabadi, @jeffra, and @tjruwase - I think we should accept this PR. Do you guys remember we did something like this but never pushed a public PR? Thoughts?
Thanks @awan-10. I found that decoupling MoE yields the reduce collective; I am checking whether there is any performance impact and will add the log file for the perf characterisations. In any case, MoE should be detachable from contiguous gradients.