Megatron-LM [BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce

Open • QPHutu opened this issue 6 months ago • 1 comment

Describe the bug

In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`? As the title says, it currently appears after the gradient all-reduce.

Expected behavior

The gradient all-reduce should happen after all gradient computations have finished, including the one completed by `finish_embedding_wgrad_compute`.
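To illustrate why the order matters, here is a toy sketch in plain Python. It is not Megatron-LM code: `all_reduce_mean`, `run_step`, and the numbers are hypothetical stand-ins, and it assumes the deferred weight gradient is simply added to the gradient buffer. Whether the real schedule behaves this way is exactly the question raised above.

```python
# Toy model of the ordering concern. None of these names are Megatron-LM APIs;
# each list entry stands for the embedding gradient held by one simulated rank.

def all_reduce_mean(grads):
    """Simulate a gradient all-reduce: every rank ends up with the mean."""
    mean = sum(grads) / len(grads)
    return [mean for _ in grads]


def run_step(early_grads, deferred_grads, finish_before_sync):
    """Return the per-rank gradients after one step.

    early_grads:    gradient already accumulated on each rank.
    deferred_grads: weight-gradient contribution whose computation was delayed.
    """
    grads = list(early_grads)
    if finish_before_sync:
        # Finish the deferred weight-gradient work first, then synchronize.
        grads = [g + d for g, d in zip(grads, deferred_grads)]
        grads = all_reduce_mean(grads)
    else:
        # Synchronize first; the deferred part is added later and never reduced.
        grads = all_reduce_mean(grads)
        grads = [g + d for g, d in zip(grads, deferred_grads)]
    return grads


early = [1.0, 3.0]
deferred = [2.0, 6.0]

correct = run_step(early, deferred, finish_before_sync=True)
suspect = run_step(early, deferred, finish_before_sync=False)

print(correct)  # [6.0, 6.0] -> both ranks agree
print(suspect)  # [4.0, 8.0] -> ranks diverge
```

In this toy model, reducing first leaves each rank with a different gradient, because the late contribution is never synchronized.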

Aug 16 '24 06:08 QPHutu