
[BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce

Open · QPHutu opened this issue 6 months ago • 1 comment

Describe the bug

In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`? Currently it appears after them. [image]

Expected behavior

Gradient all-reduce should happen after all gradient computations have finished.
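To illustrate why the ordering matters, here is a minimal toy sketch in plain Python. It is not Megatron-LM code: the helper names `all_reduce_sum` and `finish_deferred_wgrad` are hypothetical stand-ins, and the numbers are made up. It only assumes that the all-reduce sums gradients across ranks and that each rank still has a pending weight-gradient contribution.

```python
# Toy model: each "rank" holds a gradient for one shared embedding weight.
# All names here are hypothetical and only mimic the ordering in question.

def all_reduce_sum(grads):
    """Replace every rank's gradient with the sum over all ranks."""
    total = sum(grads)
    return [total for _ in grads]

def finish_deferred_wgrad(grads, deferred):
    """Add each rank's still-pending weight-gradient contribution."""
    return [g + d for g, d in zip(grads, deferred)]

# Gradients already accumulated on two ranks, plus one deferred contribution each.
accumulated = [1.0, 2.0]
deferred = [0.5, 0.25]

# Order A: finish the deferred wgrad first, then all-reduce.
compute_then_sync = all_reduce_sum(finish_deferred_wgrad(accumulated, deferred))

# Order B: all-reduce first, then finish the deferred wgrad.
sync_then_compute = finish_deferred_wgrad(all_reduce_sum(accumulated), deferred)

print(compute_then_sync)  # [3.75, 3.75]
print(sync_then_compute)  # [3.5, 3.25]
```

In this toy setup, order B leaves the ranks with different gradients, and each rank misses the other's deferred contribution. Whether the actual schedule has this problem depends on how the deferred embedding wgrad interacts with the grad sync in `schedules.py`.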

QPHutu · Aug 16 '24 06:08