Megatron-LM
[BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce
Describe the bug
In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`?
Expected behavior
Gradient all-reduce should happen after gradient computations.
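To illustrate why the ordering matters, here is a toy sketch in plain Python. It does not use Megatron-LM code: `all_reduce_sum`, `run_step`, and the two-rank gradient values are hypothetical stand-ins, assuming the deferred embedding weight gradient is accumulated into the same buffer that gets all-reduced.

```python
# Toy model only: plain Python lists stand in for per-rank gradient buffers.
# None of these names come from Megatron-LM; they are illustrative assumptions.

def all_reduce_sum(per_rank_grads):
    """Simulate a sum all-reduce: every rank ends up with the global sum."""
    total = sum(per_rank_grads)
    return [total for _ in per_rank_grads]


def run_step(finish_before_sync):
    # Gradients already accumulated on each of two data-parallel ranks.
    main_grads = [1.0, 2.0]
    # Embedding weight gradients whose computation was deferred on each rank.
    deferred_wgrads = [0.5, 0.25]

    def finish_deferred_wgrad():
        # Add each rank's pending weight gradient into its local buffer.
        for rank, pending in enumerate(deferred_wgrads):
            main_grads[rank] += pending

    def sync():
        # Replace each rank's buffer with the reduced value.
        main_grads[:] = all_reduce_sum(main_grads)

    if finish_before_sync:
        finish_deferred_wgrad()
        sync()
    else:
        sync()
        finish_deferred_wgrad()
    return main_grads


correct = run_step(finish_before_sync=True)
buggy = run_step(finish_before_sync=False)
print("finish then sync:", correct)  # [3.75, 3.75]
print("sync then finish:", buggy)    # [3.5, 3.25]
```

In this sketch, reducing first leaves each rank holding only its own local contribution from the deferred step, so the ranks end up with different gradients. Whether the actual schedule behaves this way depends on how `finish_embedding_wgrad_compute` accumulates its result.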