Qi Penghui
**Describe the bug**
In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`?

**Expected behavior**
The gradient all-reduce should happen only after all gradient computations have finished.
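
To illustrate the ordering concern, here is a minimal toy sketch in plain Python. It is not Megatron code: `ToyRank`, `finish_wgrad`, `all_reduce_mean`, and `run` are hypothetical stand-ins, and the numbers are made up. It only shows that if a deferred weight-gradient contribution is added after the all-reduce, the ranks end up with different, unsynchronized gradients.

```python
# Toy model, NOT Megatron code. All names here are hypothetical stand-ins.

class ToyRank:
    """One data-parallel rank with a gradient and a deferred contribution."""

    def __init__(self, computed_grad, deferred_grad):
        self.grad = computed_grad      # gradient already accumulated
        self.deferred = deferred_grad  # weight gradient not yet computed

    def finish_wgrad(self):
        # Stands in for finishing a deferred weight-gradient computation.
        self.grad += self.deferred
        self.deferred = 0.0


def all_reduce_mean(ranks):
    # Stands in for the data-parallel gradient all-reduce (mean).
    mean = sum(r.grad for r in ranks) / len(ranks)
    for r in ranks:
        r.grad = mean


def run(sync_first):
    ranks = [ToyRank(1.0, 2.0), ToyRank(3.0, 4.0)]
    if sync_first:
        # Order questioned in this issue: sync, then finish the computation.
        all_reduce_mean(ranks)
        for r in ranks:
            r.finish_wgrad()
    else:
        # Expected order: finish the computation, then sync.
        for r in ranks:
            r.finish_wgrad()
        all_reduce_mean(ranks)
    return [r.grad for r in ranks]


print(run(sync_first=True))   # ranks disagree: [4.0, 6.0]
print(run(sync_first=False))  # ranks agree:    [5.0, 5.0]
```

In the toy model, syncing first leaves the deferred contributions out of the all-reduce, so each rank keeps its own local value on top of the shared mean.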