Megatron-LM icon indicating copy to clipboard operation
Megatron-LM copied to clipboard

Remove redundant reduce in aux_loss logging

Open BestJuly opened this issue 1 month ago • 1 comments

What does this PR do ?

In previous implementation, reduce across DP is added for the correct aux_loss logging. However, current reduction will cause redundant communication

  • For seq_load_balancing_loss and load_balancing_loss, self.tp_cp_group is used (code, code)
  • For global_load_balancing_loss, self.tp_dp_cp_group is used (code).

Therefore, related reduction has already been conducted across corresponding parallel group. And we can remove the duplicated one for better perf.

:warning: For major changes (either in lines of code or in its impact), please make sure to first share discuss a design-doc with the team.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • [ ] I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • [ ] I have added relevant unit tests
  • [ ] I have added relevant functional tests
  • [ ] I have added proper typing to my code Typing guidelines
  • [ ] I have added relevant documentation
  • [ ] I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

:warning: Only proceed to the next step once all reviewers have approved, merge-conflict are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch The proposed review process for `dev` branch is under active discussion.

MRs are mergable after one approval by either [email protected] or [email protected].

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

BestJuly avatar Nov 03 '25 08:11 BestJuly

Please correctly label this PR

yanring avatar Nov 12 '25 02:11 yanring

/ok to test fe8e9e26f1e939f96a8240cda7d6841d582458df

yanring avatar Dec 01 '25 23:12 yanring