Megatron-LM

[dev] DeepSeek V3.2 support

Open: kunlunl opened this pull request 1 month ago • 12 comments

What does this PR do?

Adds support for DeepSeek V3.2-style sparse attention (DSA). Still a work in progress.
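For readers new to DSA: as described in DeepSeek's V3.2-Exp technical report, a lightweight "lightning indexer" scores every preceding token, and each query then attends only to its top-k highest-scoring keys. Below is a minimal pure-Python sketch of that idea for a single query, under that assumption. It is not the Megatron-LM code in this PR, and every function name in it is hypothetical. For brevity, the example reuses the main keys as indexer keys, whereas the real indexer uses its own separate projections. It also assumes causal masking has already restricted `keys` to positions at or before the query.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def indexer_scores(index_queries: List[Vector], head_weights: List[float],
                   index_keys: List[Vector]) -> List[float]:
    # Score for key s: sum over indexer heads j of w_j * ReLU(q_j . k_s).
    return [
        sum(w * max(0.0, dot(q, k)) for q, w in zip(index_queries, head_weights))
        for k in index_keys
    ]


def topk_indices(scores: List[float], k: int) -> List[int]:
    # Positions of the k highest scores (ties go to the earlier position),
    # returned in ascending position order.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def sparse_attention(query: Vector, keys: List[Vector], values: List[Vector],
                     selected: List[int]) -> Vector:
    # Scaled dot-product attention restricted to the selected key/value positions.
    scale = 1.0 / math.sqrt(len(query))
    probs = softmax([dot(query, keys[i]) * scale for i in selected])
    out = [0.0] * len(values[0])
    for p, i in zip(probs, selected):
        for c, v in enumerate(values[i]):
            out[c] += p * v
    return out


# Usage: one query, four causally visible keys, two indexer heads, k = 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
query = [1.0, 0.0]
scores = indexer_scores([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], keys)
selected = topk_indices(scores, k=2)
output = sparse_attention(query, keys, values, selected)
print(scores, selected, output)  # [0.5, 0.5, 1.0, 0.0] [0, 2] [0.75, 0.25]
```

Selecting k of L keys means the main attention for each query touches only k entries instead of L. The indexer still scans all L keys, but it is designed to be cheap relative to full attention.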

:warning: For major changes (either in lines of code or in its impact), please make sure to first share and discuss a design doc with the team.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • [ ] I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • [ ] I have added relevant unit tests
  • [ ] I have added relevant functional tests
  • [ ] I have added proper typing to my code (see Typing guidelines)
  • [ ] I have added relevant documentation
  • [ ] I have run autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into the `main` branch

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

:warning: Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and CI is passing.
Final Review may be declined if these requirements are not met.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either [email protected] or [email protected].

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

kunlunl, Nov 06 '25 02:11

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot], Nov 06 '25 02:11

Great work—this has been really helpful to me. Quick question: what are the typical values used for indexer-loss-coeff?

iansheng, Nov 10 '25 12:11


@iansheng Thanks for your interest. I don't have any experience with the indexer loss coefficient yet: I'm still refining this MR and haven't actually used it to train a model.

kunlunl, Nov 11 '25 11:11
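The thread does not say how `indexer-loss-coeff` is applied. As context for the question, here is a minimal sketch assuming the indexer objective described in DeepSeek's V3.2-Exp technical report. In that objective, the target is the main attention distribution summed over heads and L1-normalized, the indexer's prediction is a softmax over its scores, and the auxiliary loss is the KL divergence between the two. The sketch further assumes, hypothetically, that the coefficient simply scales this term before it is added to the language-model loss. `total_loss` and the other names are illustrative, not this PR's API, and the example says nothing about which coefficient values work well.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def indexer_target(attn_probs_per_head: List[List[float]]) -> List[float]:
    # Sum the main-attention probabilities over heads, then L1-normalize over keys.
    summed = [sum(col) for col in zip(*attn_probs_per_head)]
    total = sum(summed)
    return [s / total for s in summed]


def kl_divergence(p: List[float], q: List[float]) -> float:
    # D_KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def indexer_loss(attn_probs_per_head: List[List[float]],
                 index_scores: List[float]) -> float:
    # KL between the head-aggregated attention distribution and softmax(indexer scores).
    return kl_divergence(indexer_target(attn_probs_per_head), softmax(index_scores))


def total_loss(lm_loss: float, aux_indexer_loss: float,
               indexer_loss_coeff: float) -> float:
    # Hypothetical combination: the coefficient only scales the auxiliary term.
    return lm_loss + indexer_loss_coeff * aux_indexer_loss


# Usage: two attention heads over two keys; the indexer scores both keys equally.
probs = [[1.0, 0.0], [1.0, 0.0]]
aux = indexer_loss(probs, [0.0, 0.0])  # target [1, 0] vs. prediction [0.5, 0.5]
print(aux)  # equals math.log(2)
print(total_loss(2.0, aux, indexer_loss_coeff=0.1))
```

Under this assumption, a larger coefficient weights agreement between the indexer and the main attention more heavily relative to the language-model loss.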

Thanks for the quick, great work. I've done a preliminary review of the code, and overall it LGTM. Please add unit tests (UTs) for the new features, especially covering DSA functionality and its correctness under parallelism. I'll do a second round of review once everything is ready.

yuzhongw-nvidia, Nov 12 '25 09:11

Please also don't forget to open a mirror PR against the main branch once this is ready for review. Thanks!

yuzhongw-nvidia, Nov 19 '25 11:11

/ok to test ad04d58

kunlunl, Nov 27 '25 08:11

/ok to test 9dae1ce

kunlunl, Nov 27 '25 08:11

/ok to test 9cc697b

kunlunl, Nov 27 '25 10:11

/ok to test 0d7e1d1

kunlunl, Nov 27 '25 11:11

/ok to test 6fe2f35

kunlunl, Nov 28 '25 08:11

/ok to test 33684d2

kunlunl, Nov 28 '25 10:11

/ok to test a0b6fd9

kunlunl, Nov 28 '25 11:11

/ok to test 1d2f01c

kunlunl, Dec 01 '25 03:12

/ok to test 23ba310

kunlunl, Dec 01 '25 03:12

/ok to test cbfa053

kunlunl, Dec 01 '25 04:12

Please add MTP (multi-token prediction) support to DSA as soon as possible.

ninangezaici, Dec 11 '25 07:12