
Fix incompatibility between symm memory and allreduce fusion

Open nvcastet opened this pull request 1 month ago • 2 comments

Motivation

Memory registration for the fused allreduce+rmsnorm kernel is incompatible with NCCL symmetric memory registration. This change makes sure the two are not used at the same time.
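
For context, the fusion combines the tensor-parallel allreduce with the subsequent RMSNorm into a single kernel. A minimal unfused reference of that pattern, sketched here purely for illustration (not sglang code), looks roughly like:

```python
# Unfused reference of the allreduce+rmsnorm pattern (illustration only;
# not sglang code). Assumes torch.distributed has been initialized.
import torch
import torch.distributed as dist

def allreduce_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Tensor-parallel reduction across ranks, then RMSNorm. The fused kernel
    # performs both in a single launch over its own registered workspace.
    dist.all_reduce(x)
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight
```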

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

nvcastet avatar Nov 20 '25 22:11 nvcastet

Summary of Changes

Hello @nvcastet, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical incompatibility between FlashInfer AllReduce Fusion and NCCL symmetric memory registration. The changes ensure that these two features are never enabled concurrently, thereby preventing potential system instability or incorrect behavior. This is achieved by refining the conditions under which AllReduce Fusion is activated and by adding a robust runtime check to enforce their mutual exclusivity.

Highlights

  • Prevented Automatic AllReduce Fusion with Symmetric Memory: Modified the conditions for enabling FlashInfer AllReduce Fusion for DeepseekV3ForCausalLM and GptOssForCausalLM models to explicitly check that symmetric memory (enable_symm_mem) is not active, thus preventing automatic activation of the fusion when an incompatibility exists.
  • Added Runtime Incompatibility Check: Introduced a new assertion in the server argument validation (check_server_args) that will raise an error if both FlashInfer AllReduce Fusion and symmetric memory are enabled simultaneously, ensuring that these incompatible features cannot be used together (see the sketch below).
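
A minimal sketch of the check described above, assuming server-arg fields named enable_flashinfer_allreduce_fusion and enable_symm_mem (the exact attribute and flag names in sglang may differ):

```python
# Hypothetical sketch, not the actual sglang implementation.
from dataclasses import dataclass

@dataclass
class ServerArgs:
    enable_flashinfer_allreduce_fusion: bool = False  # assumed field name
    enable_symm_mem: bool = False  # field named in the PR summary

def check_server_args(args: ServerArgs) -> None:
    # Enforce mutual exclusivity of the two memory-registration schemes.
    assert not (args.enable_flashinfer_allreduce_fusion and args.enable_symm_mem), (
        "FlashInfer AllReduce Fusion is incompatible with NCCL symmetric "
        "memory registration; enable at most one of them."
    )
```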

gemini-code-assist[bot] avatar Nov 20 '25 22:11 gemini-code-assist[bot]

@nvcastet I tried tp4 + allreduce fusion + symm memory on Dpsk-fp4, but they turned out to be compatible. Is there a specific condition that triggers this incompatibility?

Fridge003 avatar Nov 22 '25 02:11 Fridge003

@Fridge003 I think you are correct; it should be compatible, since trt_allreduce_fusion has its own workspace allocation (unlike the custom-allreduce kernel, which registers existing tensors). That means @gracehonv's bug on dsr1 fp8 TP needs to be debugged further (@gracehonv, do you mind creating an issue?). It is probably a race condition or memory corruption somewhere, since the bug moves around with code changes. CC @kaixih

@Fridge003 On a side note, it would be nice to be able to disable the allreduce fusion; it gets enabled by default, but unfortunately there is no way to opt out.

nvcastet avatar Nov 24 '25 16:11 nvcastet
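
To illustrate the distinction drawn above, here is a hypothetical sketch (all names invented, not sglang's actual code) of the two registration patterns: a fusion kernel that owns a private workspace versus a custom allreduce that registers existing tensors:

```python
# Hypothetical illustration only; names do not come from sglang.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def trtllm_fusion_setup() -> torch.Tensor:
    # The fusion kernel owns a private workspace, so no existing model
    # tensor ever needs to be registered with the communication library.
    return torch.empty(8 << 20, dtype=torch.uint8, device=device)

def custom_allreduce_setup(hidden_states: torch.Tensor, ipc_registry: list) -> None:
    # The custom allreduce registers existing tensors for peer access; if
    # NCCL symmetric memory has already registered the same allocation,
    # the two registrations can conflict.
    ipc_registry.append(hidden_states)  # stand-in for a real IPC registration
```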

See https://github.com/sgl-project/sglang/issues/13863

nvcastet avatar Nov 24 '25 20:11 nvcastet

@nvcastet Sure, can you open a PR that adds a server arg to control the trtllm allreduce fusion?

Fridge003 avatar Nov 24 '25 22:11 Fridge003
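
A minimal sketch of what such an opt-out server arg might look like; the flag name below is an assumption for illustration, not the arg that was ultimately added:

```python
# Hypothetical opt-out flag; the real PR may name it differently.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--disable-flashinfer-allreduce-fusion",
    action="store_true",
    help="Opt out of the trtllm/FlashInfer allreduce+rmsnorm fusion "
         "(enabled by default on supported models).",
)

# Usage example: pass the flag to disable the fusion at launch time.
args = parser.parse_args(["--disable-flashinfer-allreduce-fusion"])
print(args.disable_flashinfer_allreduce_fusion)  # True
```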