Fix two issues related to `--moe-dense-tp-size=1`

Open ch-wan opened this issue 8 months ago • 0 comments

Motivation

There were two issues related to --moe-dense-tp-size=1 with DP attention:

  • The CUDA graph runner overestimated the buffer size required by DP attention, which could prevent CUDA graph execution (#5527).
  • Configurations with dp < tp could not run when `--moe-dense-tp-size=1` was set (#5656).

This PR fixes both issues. Note that fixing the CUDA graph issue requires #5558.

Checklist

  • [ ] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

ch-wan • Apr 23 '25 04:04