Cheng Wan
## Motivation

This code triggers an AssertionError:

```python
import torch
from sglang.srt.layers.moe.fused_moe_triton.fused_moe import fused_moe

N = 64 * 1024 + 10
E = 8
H = 1024
I = 4096
x...
```
## Motivation

This PR partially addresses #3633.

## Modifications

We reuse the memory of `intermediate_cache1` to create `intermediate_cache3`; a sketch of the pattern follows the test script. Here is the test script:

```python
import torch
from sglang.srt.layers.moe.fused_moe_triton.fused_moe import ...
```
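A minimal sketch of the reuse pattern, not the actual fused_moe code: the shapes below are illustrative assumptions, and the trick is only safe because `intermediate_cache1` is fully consumed (by the activation between the two grouped GEMMs) before `intermediate_cache3` is written.

```python
import torch

# Assumed shapes: cache1 is (M, topk, N) and cache3 is (M, topk, K) with
# K <= N, so cache3 fits inside cache1's existing storage.
M, topk, N, K = 16, 2, 8192, 4096

intermediate_cache1 = torch.empty(M, topk, N)

# Instead of allocating a second buffer, view a prefix of cache1's storage.
intermediate_cache3 = (
    intermediate_cache1.flatten()[: M * topk * K].view(M, topk, K)
)

# Both tensors share one allocation: no extra memory is used for cache3.
assert intermediate_cache3.data_ptr() == intermediate_cache1.data_ptr()
```

The saving is the full size of `intermediate_cache3`, at the cost of a lifetime constraint: nothing may read `intermediate_cache1` after `intermediate_cache3` has been written.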
## Motivation

There were two issues related to `--moe-dense-tp-size=1` with DP attention:

- The CUDA graph runner overestimated the buffer size required by DP attention, which may prevent the execution...