Baizhou Zhang
cc @fzyzcjy @kaixih @fy1214 Please have a look
/tag-and-rerun-ci
Thanks, this seems to be a good idea!
> > Hello [@FrankLeeeee](https://github.com/FrankLeeeee), would you please take a look at the PR [#3680](https://github.com/sgl-project/sglang/pull/3680)? Appreciate that. Additionally, we have one concern: As we previously ran DeepSeek-R1 on SGLang and confirmed...
@teadross can you pull the latest main branch and try again? This bug seems to be solved according to #3836
@nvcastet I tried tp4 + allreduce fusion + symm memory on Dpsk-fp4, but it was compatible. Is there any specific condition that triggers this incompatibility?
@nvcastet Sure, can you open a PR that changes the server args to trtllm allreduce fusion?
Thanks @YAMY1234~ If your PR is blocked on the FlashMLA side, you can create a new branch at https://github.com/sgl-project/FlashMLA. The FlashMLA kernels now integrated in sglang are built from this repo.
@YAMY1234 Can you add a benchmark for bs=1? I'd expect pure TP to be faster than DP+TP there.
> > @YAMY1234 Can you add a benchmark for bs=1? Expectedly pure TP should be faster than DP+TP
>
> @Fridge003 Added in the PR description~

Oh I mean performance...