YAMY

6 comments by YAMY

> > In `_forward_flashmla_sparse(...)`, pad q’s head dimension to the required multiple (64 on SM90, 128 on SM100+)
>
> @YAMY1234 For the Hxx device, the padding head may exhibit...
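The quoted suggestion amounts to rounding the number of query heads up to a hardware-dependent multiple and zero-padding the rest. The following is a minimal sketch in plain Python. It assumes that "head dimension" means the number-of-heads axis of `q`, and that the 64/128 multiples from the quote correspond to SM major versions 9 and 10+. The helper names (`required_head_multiple`, `padded_num_heads`, `pad_q_heads`) are hypothetical and are not taken from sglang. A real implementation would pad a tensor (for example with `torch.nn.functional.pad`) rather than nested lists.

```python
def required_head_multiple(sm_major: int) -> int:
    """Head-count multiple assumed for the sparse FlashMLA kernel."""
    # Assumption from the quoted comment: SM90 needs 64, SM100+ needs 128.
    if sm_major >= 10:
        return 128
    if sm_major == 9:
        return 64
    raise ValueError(f"unsupported SM major version: {sm_major}")


def padded_num_heads(num_heads: int, multiple: int) -> int:
    """Round num_heads up to the next multiple of `multiple`."""
    return -(-num_heads // multiple) * multiple


def pad_q_heads(q: list[list[float]], multiple: int) -> list[list[float]]:
    """Zero-pad q, laid out as [num_heads][head_dim], along the heads axis."""
    num_heads = len(q)
    head_dim = len(q[0]) if q else 0
    target = padded_num_heads(num_heads, multiple)
    # Extra heads are all zeros; their outputs should be discarded after the
    # kernel by slicing back to the original num_heads.
    padding = [[0.0] * head_dim for _ in range(target - num_heads)]
    return q + padding


if __name__ == "__main__":
    # Hypothetical example: 16 local query heads on an SM90 GPU.
    q = [[1.0] * 4 for _ in range(16)]
    padded = pad_q_heads(q, required_head_multiple(9))
    print(len(padded))  # 64
```

Padding to the multiple keeps the kernel's per-block head tiling intact, at the cost of computing attention for heads whose results are thrown away.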

> @YAMY1234 Hi, I'm using your branch (https://github.com/YAMY1234/sglang/tree/dpsk_tp) and getting some errors in PD:

Thanks for pointing this out! For now this PR is mainly focused on and validated under...

> @YAMY1234 Can you add a benchmark for bs=1? Pure TP is expected to be faster than DP+TP.

@Fridge003 Added in the PR description~

> > > @YAMY1234 Can you add a benchmark for bs=1? Pure TP is expected to be faster than DP+TP.
> >
> > @Fridge003 Added in the PR...

@Fridge003 Thanks! Added docs and unit tests~

> Please add accuracy test results on B200 (GPQA, GSM8K 20-shot).

Updated in the PR description~