Ke Bao (60 comments)

Maybe you can refer to https://github.com/triton-lang/triton/issues/4172 and https://github.com/InternLM/lmdeploy/pull/1621#issuecomment-2179731554.

> Does MoE-EP have any support? I have implemented MoE-EP.

@xiaobochen123 We are going to implement it with a DP + EP approach for throughput gains. Currently, DP attention is...

@zhyncs @merrymercy LGTM, could you help review and merge?

Could you try to add `--disable-overlap-schedule` and test it again?
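For reference, a minimal sketch of re-running the server with the overlap scheduler disabled; the model path and port below are placeholders, not values from this thread:

```shell
# Relaunch the SGLang server with the overlap scheduler turned off
# (model path and port are illustrative placeholders).
python -m sglang.launch_server \
  --model-path <your-model> \
  --port 30000 \
  --disable-overlap-schedule
```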

> I attached the server side log, please check it.
> [debug.log](https://github.com/user-attachments/files/18856437/debug.log)

I checked the log; it seems to be an issue with `sgl_kernels.fp8_blockwise_scaled_mm`. cc: @zhyncs @yizhang2077

@lshmouse @ToughK DP attention is aimed at improving throughput for large batch sizes (>128); its latency is higher than plain TP.
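As a sketch, a high-throughput launch combining TP with DP attention might look like the following; the flag names follow SGLang's server arguments, and the model path and parallel sizes are illustrative assumptions, not values from this thread:

```shell
# High-throughput configuration: tensor parallelism plus DP attention.
# Sizes here are illustrative; tune them to your GPU count and workload.
python -m sglang.launch_server \
  --model-path <your-model> \
  --tp-size 8 \
  --dp-size 8 \
  --enable-dp-attention
```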

Could you add DP attention in the benchmarks?