Ke Bao
Maybe you can refer to https://github.com/triton-lang/triton/issues/4172 and https://github.com/InternLM/lmdeploy/pull/1621#issuecomment-2179731554.
/tag-and-rerun-ci
> Does MoE-EP have any support? I have implemented MoE-EP.

@xiaobochen123 We are going to implement it with a DP + EP approach for throughput gains. Currently, DP attention is...
@zhyncs @merrymercy LGTM, could you help review and merge?
@hariag could you share the commands for 8*H200?
Could you try to add `--disable-overlap-schedule` and test it again?
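For reference, a minimal sketch of what I mean (assuming a typical `sglang.launch_server` invocation; the model path and TP size are placeholders, keep your existing arguments):

```bash
# Hypothetical example: same launch command as before, plus the flag that
# disables the overlapped scheduler, so we can rule it out as the cause.
python -m sglang.launch_server \
  --model-path <your-model-path> \
  --tp 8 \
  --disable-overlap-schedule
```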
> I attached the server side log, please check it.
> [debug.log](https://github.com/user-attachments/files/18856437/debug.log)

I checked the log; it looks like an issue with `sgl_kernels.fp8_blockwise_scaled_mm`. cc: @zhyncs @yizhang2077
@Lzhang-hub Did you try the latest main branch?
@lshmouse @ToughK DP attention is aimed at improving throughput for large batch sizes (>128). Its latency is higher than TP.
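As a rough sketch, this is how it would be turned on server-side (flag names assume the current `sglang.launch_server` CLI; the model path and parallel sizes below are placeholders):

```bash
# Hypothetical example: enable DP attention for large-batch throughput.
# Plain TP has lower latency, so this mainly helps at batch sizes >128.
python -m sglang.launch_server \
  --model-path <your-model-path> \
  --tp 8 \
  --dp 8 \
  --enable-dp-attention
```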
Could you add DP attention to the benchmarks?