Ke Bao comments

Results: 60 comments by Ke Bao

Add prefix cache stats to usage

Hi @lvhan028, sorry for the delay. This PR is ready for review.

[WIP] Support qwen2 vl model

The CI unit test failure is caused by an outdated version of transformers. We need to upgrade transformers to 4.45.2.
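A minimal sketch of how a test setup could guard against an older transformers install, assuming the minimum is the 4.45.2 named above. The helper names (`parse_version`, `meets_minimum`, `installed_transformers_version`) are hypothetical and not part of the PR; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata

# Minimum version taken from the comment above.
MIN_TRANSFORMERS = "4.45.2"


def parse_version(version: str) -> tuple:
    """Turn '4.45.2' into (4, 45, 2).

    Only the leading digits of the first three components are kept, so a
    suffix such as 'rc1' or a trailing '.dev0' is ignored (a simplification).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None or not meets_minimum(found):
        print(f"transformers>={MIN_TRANSFORMERS} required, found: {found}")
    else:
        print(f"transformers {found} is new enough")
```

In practice the same constraint would more likely be pinned in the project's dependency file; the check above only illustrates the comparison.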

Support XiaomiMiMo inference with mtp

Could you add the accuracy test results and a CI test?

Support XiaomiMiMo inference with mtp

For the memory issue, you can refer to https://github.com/sgl-project/sglang/blob/fba8eccd7ebe41bbdbf70ab3b6a2df1835f8b532/python/sglang/srt/model_executor/model_runner.py#L725 and make similar changes.

Support XiaomiMiMo inference with mtp

Could you share the result of `python3 -m sglang.test.send_one`?

Support MLA in Torch Native Attention Backend

Could you fix the PR test and provide some benchmark data against the previous version?

Support MLA in Torch Native Attention Backend

Hi @YangQun1, I reviewed this PR, but I'm not sure how this change is related to MLA.

Support MLA in Torch Native Attention Backend

> With this PR, we can run DeepSeek-V2-Lite model with torch native backend while not setting --disable-mla flag.

Got it. This change is mainly for the forward_normal part, the kv...

Support MLA in Torch Native Attention Backend

cc: @zhyncs, please help merge.

[Bug] CUDA Graph Capture Fail on H200

I think we may need a different way to check whether CUDA graph capture is in progress. @Fridge003