Ke Bao

Results 60 comments of Ke Bao

Hi @lvhan028, sorry for the delay. This PR is ready for review.

The CI unit test failure is caused by an old version of transformers. We need to upgrade transformers to 4.45.2.
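To catch this kind of mismatch before the tests run, a version guard could look roughly like the sketch below. The helper names (`parse_release`, `meets_minimum`, `installed_meets_minimum`) are hypothetical and not part of the PR. The comparison is deliberately simplified: it looks only at the leading numeric components, so a pre-release such as `4.45.2rc1` is treated the same as `4.45.2`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '4.45.2' into (4, 45, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. 'dev0': nothing numeric left to compare
        parts.append(int(digits))
        if digits != piece:
            break  # e.g. '2rc1': keep the 2, ignore the suffix
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare two version strings by their numeric release components."""
    return parse_release(installed) >= parse_release(required)


def installed_meets_minimum(package: str, required: str) -> bool:
    """Return False if the package is missing or older than `required`."""
    try:
        return meets_minimum(version(package), required)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    # 4.45.2 is the version named in the comment above.
    if not installed_meets_minimum("transformers", "4.45.2"):
        print("transformers is missing or older than 4.45.2")
```

For anything beyond a quick check, a full version parser is a better choice than this hand-rolled comparison.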

Could you add the accuracy test results and a CI test?

For the memory issue, you can refer to https://github.com/sgl-project/sglang/blob/fba8eccd7ebe41bbdbf70ab3b6a2df1835f8b532/python/sglang/srt/model_executor/model_runner.py#L725 and make similar changes.

Could you share the result of `python3 -m sglang.test.send_one`?

Could you fix the PR test and provide some benchmark data comparing this against the previous version?

Hi @YangQun1, I reviewed this PR, but I'm not sure how this change is related to MLA.

> With this PR, we can run DeepSeek-V2-Lite model with torch native backend while not setting --disable-mla flag.

Got it. This change is mainly for the forward_normal part, the kv...

I think we may need a different way to check whether CUDA graph capture is in progress. @Fridge003