luchangli

Results 18 comments of luchangli

Currently SGLangGPUConnector does not support MLA, and adding MLA support would result in a lot of if/else branches and redundant code. In the existing code, MLA and MHA need to call different...

I will update my PR and remove the redundant code.

@Oasis-Git @YaoJiayi I have updated the PR to reuse the existing code to support the MLA format for sglang.

@YaoJiayi please check the updated code. It now uses one KV instead of two. It still uses the multi_layer_kv_transfer_unilateral kernel, since the existing single_layer_kv_transfer kernel would need to be called twice, which is not...
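To illustrate the "one KV instead of two" point, here is a minimal sketch of how a single code path could describe both formats. Everything in it is hypothetical: `CacheLayout`, `kv_count`, `buffer_shape`, and the dimensions are made up for illustration and are not LMCache or sglang APIs. It also assumes, for simplicity, that both formats share the same `hidden_dim`, which is generally not the case in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLayout:
    """Hypothetical description of a KV cache buffer (not an LMCache type)."""

    num_layers: int
    num_tokens: int
    hidden_dim: int
    use_mla: bool

    @property
    def kv_count(self) -> int:
        # MHA keeps separate K and V tensors; MLA keeps one combined tensor.
        return 1 if self.use_mla else 2

    def buffer_shape(self) -> tuple:
        # One shape formula serves both formats; only the leading dim differs,
        # so callers do not need a separate MLA branch.
        return (self.kv_count, self.num_layers, self.num_tokens, self.hidden_dim)


def num_elements(layout: CacheLayout) -> int:
    """Total number of elements in the buffer described by `layout`."""
    total = 1
    for dim in layout.buffer_shape():
        total *= dim
    return total


# Illustrative sizes only (assumed values).
mha = CacheLayout(num_layers=4, num_tokens=16, hidden_dim=128, use_mla=False)
mla = CacheLayout(num_layers=4, num_tokens=16, hidden_dim=128, use_mla=True)

print(mha.buffer_shape())
print(mla.buffer_shape())
```

With the format folded into the leading dimension, the same transfer routine can be handed either buffer, which is the kind of reuse the updated PR aims for.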

> @llc-kc Is the code ready?

Yes, it is ready and will not be modified.

@YaoJiayi @Oasis-Git code format fixed

@wqlxx there is a test function, test_sglang_connector_with_gpu_and_mla, in https://github.com/LMCache/LMCache/blob/dev/tests/v1/test_gpu_connector.py. Moreover, I have successfully run Qwen3 with non-PD sglang + LMCache offloading, based on https://github.com/Oasis-Git/sglang/tree/lmcache.

@wqlxx I think you can run non-PD MHA first, then try MLA, and then try PD.

@wqlxx I haven't tested DeepSeek yet.