luchangli
Thanks for your fix.
Currently SGLangGPUConnector does not support MLA, and adding MLA support would result in a lot of if/else branches and redundant code. In the existing code, MLA and MHA need to call different...
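To make the branching concrete, here is a minimal sketch of why MHA and MLA force different code paths: MHA keeps separate K and V buffers per layer, while MLA fuses them into one compressed latent. All names here are illustrative and not LMCache's actual API; `kv_lora_rank` is assumed as the MLA latent dimension.

```python
def per_layer_kv_buffers(attn_type, num_tokens, num_kv_heads, head_dim, kv_lora_rank):
    """Return the per-layer cache buffers (as name -> shape) for each format.

    Illustrative only: a connector handling both formats naively needs an
    if/else like this at every transfer site, which is the redundancy
    being discussed.
    """
    if attn_type == "MHA":
        # Two buffers per layer: one for keys, one for values.
        return {
            "k": (num_tokens, num_kv_heads, head_dim),
            "v": (num_tokens, num_kv_heads, head_dim),
        }
    if attn_type == "MLA":
        # One fused latent buffer per layer; K and V are compressed together.
        return {"kv_latent": (num_tokens, kv_lora_rank)}
    raise ValueError(f"unknown attention type: {attn_type}")
```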
I will update my PR and remove the redundant code.
@Oasis-Git @YaoJiayi I have updated the code to reuse the existing logic to support the MLA format for SGLang.
@YaoJiayi please check the updated code. It uses one KV buffer instead of two. It still uses the multi_layer_kv_transfer_unilateral kernel, since the existing single_layer_kv_transfer kernel would need to be called twice, which is not...
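A quick sketch of the launch-count argument behind keeping the multi-layer kernel (the helper names here are illustrative; only the two kernel names come from the thread): a per-layer transfer kernel must be launched once per buffer, so MHA's separate K and V buffers double the launches that MLA's single fused buffer needs.

```python
def single_layer_launches(attn_type):
    """Launches of a single_layer_kv_transfer-style kernel for one layer.

    MHA has separate K and V buffers, so the per-layer kernel must run
    once for K and once for V; MLA's fused latent needs only one launch.
    """
    return 2 if attn_type == "MHA" else 1


def total_launches(attn_type, num_layers):
    """Total per-layer kernel launches to transfer the whole cache.

    A multi_layer_kv_transfer_unilateral-style kernel would instead cover
    all layers in one launch, avoiding this per-layer overhead entirely.
    """
    return num_layers * single_layer_launches(attn_type)
```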
> @llc-kc Is the code ready?

Yes, it is ready and will not be modified further.
@YaoJiayi @Oasis-Git The code formatting is fixed.
@wqlxx There is a test function, test_sglang_connector_with_gpu_and_mla, in https://github.com/LMCache/LMCache/blob/dev/tests/v1/test_gpu_connector.py. Moreover, I have successfully run Qwen3 non-PD SGLang+LMCache offloading based on https://github.com/Oasis-Git/sglang/tree/lmcache.
@wqlxx I think you can run non-PD MHA first, then try MLA, and then try PD.
@wqlxx I haven't tested DeepSeek yet.