luchangli
Thanks for your fix.
Currently SGLangGPUConnector does not support MLA, and adding MLA support would result in a lot of if/else branches and redundant code. In the existing code, MLA and MHA need to call different...
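To make the branching concrete, here is a minimal sketch of why MHA and MLA force different code paths: MHA keeps separate K and V buffers per layer, while MLA fuses them into one compressed latent. All names here are illustrative and not LMCache's actual API; `kv_lora_rank` is assumed as the MLA latent dimension.

```python
def per_layer_kv_buffers(attn_type, num_tokens, num_kv_heads, head_dim, kv_lora_rank):
    """Return the per-layer cache buffers (as name -> shape) for each format.

    Illustrative only: a connector handling both formats naively needs an
    if/else like this at every transfer site, which is the redundancy
    being discussed.
    """
    if attn_type == "MHA":
        # Two buffers per layer: one for keys, one for values.
        return {
            "k": (num_tokens, num_kv_heads, head_dim),
            "v": (num_tokens, num_kv_heads, head_dim),
        }
    if attn_type == "MLA":
        # One fused latent buffer per layer; K and V are compressed together.
        return {"kv_latent": (num_tokens, kv_lora_rank)}
    raise ValueError(f"unknown attention type: {attn_type}")
```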
I will update my PR and remove the redundant code.
@Oasis-Git @YaoJiayi I have updated the code to reuse the existing logic to support the MLA format for SGLang.
@YaoJiayi please check the updated code. It uses one KV buffer instead of two. It still uses the multi_layer_kv_transfer_unilateral kernel, since the existing single_layer_kv_transfer kernel would need to be called twice, which is not...
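A quick sketch of the launch-count argument behind keeping the multi-layer kernel (the helper names here are illustrative; only the two kernel names come from the thread): a per-layer transfer kernel must be launched once per buffer, so MHA's separate K and V buffers double the launches that MLA's single fused buffer needs.

```python
def single_layer_launches(attn_type):
    """Launches of a single_layer_kv_transfer-style kernel for one layer.

    MHA has separate K and V buffers, so the per-layer kernel must run
    once for K and once for V; MLA's fused latent needs only one launch.
    """
    return 2 if attn_type == "MHA" else 1


def total_launches(attn_type, num_layers):
    """Total per-layer kernel launches to transfer the whole cache.

    A multi_layer_kv_transfer_unilateral-style kernel would instead cover
    all layers in one launch, avoiding this per-layer overhead entirely.
    """
    return num_layers * single_layer_launches(attn_type)
```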
> @llc-kc Is the code ready?

Yes, it is ready and will not be modified further.
@YaoJiayi @Oasis-Git The code formatting is fixed.
@wqlxx There is a test function, test_sglang_connector_with_gpu_and_mla, in https://github.com/LMCache/LMCache/blob/dev/tests/v1/test_gpu_connector.py. Moreover, I have successfully run Qwen3 non-PD SGLang+LMCache offloading based on https://github.com/Oasis-Git/sglang/tree/lmcache.
@wqlxx I think you can run non-PD MHA first, then try MLA, and then try PD.
@wqlxx I haven't tested DeepSeek yet.