luchangli
Support both MHA and MLA in the SGLang GPU connector; the MLA path reuses the existing vLLM CUDA kernel.
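A connector has to treat the two attention types separately because they cache different tensors. MHA stores separate K and V tensors for every KV head. MLA stores a single compressed latent per token, shared across heads, plus a small decoupled RoPE key part. The sketch below only compares per-token, per-layer cache sizes. The function names are hypothetical and do not come from SGLang, vLLM, or LMCache. The example values (8 KV heads of dimension 128; an MLA latent rank of 512 plus a 64-dim RoPE part, as in DeepSeek-V2/V3 configs) are illustrative assumptions.

```python
def mha_kv_bytes_per_token(num_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """Hypothetical helper: per-layer KV bytes for one token under MHA/GQA."""
    # Separate K and V entries for every KV head, hence the factor of 2.
    return 2 * num_kv_heads * head_dim * dtype_bytes


def mla_kv_bytes_per_token(kv_lora_rank: int, rope_dim: int, dtype_bytes: int) -> int:
    """Hypothetical helper: per-layer KV bytes for one token under MLA."""
    # One compressed latent plus the decoupled RoPE key part, shared by all heads.
    return (kv_lora_rank + rope_dim) * dtype_bytes


if __name__ == "__main__":
    bf16 = 2  # bytes per element
    print("MHA:", mha_kv_bytes_per_token(num_kv_heads=8, head_dim=128, dtype_bytes=bf16))  # 4096
    print("MLA:", mla_kv_bytes_per_token(kv_lora_rank=512, rope_dim=64, dtype_bytes=bf16))  # 1152
```

Because the MLA cache is a single tensor per layer rather than a K/V pair, a copy kernel written for one layout cannot be applied unchanged to the other.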
Cache-store execution time consists mainly of three parts: allocation, the CPU-GPU copy, and `submit_put_task`. The existing `local_disk_backend` implementation of `submit_put_task` does a lot of work that typically takes more than 1 ms, which reduce...
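To attribute store latency to each of the three phases, each phase can be timed on its own. The sketch below is a minimal, generic timing harness. `time_phases` and the three `fake_*` stand-ins are hypothetical and are not LMCache functions. The roughly 2 ms sleep is an assumed placeholder for synchronous backend work, not a measured figure.

```python
import time
from typing import Callable, Dict


def time_phases(phases: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each phase in insertion order and record its wall-clock time in ms."""
    timings: Dict[str, float] = {}
    for name, fn in phases.items():
        start = time.perf_counter()
        fn()
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


# Hypothetical stand-ins for the three store phases (not LMCache APIs).
def fake_alloc() -> None:
    bytearray(1 << 20)  # allocate a 1 MiB buffer


def fake_copy() -> None:
    src = bytearray(1 << 20)
    bytes(src)  # copy the buffer once, standing in for the CPU-GPU transfer


def fake_submit_put_task() -> None:
    time.sleep(0.002)  # simulate ~2 ms of synchronous backend work


if __name__ == "__main__":
    result = time_phases({
        "alloc": fake_alloc,
        "cpu_gpu_copy": fake_copy,
        "submit_put_task": fake_submit_put_task,
    })
    for phase, ms in result.items():
        print(f"{phase}: {ms:.3f} ms")
```

If the `submit_put_task` entry dominates the total, that phase is the one worth moving off the caller's critical path.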
**Describe the bug** A bug introduced in lmcache==0.3.1, and not present in 0.3.0, appears when benchmarking vLLM serving a second time. The error log shows: ``` # logging...