luchangli issues

Results: 3 issues of luchangli

support MLA format for sglang gpu connector

Support both MHA and MLA in the sglang GPU connector; the MLA path reuses the vLLM CUDA kernel.

[optimize] boost local_disk_backend submit_put_task performance

The cache store execution time consists mainly of three parts: alloc, CPU-GPU copy, and submit_put_task. The existing local_disk_backend submit_put_task does a lot of work that typically takes more than 1 ms to execute, which reduces...

LMCache ERROR: The number of retrieved tokens is less than the expected number of tokens! This should not happen!

**Describe the bug** A bug was introduced in lmcache==0.3.1 that does not occur in 0.3.0. When benchmarking vLLM serving for the second time, the error log shows: ``` # logging...