Yihua Cheng issues

Results: 10 issues of Yihua Cheng

[Bug]: Runtime error when running MLA models with "prefix caching enabled" and "chunked prefill disabled"

### Your current environment The output of `python collect_env.py` ```text INFO 03-01 00:48:13 [__init__.py:207] Automatically detected platform cuda. Collecting environment information... PyTorch version: 2.5.1+cu124 Is debug build: False CUDA used...

bug

[P/D][V1] KV Connector API V1

### **TL;DR:** This PR opens the KV connector API in v1 to support disaggregated prefill. It also includes a minimal functional implementation as an example of how to use the...

documentation

ready

ci/build

[WIP] [Core][P/D] CPU connector for PD disagg

**TL;DR:** This PR implements a CPU-based connector for PD disaggregation, with the following features:
- [x] Async layerwise D2H copies at the prefiller side
- [x] (WIP) Async layerwise H2D copies at...

[RFC][Discussion] Add vLLM generation as unit tests to test the functionality/correctness/potential crashes

# Background We currently have a comprehensive suite of unit tests that cover the internal functionalities of LMCache. However, these tests are mostly scoped within the LMCache repository and do...

help wanted

Testing

discussion

[RFC] Better configuration with toml and new configuration classes

**Is your feature request related to a problem? Please describe.** Currently, the configuration file is a flat YAML file, and the config class is a flat dataclass. It will become...

enhancement

good first issue

Refactoring

[Core] GPU connector performance improvement

**Is your feature request related to a problem? Please describe.** In #678, we've seen a performance problem in the current GPU connector implementation, and @yanok provides a fix in `VLLMPagedMemGPUConnectorV2`...

enhancement

good first issue

help wanted

new feature

[Doc] Update the docs to include the usage of "LMCacheConnectorV1Dynamic"

[Core] NIXL integration follow ups

[Core] add new batched copy cuda kernel

[Observability] Integrate LMCache observability to vLLM's KV connector metrics