[RFC]: Adapt KVCache Offloading Framework for vLLM v1 Architecture
Summary
The current KVCache offloading framework is built around assumptions from the vLLM v0 architecture. vLLM v1 introduces new cache-handling semantics, in particular more granular, layer-by-layer control, so we need to adapt our framework to support the v1 runtime and interface design.
This effort aims to bring first-class support for vLLM v1 to our offloading framework by leveraging its advanced runtime design. The focus is on aligning with v1’s more granular execution model while preserving or improving cache efficiency and system throughput.
Motivation
A v1-compatible KVCache offloading layer that fully supports layer-wise cache transitions is expected to deliver the performance benefits observed in early evaluations.
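To make "layer-wise cache transitions" concrete, here is a minimal toy sketch of the idea: each layer's KV data is handed to the offloading store as soon as that layer finishes, rather than after the whole forward pass. Everything in it (`ToyLayerwiseStore`, `save_layer`, `load_layer`, `run_forward`) is a hypothetical name chosen for illustration. It is not the vLLM v1 interface and not the proposed design, and plain Python lists stand in for KV tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key: (request id, layer index). Not a vLLM type.
Key = Tuple[str, int]


@dataclass
class ToyLayerwiseStore:
    """Stand-in for an external KV store, keyed per layer (illustrative only)."""

    blocks: Dict[Key, List[float]] = field(default_factory=dict)

    def save_layer(self, request_id: str, layer: int, kv: List[float]) -> None:
        # Copy so later mutation of the source buffer does not change stored data.
        self.blocks[(request_id, layer)] = list(kv)

    def load_layer(self, request_id: str, layer: int) -> List[float]:
        return list(self.blocks[(request_id, layer)])


def run_forward(store: ToyLayerwiseStore, request_id: str, num_layers: int) -> List[int]:
    """Toy forward pass: offload each layer's KV as soon as that layer finishes."""
    saved_order = []
    for layer in range(num_layers):
        kv = [float(layer)] * 4  # placeholder for this layer's KV tensor
        store.save_layer(request_id, layer, kv)
        saved_order.append(layer)
    return saved_order


store = ToyLayerwiseStore()
order = run_forward(store, "req-0", num_layers=3)
restored = [store.load_layer("req-0", layer) for layer in range(3)]
```

In a real system the per-layer save would presumably be asynchronous so that the transfer of one layer overlaps with the compute of the next; the sketch keeps it synchronous for clarity.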
Proposed Change
TODO. @DwyaneShi
Alternatives Considered
No response