[RFC]: Adapt KVCache Offloading Framework for vLLM v1 Architecture

Jeffwan opened this issue.

Summary

The current KVCache offloading framework is built around assumptions from the vLLM v0 architecture. vLLM v1 introduces new cache-handling semantics, most notably more granular, layer-by-layer control over the KV cache, so we need to adapt our framework to the v1 runtime and interface design.

This effort aims to bring first-class support for vLLM v1 to our offloading framework by leveraging its advanced runtime design. The focus is on aligning with v1’s more granular execution model while preserving or improving cache efficiency and system throughput.
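To make "layer-by-layer" concrete, here is a minimal sketch, assuming a hypothetical per-layer save hook. `LayerwiseOffloader`, `save_layer`, `wait_all`, and `forward` are illustrative names only, not vLLM v1's actual connector API, and the "compute" and "KV" values are toy stand-ins. Each layer's KV cache is handed to a background copy as soon as that layer finishes, so the transfer can overlap with computing the next layer; a single barrier at the end of the step waits for every copy.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class LayerwiseOffloader:
    """Hypothetical offloader: copies each layer's KV cache to a host-side
    store as soon as that layer finishes, instead of waiting for the whole
    forward pass to complete."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: list[Future] = []
        self.host_store: dict[str, list[float]] = {}

    def save_layer(self, layer_name: str, kv: list[float]) -> None:
        # Snapshot the layer's KV so later in-place updates cannot race with
        # the copy, then hand the copy to a worker; the caller returns
        # immediately and can start computing the next layer.
        snapshot = list(kv)
        self._pending.append(self._pool.submit(self._copy, layer_name, snapshot))

    def _copy(self, layer_name: str, kv: list[float]) -> None:
        # Stands in for a GPU -> host (or remote) transfer.
        self.host_store[layer_name] = kv

    def wait_all(self) -> None:
        # End-of-step barrier: every per-layer copy must have finished.
        for fut in self._pending:
            fut.result()
        self._pending.clear()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def forward(num_layers: int, offloader: LayerwiseOffloader) -> float:
    hidden = 1.0
    for i in range(num_layers):
        hidden = hidden * 2          # toy stand-in for attention + MLP compute
        kv = [hidden, hidden + 0.5]  # toy stand-in for this layer's K and V
        offloader.save_layer(f"layer.{i}", kv)
    offloader.wait_all()
    return hidden


if __name__ == "__main__":
    off = LayerwiseOffloader()
    print(forward(4, off))         # 16.0
    off.close()
    print(sorted(off.host_store))  # ['layer.0', 'layer.1', 'layer.2', 'layer.3']
```

In a real v1 integration the copy would more likely be an asynchronous device-to-host transfer (for example on a separate CUDA stream) than a Python thread; the sketch only illustrates the per-layer hand-off followed by one end-of-step barrier, in contrast to offloading the whole request's cache after the forward pass.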

Motivation

A v1-compatible KVCache offloading layer that fully supports layer-wise cache transitions should deliver the performance benefits observed in early evaluations.

Proposed Change

TODO. @DwyaneShi

Alternatives Considered

No response

May 23 '25 21:05 Jeffwan