[V1][Metrics] add support for kv event publishing
RFC: KVBlocks and Metrics Publishing In Inference Frameworks
- Added KVCacheEvent, BlockStored, BlockRemoved, and AllBlocksCleared msgspec classes
- Created an event queue in the BlockPool and wrote these events to it in the functions that store, remove, or clear blocks
- Bubbled the events up to the scheduler, where they are appended to EngineCoreOutputs
- Added kv_cache_events to EngineCoreOutputs
- Wrote unit tests at the BlockManager level for basic functionality and at the EngineCore level for correct propagation and serialization over ZMQ
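
For readers who want a picture of the flow described in the list above, here is a minimal, dependency-free sketch. It is not the code in this PR: the PR defines the events as msgspec classes, while this sketch swaps in stdlib dataclasses so it runs anywhere. `ToyBlockPool`, the `block_hashes` field, and the method names `cache_blocks`, `evict_block`, `reset`, and `take_events` are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheEvent:
    """Base type for KV cache events (a msgspec class in the PR)."""


@dataclass(frozen=True)
class BlockStored(KVCacheEvent):
    block_hashes: tuple[int, ...]  # assumed field, for illustration only


@dataclass(frozen=True)
class BlockRemoved(KVCacheEvent):
    block_hashes: tuple[int, ...]  # assumed field, for illustration only


@dataclass(frozen=True)
class AllBlocksCleared(KVCacheEvent):
    pass


class ToyBlockPool:
    """Hypothetical stand-in for the BlockPool: queues one event per change."""

    def __init__(self, enable_kv_cache_events: bool = False) -> None:
        self.enable_kv_cache_events = enable_kv_cache_events
        self._cached: set[int] = set()
        self._events: deque[KVCacheEvent] = deque()

    def _emit(self, event: KVCacheEvent) -> None:
        # Events are only queued when the feature is switched on.
        if self.enable_kv_cache_events:
            self._events.append(event)

    def cache_blocks(self, block_hashes: list[int]) -> None:
        self._cached.update(block_hashes)
        self._emit(BlockStored(tuple(block_hashes)))

    def evict_block(self, block_hash: int) -> None:
        self._cached.discard(block_hash)
        self._emit(BlockRemoved((block_hash,)))

    def reset(self) -> None:
        self._cached.clear()
        self._emit(AllBlocksCleared())

    def take_events(self) -> list[KVCacheEvent]:
        # Drain the queue so a caller (the scheduler, in the PR's design)
        # can attach the batch to its outputs exactly once.
        events = list(self._events)
        self._events.clear()
        return events
```

Draining the queue in one call keeps the pool free of any knowledge about who consumes the events; in the PR that role belongs to the scheduler, which appends the batch to EngineCoreOutputs.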
API
- add enable_kv_cache_events to EngineArgs
- ~add external_stat_loggers field to AsyncLLM API~ Covered by https://github.com/vllm-project/vllm/pull/14661
With https://github.com/vllm-project/vllm/pull/14661 and this PR, a third party can write a custom stat logger that consumes both engine Stats and Events and publishes them elsewhere.
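
As a rough illustration of what such a third-party consumer could look like, here is a sketch under stated assumptions. It does not use vLLM's actual stat logger interface: the class name `ForwardingStatLogger`, the `record(stats, events)` signature, the plain-dict payloads, and the `num_running_reqs` metric name are all hypothetical, and `publish` stands in for whatever transport the consumer picks (a message bus, an HTTP endpoint, a file).

```python
from __future__ import annotations

import json
from typing import Any, Callable


class ForwardingStatLogger:
    """Hypothetical logger that forwards stats and KV cache events as JSON."""

    def __init__(self, publish: Callable[[str], None]) -> None:
        # `publish` is any callable that ships one serialized message.
        self._publish = publish

    def record(self, stats: dict[str, Any], events: list[dict[str, Any]]) -> None:
        # One message for the stats snapshot, if there is one.
        if stats:
            self._publish(json.dumps({"kind": "stats", "data": stats}, sort_keys=True))
        # One message per event, preserving the order they were emitted in.
        for event in events:
            self._publish(json.dumps({"kind": "kv_event", "data": event}, sort_keys=True))


# Usage: collect messages in a list instead of sending them anywhere.
sink: list[str] = []
logger = ForwardingStatLogger(sink.append)
logger.record(
    {"num_running_reqs": 2},  # illustrative metric name
    [{"type": "BlockStored", "block_hashes": [1, 2]}],
)
```

Publishing one message per event, rather than one per batch, lets a downstream consumer process events in order without parsing a batch envelope; batching would be the better choice if message overhead matters more.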