
[Observability] Integrate LMCache observability to vLLM's KV connector metrics

Open ApostaC opened this issue 2 months ago • 6 comments

Description

Recently, vLLM has added support for native Prometheus metrics for KV connectors (see vllm-project/vllm#26811).

Right now, LMCache's Prometheus support passes metrics to the main vLLM process through the local file system, which has drawbacks such as leaving staging files behind. After integrating with vLLM's native Prometheus metric system, we no longer need PROMETHEUS_MULTIPROC_DIR.

High-level implementation proposal

Changes on the LMCache side:

  • Add an option to disable the Prometheus metric reporting thread

Changes on the vLLM side (in lmcache_connector.py):

  • Add a new function that reads the LMCache metrics and returns them to vLLM

Note that we DO NOT need to modify the stats collector and the metric definitions in LMCache. The new code in vLLM can directly reuse those data structures.
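The two changes above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and exists only for illustration: the environment variable `EXAMPLE_DISABLE_METRICS_THREAD`, the `CacheStats` and `StatsCollector` classes, and the `get_connector_stats` function are not the actual LMCache or vLLM names. The idea is that the existing collector keeps accumulating counters, the background reporting thread can be switched off, and a connector-side function hands a snapshot to the host process on request.

```python
import os
import threading
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical environment variable name, used only in this sketch.
DISABLE_ENV = "EXAMPLE_DISABLE_METRICS_THREAD"


@dataclass
class CacheStats:
    """Hypothetical stand-in for the counters a stats collector tracks."""
    requests: int = 0
    hits: int = 0


class StatsCollector:
    """Hypothetical collector that accumulates counters under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stats = CacheStats()

    def record_lookup(self, hit: bool) -> None:
        with self._lock:
            self._stats.requests += 1
            if hit:
                self._stats.hits += 1

    def snapshot_and_reset(self) -> dict:
        """Return the counters accumulated since the last call, then reset."""
        with self._lock:
            snapshot = asdict(self._stats)
            self._stats = CacheStats()
        return snapshot


def reporting_thread_enabled() -> bool:
    """The background reporter runs unless the env variable opts out."""
    return os.environ.get(DISABLE_ENV, "0").lower() not in ("1", "true")


def get_connector_stats(collector: StatsCollector) -> Optional[dict]:
    """Connector-side hook: hand the latest counters to the host process.

    Returns None when nothing was recorded since the previous call.
    """
    stats = collector.snapshot_and_reset()
    return stats if stats["requests"] else None
```

In this sketch the collector and its counters are untouched by the integration; only the delivery path changes, from a reporting thread that writes to disk to a pull-style call made by the host process.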

Additional context

For the changes in vLLM, please tag @ApostaC or @KuntaiDu for reviewing.

ApostaC avatar Oct 29 '25 17:10 ApostaC

cc @maobaolong @hickeyma. Let me know if you have any other thoughts on this.

ApostaC avatar Oct 29 '25 17:10 ApostaC

Hi @ApostaC, I’m a graduate student at CMU and I’m trying to get familiar with the LMCache community. This issue looks like a great starting point, and I’d love to try working on it. Thanks!

XinyuJiangCMU avatar Oct 31 '25 18:10 XinyuJiangCMU

@XinyuJiangCMU Hey, thanks for your interest! Let me assign it to you. Looking forward to your PR!

ApostaC avatar Nov 03 '25 18:11 ApostaC

Hey @ApostaC, I wrote a PR for this in vLLM.

We need to discuss the transition steps for moving the Prometheus monitoring over to vLLM.

https://github.com/vllm-project/vllm/pull/29214

aeon-x avatar Nov 24 '25 05:11 aeon-x

I will take a look at this!

ApostaC avatar Nov 24 '25 18:11 ApostaC

PR to refactor the Prometheus logger and disable the logger thread via an environment variable: https://github.com/LMCache/LMCache/pull/2123

aeon-x avatar Nov 29 '25 23:11 aeon-x