[Observability] Integrate LMCache observability to vLLM's KV connector metrics
Description
Recently, vLLM has added support for native Prometheus metrics for KV connectors (see vllm-project/vllm#26811).
Currently, LMCache's Prometheus support uses the local file system to pass metrics to the main vLLM process, which has drawbacks (e.g., it requires managing staging files on disk).
After integrating with vLLM's native Prometheus metrics system, we will no longer need `PROMETHEUS_MULTIPROC_DIR`.
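For context, the multiprocess pattern works by having each worker process write its metric values to files under a shared directory, which the main process then scans and merges on every scrape. The stdlib-only sketch below illustrates that file-passing pattern and why the staging files are a drawback; all names here are illustrative, not LMCache's actual code:

```python
import json
import os
import tempfile

# Hypothetical illustration of the multiprocess-file pattern:
# each worker dumps its counters to a per-process staging file...
def write_worker_metrics(metrics_dir: str, pid: int, counters: dict) -> None:
    path = os.path.join(metrics_dir, f"counter_{pid}.json")
    with open(path, "w") as f:
        json.dump(counters, f)

# ...and the main process must scan and merge all staging files on scrape.
def aggregate_metrics(metrics_dir: str) -> dict:
    totals: dict = {}
    for name in os.listdir(metrics_dir):
        if not name.startswith("counter_"):
            continue
        with open(os.path.join(metrics_dir, name)) as f:
            for key, value in json.load(f).items():
                totals[key] = totals.get(key, 0) + value
    return totals

metrics_dir = tempfile.mkdtemp()  # stands in for PROMETHEUS_MULTIPROC_DIR
write_worker_metrics(metrics_dir, 101, {"lmcache_hits": 3})
write_worker_metrics(metrics_dir, 102, {"lmcache_hits": 5})
print(aggregate_metrics(metrics_dir))  # {'lmcache_hits': 8}
```

The staging files must be created, cleaned up, and kept consistent across process restarts, which is the overhead the native-metrics integration removes.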
High-level implementation proposal
Changes on the LMCache side:
- Add a way to disable the Prometheus metric reporting thread
Changes on the vLLM side (in lmcache_connector.py):
- Add a new function that reads the LMCache metrics and returns them to vLLM
Note that we DO NOT need to modify the stats collector and the metric definitions in LMCache. The new code in vLLM can directly reuse those data structures.
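A minimal sketch of what the connector-side hook could look like. The class, method, and metric names below are assumptions for illustration only, not the actual vLLM/LMCache API:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for LMCache's existing stats structures,
# which (per the note above) can be reused without modification.
@dataclass
class LMCacheStats:
    counters: dict = field(default_factory=dict)

    def snapshot(self) -> dict:
        # Return a point-in-time copy so the caller can report it safely.
        return dict(self.counters)

class LMCacheConnector:
    """Sketch of the connector exposing LMCache stats to vLLM's
    native Prometheus metric system (names are illustrative)."""

    def __init__(self, stats: LMCacheStats):
        self._stats = stats

    def get_kv_connector_stats(self) -> dict:
        # New function: read LMCache's metrics directly and hand them
        # to vLLM, instead of round-tripping through staging files
        # under PROMETHEUS_MULTIPROC_DIR.
        return self._stats.snapshot()

stats = LMCacheStats({"lmcache_retrieved_tokens": 128})
connector = LMCacheConnector(stats)
print(connector.get_kv_connector_stats())  # {'lmcache_retrieved_tokens': 128}
```

Returning a snapshot (rather than the live dict) keeps the reporting path read-only, so vLLM's metric collection cannot mutate LMCache's internal counters.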
Additional context
For the changes in vLLM, please tag @ApostaC or @KuntaiDu for reviewing.
cc @maobaolong @hickeyma . Let me know if you have any other thoughts on this
Hi @ApostaC, I’m a graduate student at CMU and I’m trying to get familiar with the LMCache community. This issue looks like a great starting point, and I’d love to try working on it. Thanks!
@XinyuJiangCMU Hey, thanks for your interest! Let me assign it to you. Looking forward to your PR!
Hey @ApostaC, I wrote a PR for this in vLLM.
I need to discuss the transition steps with you for moving the Prometheus monitoring over to vLLM.
https://github.com/vllm-project/vllm/pull/29214
I will take a look at this!
PR to refactor the Prometheus logger and disable the logger thread via an environment variable: https://github.com/LMCache/LMCache/pull/2123