[Usage]: mem_cache_hit_nums and mem_cache_nums metrics not exposed via /metrics endpoint
Describe your usage question
Background
I built and deployed Mooncake based on the following PR:
👉 https://github.com/kvcache-ai/Mooncake/pull/1020
This PR introduces new metrics:
mem_cache_hit_nums
mem_cache_nums
file_cache_hit_nums
file_cache_nums
I would like to scrape these metrics using Prometheus via the /metrics endpoint to calculate memory/file cache hit ratios.
Problem
Currently, these metrics are available only via the client API, but they are not exposed through the /metrics HTTP endpoint.
Therefore, Prometheus cannot scrape them, making it impossible to compute cache hit ratios using Prometheus.
Expected Behavior
The new cache metrics should be included in the serialized metrics output so that Prometheus can scrape them.
Proposed Fix
I found that the following changes in mooncake-store/src/master_metric_manager.cpp (inside serialize_metrics()) enable the metrics to appear in /metrics:
serialize_metric(mem_cache_hit_nums_);
serialize_metric(file_cache_hit_nums_);
serialize_metric(mem_cache_nums_);
serialize_metric(file_cache_nums_);
After adding these lines, Prometheus successfully scraped all four metrics.
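For context, /metrics returns the Prometheus text exposition format: a `# HELP` line, a `# TYPE` line, and a sample line per metric. The sketch below is illustrative only. `MetricSample`, `append_prometheus_metric`, and `serialize_all` are hypothetical names, not Mooncake's `serialize_metric()` or its metric classes. Treating the `*_hit_nums` metrics as counters and the `*_nums` metrics as gauges is an assumption based on how they are described in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one master-side metric (NOT Mooncake's metric type).
struct MetricSample {
    std::string name;
    std::string help;
    std::string type;    // assumed: "counter" for *_hit_nums, "gauge" for *_nums
    std::int64_t value;
};

// Appends one metric to `out` in the Prometheus text exposition format.
void append_prometheus_metric(std::ostringstream& out, const MetricSample& m) {
    assert(m.type == "counter" || m.type == "gauge");
    out << "# HELP " << m.name << ' ' << m.help << '\n'
        << "# TYPE " << m.name << ' ' << m.type << '\n'
        << m.name << ' ' << m.value << '\n';
}

// Serializes every metric in order, similar in spirit to serialize_metrics().
std::string serialize_all(const std::vector<MetricSample>& metrics) {
    std::ostringstream out;
    for (const auto& m : metrics) {
        append_prometheus_metric(out, m);
    }
    return out.str();
}
```

For example, a single counter named `mem_cache_hit_nums` with value 42 serializes to three lines ending in `mem_cache_hit_nums 42`, which is what Prometheus parses on each scrape.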
Questions
Is this the correct and intended approach? The change appears consistent with how other metrics are exposed.
Can this change be officially supported and merged into the main branch? This would allow Prometheus users to compute cache hit ratios without relying on the client API.
Is there any concern about exposing these metrics via the /metrics endpoint, such as performance overhead or API compatibility?
During validation, I noticed an issue:
mem_cache_hit_nums_ increments every time a cache hit occurs.
mem_cache_nums_ records the current number of KV entries stored in memory.
Analysis
In the calculate_cache_stats() method, the cache hit rate is computed as:
mem_cache_hit_nums_ / mem_cache_nums_
This seems incorrect.
The number of current KV entries is not a meaningful denominator for a hit-rate calculation. A proper cache hit rate should use:
cache_hits / total_requests
For example, master_exist_key_requests_total would be a more reasonable denominator.
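Because mem_cache_hit_nums_ only ever increases while mem_cache_nums_ is a point-in-time entry count, dividing one by the other mixes a counter with a gauge. The following is a minimal sketch of the hits / total_requests form, assuming both inputs are monotonically increasing counters. master_exist_key_requests_total is used only as the example denominator proposed above. `CounterSnapshot`, `hit_ratio`, and `windowed_hit_ratio` are hypothetical helpers, not part of Mooncake.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical snapshot of two monotonically increasing counters.
struct CounterSnapshot {
    std::uint64_t cache_hits;      // e.g. mem_cache_hit_nums
    std::uint64_t total_requests;  // e.g. master_exist_key_requests_total (assumed)
};

// Cumulative hit ratio since process start; undefined if no requests yet.
std::optional<double> hit_ratio(const CounterSnapshot& s) {
    if (s.total_requests == 0) return std::nullopt;
    return static_cast<double>(s.cache_hits) /
           static_cast<double>(s.total_requests);
}

// Hit ratio over one scrape window, computed from counter deltas
// (similar in spirit to rate(hits) / rate(requests) in PromQL).
std::optional<double> windowed_hit_ratio(const CounterSnapshot& earlier,
                                         const CounterSnapshot& later) {
    // A counter going backwards suggests a restart; skip this window.
    if (later.cache_hits < earlier.cache_hits ||
        later.total_requests < earlier.total_requests) {
        return std::nullopt;
    }
    const std::uint64_t dh = later.cache_hits - earlier.cache_hits;
    const std::uint64_t dr = later.total_requests - earlier.total_requests;
    if (dr == 0) return std::nullopt;  // no requests in this window
    return static_cast<double>(dh) / static_cast<double>(dr);
}
```

The windowed form reflects recent behavior, while the cumulative ratio is increasingly dominated by history. Returning `std::nullopt` avoids division by zero and sidesteps counter resets instead of reporting a misleading value.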
Exposing the cache hit rate to Prometheus looks good to me. I am also curious why we expose these metrics on the client side (by querying the master) instead of exposing them directly on the master side. @Liziqi-77 Could you help take a look?
It seems we’ve already discussed this issue here: https://github.com/kvcache-ai/Mooncake/issues/1136#issuecomment-3588429432
If anything in that description is unclear, feel free to point it out and we can take a closer look together.
I took a closer look and realized I made a mistake. Sorry about that, haha.
No worries at all ^_^
Thank you very much for pointing out this issue. As you mentioned, the denominator is counted repeatedly when calculating the hit rate. I'm already fixing this bug and will open an issue to track it.
Is there a link to the issue?
Sure: duplicate counts for mem_cache_hits and ssd_cache_hits.
Thank you very much for your help.