Mooncake [Usage]: mem_cache_hit_nums and mem_cache_nums metrics not exposed via /metrics endpoint

Describe your usage question

Background

I built and deployed Mooncake based on the following PR:

👉 https://github.com/kvcache-ai/Mooncake/pull/1020

This PR introduces new metrics:

mem_cache_hit_nums

mem_cache_nums

file_cache_hit_nums

file_cache_nums

I would like to scrape these metrics using Prometheus via the /metrics endpoint to calculate memory/file cache hit ratios.

Problem

Currently, these metrics are available only through the client API; they are not exposed through the /metrics HTTP endpoint.

Therefore, Prometheus cannot scrape them, making it impossible to compute cache hit ratios using Prometheus.

Expected Behavior

The new cache metrics should be included in the serialized metrics output so that Prometheus can scrape them.

Proposed Fix

I found that adding the following calls inside serialize_metrics() in mooncake-store/src/master_metric_manager.cpp makes the metrics appear in /metrics:

serialize_metric(mem_cache_hit_nums_);
serialize_metric(file_cache_hit_nums_);
serialize_metric(mem_cache_nums_);
serialize_metric(file_cache_nums_);

After adding these lines, Prometheus successfully scraped all four metrics.
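To illustrate why a metric that is never serialized stays invisible to Prometheus, here is a minimal standalone sketch of a serializer that emits the Prometheus text exposition format. It is not Mooncake's implementation: the `Counter` struct and the `append_counter` and `serialize_all` functions are hypothetical names invented for this example, and the real `serialize_metric` helper may work differently.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a metric object. Mooncake's real metric
// types and its serialize_metric helper are not reproduced here.
struct Counter {
    std::string name;
    std::string help;
    std::uint64_t value;
};

// Append one counter in the Prometheus text exposition format:
// a HELP line, a TYPE line, then "<name> <value>".
void append_counter(std::string& out, const Counter& c) {
    out += "# HELP " + c.name + " " + c.help + "\n";
    out += "# TYPE " + c.name + " counter\n";
    out += c.name + " " + std::to_string(c.value) + "\n";
}

// Build the body that a /metrics handler would return. A metric that
// is never passed to append_counter does not appear in the output,
// so a scraper has no way to discover it.
std::string serialize_all(const std::vector<Counter>& counters) {
    std::string out;
    for (const auto& c : counters) {
        append_counter(out, c);
    }
    return out;
}
```

In this sketch, exposing a new metric only means adding it to the list passed to `serialize_all`, which mirrors the four added calls above.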

Questions

Is this the correct and intended approach? The change appears consistent with how other metrics are exposed.

Can this change be officially supported and merged into the main branch? This would allow Prometheus users to compute cache hit ratios without relying on the client API.

Is there any concern about exposing these metrics via the /metrics endpoint, such as performance overhead or API compatibility?

Dec 02 '25 08:12 tianlang-wq

During validation, I noticed an issue:

mem_cache_hit_nums_ increments every time a cache hit occurs.

mem_cache_nums_ records the current number of KV entries stored in memory.

Analysis

In the calculate_cache_stats() method, the cache hit rate is computed as:

mem_cache_hit_nums_ / mem_cache_nums_

This seems incorrect.

The number of current KV entries is not a meaningful denominator for a hit-rate calculation. A proper cache hit rate should use:

cache_hits / total_requests

For example, master_exist_key_requests_total would be a more reasonable denominator.
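The ratio suggested above can be sketched as follows. This is a standalone illustration of hits divided by total requests, not Mooncake's calculate_cache_stats(); `hit_ratio`, `interval_hit_ratio`, and `Sample` are hypothetical names, and which counter is the right denominator is still an open question in this thread.

```cpp
#include <cstdint>
#include <limits>

// Hit ratio as hits / total requests. Returns NaN when no requests
// have been recorded, so the caller never divides by zero.
double hit_ratio(std::uint64_t hits, std::uint64_t total_requests) {
    if (total_requests == 0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return static_cast<double>(hits) / static_cast<double>(total_requests);
}

// One scrape of two monotonically increasing counters (hypothetical).
struct Sample {
    std::uint64_t hits;
    std::uint64_t total_requests;
};

// Hit ratio over the interval between two scrapes, computed from the
// counter deltas. Returns NaN if either counter went backwards, which
// would indicate a reset such as a process restart.
double interval_hit_ratio(const Sample& prev, const Sample& curr) {
    if (curr.hits < prev.hits || curr.total_requests < prev.total_requests) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return hit_ratio(curr.hits - prev.hits,
                     curr.total_requests - prev.total_requests);
}
```

Working from deltas between two samples gives the hit ratio for a recent window rather than since process start, which is usually what a dashboard wants.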

Dec 02 '25 09:12 tianlang-wq

Exposing the cache hit rate to Prometheus looks good to me. I am also curious why we expose these metrics on the client side (by querying the master) instead of exposing them directly on the master side. @Liziqi-77 Could you help take a look?

Dec 02 '25 11:12 ykwd

On the hit-rate denominator: it seems we've already discussed this issue here: https://github.com/kvcache-ai/Mooncake/issues/1136#issuecomment-3588429432

If anything in that description is unclear, feel free to point it out and we can take a closer look together.

Dec 02 '25 11:12 ykwd

I took a closer look and realized I made a mistake. Sorry about that, haha.

Dec 03 '25 01:12 tianlang-wq

No worries at all ^_^

Dec 03 '25 03:12 ykwd

Thank you very much for pointing out this issue. As you mentioned, the denominator is counted repeatedly when calculating the hit rate. I'm already fixing this bug and will submit an issue to track it.

Dec 03 '25 06:12 Liziqi-77

Is there a link to the issue?

Dec 03 '25 06:12 tianlang-wq

Sure: duplicate counts for mem_cache_hits and ssd_cache_hits.

Dec 03 '25 06:12 Liziqi-77

Thank you very much for your help

Dec 03 '25 06:12 tianlang-wq