Expose RocksDB metrics

Open michaeljmarshall opened this issue 2 years ago • 3 comments

FEATURE REQUEST

  1. Please describe the feature you are requesting.

Expose RocksDB metrics (statistics) in the /metrics endpoint. The motivation is to verify that the RocksDB configuration is optimal.

  2. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

This feature should make it easier to tune the various RocksDB configuration options.

  3. Provide any additional detail on your proposed use case for this feature.

It looks like these statistics are exposed through the DBOptions object. I plan to help implement these metrics.

It might be worth putting these metrics behind a feature flag.

michaeljmarshall avatar Aug 09 '22 16:08 michaeljmarshall

We could expose the metrics by adding these methods to the KeyValueStorageRocksDB class.

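    // Cumulative block cache miss count reported by RocksDB.
    public long cacheMisses() throws IOException {
        return statistics.getTickerCount(TickerType.BLOCK_CACHE_MISS);
    }

    // Cumulative block cache hit count reported by RocksDB.
    public long cacheHits() throws IOException {
        return statistics.getTickerCount(TickerType.BLOCK_CACHE_HIT);
    }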

We might have trouble integrating them with the current StatsLogger because there is no way to register counters. If you look at EntryLocationIndexStats, you'll see that it relies on registering a gauge. These metrics are cumulative counters rather than gauges, though, so exposing them as gauges could cause some minor confusion when interpreting the metrics output.
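
To make the gauge workaround concrete, here is a rough sketch of wrapping a cumulative ticker read in a gauge. `SimpleGauge`, `SimpleGaugeRegistry`, and `TickerGauges` are hypothetical stand-ins written for this example, not BookKeeper's actual `StatsLogger` or `Gauge` types, and the `LongSupplier` stands in for a call such as `statistics.getTickerCount(TickerType.BLOCK_CACHE_HIT)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical gauge interface; not BookKeeper's actual API.
interface SimpleGauge {
    long getSample();
}

// Hypothetical registry standing in for a stats logger that only accepts gauges.
class SimpleGaugeRegistry {
    private final Map<String, SimpleGauge> gauges = new LinkedHashMap<>();

    void registerGauge(String name, SimpleGauge gauge) {
        gauges.put(name, gauge);
    }

    // Reads the current value of a registered gauge, as a metrics scrape would.
    long sample(String name) {
        return gauges.get(name).getSample();
    }
}

class TickerGauges {
    // Registers a gauge that re-reads the cumulative ticker on every sample.
    // The reported value only ever grows, so it behaves like a counter
    // even though it is registered as a gauge.
    static void registerTicker(SimpleGaugeRegistry registry, String name, LongSupplier ticker) {
        registry.registerGauge(name, ticker::getAsLong);
    }
}
```

Because the value is re-read on each scrape, nothing has to be incremented on the hot path. The metric name could hint that the value is cumulative (for example a `_total` suffix, which is only a naming suggestion) to reduce the confusion mentioned above.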

michaeljmarshall avatar Aug 10 '22 04:08 michaeljmarshall

The available metrics are defined in RocksDB's statistics.h (https://github.com/facebook/rocksdb/blob/d7ebb58cb531031853a183a5771bc4be8c10b45b/include/rocksdb/statistics.h). I am not very familiar with how BookKeeper uses RocksDB, so I am not certain which metrics would make the most sense to expose. There are several _HIT and _MISS metrics that might be meaningful.
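
As an illustration of why a _HIT/_MISS pair could be meaningful, the two cumulative counts can be combined into a hit ratio. This is only a sketch: `CacheRatio` and `hitRatio` are hypothetical names, and the inputs are assumed to be the values of a matching pair of tickers such as BLOCK_CACHE_HIT and BLOCK_CACHE_MISS.

```java
class CacheRatio {
    // Returns hits / (hits + misses), or 0.0 when no lookups have been recorded.
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A persistently low ratio might suggest that the block cache is too small for the workload, although the ratio could also be computed on the monitoring side from the two raw counters.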

michaeljmarshall avatar Aug 10 '22 04:08 michaeljmarshall

I have added a new metric for entry location index lookups in PR #3444, which is slightly related. It doesn't go down to the RocksDB level, but it will help detect when RocksDB becomes a bottleneck.

lhotari avatar Aug 10 '22 07:08 lhotari