strimzi-kafka-operator

[Bug] ZK request latency metrics of little value

Open tombentley opened this issue 5 years ago • 1 comments

Describe the bug

The provided Grafana dashboard includes charts/time series of the minimum, average, and maximum ZK request latency metrics.

  • In practice the minimum always seems to be zero, so it's of very little value. In any case it's the minimum over the lifetime of the JVM process, so once it is zero it won't go higher.

  • The maximum suffers from the same thing: it's the maximum over the life of the JVM process, so you have no idea whether latency is currently at this value or whether this maximum is historical.

  • The average is an average over the life of the JVM process. This means that it might be useful if the process is young, but it won't move much if the process has been alive for a long time.
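To make the three points above concrete, here is a minimal sketch that compares lifetime statistics with statistics over a recent window. It is only an illustration: the class names, the window size of 100, and the simulated latencies are all assumptions made up for this example, and it does not reproduce how ZooKeeper actually computes its attributes.

```python
from collections import deque


class LifetimeStats:
    """Min/avg/max over every sample since start (hypothetical stand-in
    for process-lifetime latency attributes)."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None

    def record(self, latency_ms):
        self.count += 1
        self.total += latency_ms
        self.min = latency_ms if self.min is None else min(self.min, latency_ms)
        self.max = latency_ms if self.max is None else max(self.max, latency_ms)

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


class WindowedStats:
    """Min/avg/max over only the most recent `size` samples."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)  # old samples fall off automatically

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    @property
    def min(self):
        return min(self.samples)

    @property
    def max(self):
        return max(self.samples)

    @property
    def avg(self):
        return sum(self.samples) / len(self.samples)


lifetime = LifetimeStats()
window = WindowedStats(size=100)  # window size is an arbitrary assumption

# Simulated history: one 0 ms request, one old 500 ms spike,
# a long healthy period at 2 ms, then a recent degradation to 50 ms.
history = [0, 500] + [2] * 10_000 + [50] * 100

for latency in history:
    lifetime.record(latency)
    window.record(latency)

print("lifetime:", lifetime.min, round(lifetime.avg, 2), lifetime.max)
print("windowed:", window.min, window.avg, window.max)
```

In this simulated run the lifetime minimum stays pinned at 0, the lifetime maximum still reports the old spike, and the lifetime average barely reflects the recent degradation, while the windowed values show only the current 50 ms latency.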

There are MBean operations for resetting the latency attributes on which these metrics are based, but the JMX exporter obviously doesn't know anything about those operations.

It's worth pointing out that Kafka itself has its own kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs metric, which additionally exposes various percentiles.

To Reproduce

  1. Look at the default provided Grafana dashboard for ZK.

tombentley, Apr 16 '19 16:04

Triaged on 2.8.2022: The charts showing minimum and maximum latency should be removed from the ZooKeeper dashboard. The chart showing the average should be kept. A new chart should be added to the Kafka dashboard showing the ZooKeeper request latency as measured by the Kafka client, which already provides percentiles.

scholzj, Aug 02 '22 15:08

I'll work on this

ShazaAldawamneh, Aug 15 '23 12:08