Results 17 observability issues

Ash has some stuff stubbed out here. Let's figure out how to integrate it into this repo. The current dashboard is JSON-based, so we'll want to turn it into something...

Using `rate` causes an unexpected outcome. `deriv` works well. ref. https://redpandadata.slack.com/archives/C03H26FHJQL/p1724904615666739

This metric incorrectly calculates the usage when a learner event is happening (decom, node add, etc.). We should be using these instead for determining on-the-wire traffic for the cluster for...

With 24.3, we now get a metric to track when a commercial license is expiring, so users can track and alert when it is getting close to its end date. On alerts,...

https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/ Won't link the internal thread, but we tend to use hardcoded intervals for rates now, when we should probably be using `$__rate_interval`. This starts to break down if the time...
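For reference, Grafana's documentation defines `$__rate_interval` as the larger of `$__interval` plus the scrape interval and four times the scrape interval. A minimal Python sketch of that arithmetic follows; the function name and the sample values are hypothetical and only meant to show how the window grows with the panel step.

```python
def rate_interval(interval_s: float, scrape_interval_s: float) -> float:
    """Approximate Grafana's $__rate_interval, in seconds.

    interval_s:        the panel step ($__interval), in seconds
    scrape_interval_s: the Prometheus scrape interval, in seconds
    """
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Zoomed in: a 15s step with a 15s scrape interval is floored at 4 scrapes.
zoomed_in = rate_interval(15, 15)

# Zoomed out: a 300s step with a 15s scrape interval covers the whole step.
zoomed_out = rate_interval(300, 15)
```

With a hardcoded `[1m]` window and a 300s step, each plotted point would only look at the last minute of its step, which is one way a fixed window can miss activity that the variable would cover.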

As of 24.2.10, we now have the following metric to show seconds to expiry.
```
# HELP redpanda_cluster_features_enterprise_license_expiry_sec Number of seconds remaining until the Enterprise license expires
# TYPE redpanda_cluster_features_enterprise_license_expiry_sec...
```
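Since the metric reports seconds, an alert threshold expressed in days needs a conversion. The following Python sketch shows that arithmetic only; the helper names and the 30-day default are assumptions for illustration, not values taken from the issue.

```python
SECONDS_PER_DAY = 86_400


def days_until_expiry(expiry_sec: float) -> float:
    """Convert the metric's seconds-remaining value into days."""
    return expiry_sec / SECONDS_PER_DAY


def should_alert(expiry_sec: float, warn_days: float = 30.0) -> bool:
    """Return True when fewer than warn_days remain (hypothetical threshold)."""
    return days_until_expiry(expiry_sec) < warn_days
```

The same comparison in an alert rule would be against `warn_days * 86400` seconds, so a 30-day warning corresponds to a threshold of 2592000.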

```
"expr": "100 * abs(1-(sum(stddev by (redpanda_topic) (sum(redpanda_kafka_max_offset{redpanda_namespace=\"kafka\",redpanda_cloud_data_cluster_name=~\"\"}) by (redpanda_topic,redpanda_partition))) / sum(avg by (redpanda_topic) ((sum(redpanda_kafka_max_offset{redpanda_namespace=\"kafka\",redpanda_cloud_data_cluster_name=~\"\"}) by (redpanda_topic,redpanda_partition))))))",
```
This currently shows the balance of writes to partitions across the cluster....
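To make the arithmetic in that expression easier to follow, here is a small Python sketch of the same calculation on plain numbers. It assumes the per-partition max offsets have already been summed per (topic, partition); the function name and input shape are hypothetical. Prometheus `stddev` is a population standard deviation, so `statistics.pstdev` is used.

```python
import math
from statistics import mean, pstdev


def write_balance(offsets_by_topic: dict[str, list[float]]) -> float:
    """Mirror 100 * abs(1 - sum(stddev by topic) / sum(avg by topic)).

    offsets_by_topic maps a topic name to its per-partition max offsets.
    """
    total_stddev = sum(pstdev(parts) for parts in offsets_by_topic.values())
    total_avg = sum(mean(parts) for parts in offsets_by_topic.values())
    if total_avg == 0:
        # PromQL yields NaN for 0/0; do the same rather than raising.
        return math.nan
    return 100 * abs(1 - total_stddev / total_avg)


# Evenly written partitions: stddev is 0, so the result is 100.
even = write_balance({"orders": [100, 100, 100]})

# All writes on one of two partitions: stddev equals the mean, so the result is 0.
skewed = write_balance({"orders": [0, 200]})
```

Read this way, the value sits at 100 when every partition of a topic has the same offset and falls as the spread grows. Because of the `abs`, it rises again once the summed standard deviation exceeds the summed average.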