
Improve ClickHouse Keeper Grafana Dashboard

Open discostur opened this issue 1 month ago • 2 comments

Fix ClickHouse Keeper Grafana Dashboard

fix https://github.com/Altinity/clickhouse-operator/issues/1844

Problem

The ClickHouse Keeper dashboard was not working correctly due to:

  1. Incorrect Prometheus label names (pod_name vs pod, container_name vs container)
  2. Template variables using incorrect label filters
  3. Hardcoded metric names in template variable queries

Changes

Label Fixes

  • Changed pod_name to pod in all queries and template variables
    • Prometheus/Mimir exports Kubernetes pod labels as pod, not pod_name
  • Removed container_name="clickhouse-keeper" filters from all panel queries
    • Label should be container, not container_name
    • Filter not necessary as queries are already scoped to Keeper-specific metrics
  • Updated legend formats from {{pod_name}} to {{pod}} across all panels
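
For anyone maintaining a customized copy of the dashboard, the mechanical part of the label fixes above could be scripted roughly as follows. This is only a sketch, not the way the dashboard in this PR was edited: it assumes panel queries live under the `expr` key and legends under `legendFormat` in the dashboard JSON, and `SomeKeeperMetric` in the usage note is a hypothetical metric name.

```python
import re

# Matches the obsolete container_name matcher plus a following comma, if any.
_CONTAINER_FILTER = re.compile(r'\s*container_name="clickhouse-keeper"\s*,?\s*')

# Dashboard JSON keys assumed to hold PromQL expressions or legend templates.
_KEYS = ("expr", "legendFormat")


def fix_expr(expr: str) -> str:
    """Drop the container_name filter and rename pod_name to pod."""
    expr = _CONTAINER_FILTER.sub("", expr)
    # If the removed matcher was the last one, a dangling comma remains.
    expr = re.sub(r",\s*}", "}", expr)
    return re.sub(r"\bpod_name\b", "pod", expr)


def fix_dashboard(node):
    """Recursively apply fix_expr to query and legend strings in a dashboard dict."""
    if isinstance(node, dict):
        return {
            key: fix_expr(value)
            if key in _KEYS and isinstance(value, str)
            else fix_dashboard(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_dashboard(item) for item in node]
    return node
```

Loading the dashboard with `json.load`, passing it through `fix_dashboard`, and writing it back with `json.dump` would turn `SomeKeeperMetric{container_name="clickhouse-keeper", pod_name=~"$pod"}` into `SomeKeeperMetric{pod=~"$pod"}` and `{{pod_name}}` into `{{pod}}`. The template variable queries are not covered here because they were rewritten rather than renamed.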

Template Variable Improvements

Before:

label_values(up{container_name="clickhouse-keeper"}, namespace)
label_values(up{container_name="clickhouse-keeper", namespace=~"$namespace"}, pod_name)

After:

label_values({__name__=~"ClickHouse.*Keeper.*",pod=~"chk-clickhouse-keeper.*"}, namespace)
label_values({__name__=~"ClickHouse.*Keeper.*",pod=~"chk-clickhouse-keeper.*",namespace=~"$namespace"}, pod)

Benefits:

  • Uses a generic metric selector instead of the up metric with an incorrect label
  • Explicit pod name filter (chk-clickhouse-keeper.*) ensures only Keeper pods are shown
  • Works with any ClickHouse Keeper metric, not dependent on specific metric availability
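
To make it easier to see which series the new selector picks up, here is a small sketch that mimics the two regex matchers. Prometheus regex matchers are fully anchored, so `re.fullmatch` is the closest Python equivalent for these simple patterns. The metric and pod names in the usage note are hypothetical, and the pod filter only matches Keeper pods whose names start with `chk-clickhouse-keeper`.

```python
import re

# The two regex matchers from the new template variable queries.
METRIC_RE = r"ClickHouse.*Keeper.*"
POD_RE = r"chk-clickhouse-keeper.*"


def selected(metric: str, pod: str) -> bool:
    """Return True if a series with this metric name and pod label would match."""
    # Both matchers must hold, and each must match the whole value.
    return (
        re.fullmatch(METRIC_RE, metric) is not None
        and re.fullmatch(POD_RE, pod) is not None
    )
```

With these patterns, `selected("ClickHouseMetrics_KeeperExample", "chk-clickhouse-keeper-0-0-0")` is true, while a series from a pod named `chk-my-keeper-0-0-0` or a metric such as `up` is not selected. So a Keeper installation with a different resource name would need the pod pattern adjusted.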

discostur, Nov 24 '25 14:11

@Slach ah, I did miss the relabel config ... OK, with that it works, correct. However, I think this is kind of outdated: all other projects these days just use pod and container as labels instead of pod_name and container_name. So my suggestion would be to use the updated dashboard I provided and adjust the relabel config to be more in line with other metrics. In addition, we would not "break" other setups, because the relabel config would relabel the labels of all pods that have the annotation "prometheus_io_scrape=true".

discostur, Nov 27 '25 16:11

I provided the corresponding changes, plus some minor fixes to the test setup you recommended ;)

I tried it with minikube and all metrics are working now:

[Screenshot 2025-11-27 at 17:39:18]

discostur, Nov 27 '25 16:11