Improve ClickHouse Keeper Grafana Dashboard
Fix ClickHouse Keeper Grafana Dashboard
fix https://github.com/Altinity/clickhouse-operator/issues/1844
Problem
The ClickHouse Keeper dashboard was not working correctly due to:
- Incorrect Prometheus label names (pod_name vs pod, container_name vs container)
- Template variables using incorrect label filters
- Hardcoded metric names in template variable queries
Changes
Label Fixes
- Changed pod_name → pod in all queries and template variables
  - Prometheus/Mimir exports Kubernetes pod labels as pod, not pod_name
- Removed container_name="clickhouse-keeper" filters from all panel queries
  - Label should be container, not container_name
  - Filter not necessary as queries are already scoped to Keeper-specific metrics
- Updated legend formats from {{pod_name}} to {{pod}} across all panels
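The label fixes above are a mechanical rewrite of each panel query and legend. Here is a minimal Python sketch of that rewrite, assuming the dashboard follows the usual Grafana JSON model with panels[].targets[].expr and legendFormat fields. The helper names (migrate_expr, migrate_dashboard), the sample dashboard, and the metric name in it are illustrative and not taken from this PR:

```python
import copy
import re

# Matches the container_name="clickhouse-keeper" matcher plus an adjacent comma, if any.
_CONTAINER_FILTER = re.compile(r'\s*container_name\s*=\s*"clickhouse-keeper"\s*,?\s*')
# Matches pod_name only as a whole identifier (also hits $pod_name and {{pod_name}}).
_POD_NAME = re.compile(r"\bpod_name\b")
# Matches a comma left dangling before a closing brace after the filter is removed.
_TRAILING_COMMA = re.compile(r",\s*}")


def migrate_expr(expr: str) -> str:
    """Apply the label changes to one PromQL expression or legend string."""
    expr = _CONTAINER_FILTER.sub("", expr)
    expr = _POD_NAME.sub("pod", expr)
    return _TRAILING_COMMA.sub("}", expr)


def migrate_dashboard(dashboard: dict) -> dict:
    """Return a rewritten copy of the dashboard; the input is left untouched."""
    migrated = copy.deepcopy(dashboard)
    for panel in migrated.get("panels", []):
        for target in panel.get("targets", []):
            for key in ("expr", "legendFormat"):
                if key in target:
                    target[key] = migrate_expr(target[key])
    return migrated


# Hypothetical single-panel dashboard used only to exercise the helpers.
SAMPLE_DASHBOARD = {
    "panels": [
        {
            "targets": [
                {
                    "expr": 'ClickHouseMetrics_KeeperAliveConnections'
                            '{container_name="clickhouse-keeper", pod_name=~"$pod_name"}',
                    "legendFormat": "{{pod_name}}",
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    print(migrate_dashboard(SAMPLE_DASHBOARD)["panels"][0]["targets"][0])
```

Because the rename is applied to whole identifiers, a dashboard variable reference such as $pod_name becomes $pod as well, so the template variable itself has to be renamed to match.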
Template Variable Improvements
Before:
label_values(up{container_name="clickhouse-keeper"}, namespace)
label_values(up{container_name="clickhouse-keeper", namespace=~"$namespace"}, pod_name)
After:
label_values({__name__=~"ClickHouse.*Keeper.*",pod=~"chk-clickhouse-keeper.*"}, namespace)
label_values({__name__=~"ClickHouse.*Keeper.*",pod=~"chk-clickhouse-keeper.*",namespace=~"$namespace"}, pod)
Benefits:
- Uses generic metric selector instead of up metric with incorrect label
- Explicit pod name filter (chk-clickhouse-keeper.*) ensures only Keeper pods are shown
- Works with any ClickHouse Keeper metric, not dependent on specific metric availability
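To illustrate which series the new selectors pick up, here is a small Python sketch that emulates the two template variable queries over an in-memory list of series. Prometheus regex matchers are fully anchored, which re.fullmatch approximates for these simple patterns. The series data, metric names, and the label_values helper are made up for the example; this is not how Grafana evaluates the query internally:

```python
import re

# Selectors copied from the "After" template variable queries.
NAME_SELECTOR = r"ClickHouse.*Keeper.*"
POD_SELECTOR = r"chk-clickhouse-keeper.*"

# Hypothetical series, each represented as a dict of its labels.
SAMPLE_SERIES = [
    {"__name__": "ClickHouseMetrics_KeeperAliveConnections",
     "pod": "chk-clickhouse-keeper-0-0-0", "namespace": "keeper"},
    {"__name__": "ClickHouseMetrics_KeeperAliveConnections",
     "pod": "chk-clickhouse-keeper-0-1-0", "namespace": "keeper"},
    # Right pod, but the metric name does not match the __name__ selector.
    {"__name__": "up", "pod": "chk-clickhouse-keeper-0-0-0", "namespace": "keeper"},
    # Keeper metric, but the pod name lacks the expected prefix.
    {"__name__": "ClickHouseAsyncMetrics_KeeperAvgLatency",
     "pod": "other-keeper-0", "namespace": "other"},
]


def is_keeper_series(series: dict) -> bool:
    """True if both the metric name and the pod label match, fully anchored."""
    return (
        re.fullmatch(NAME_SELECTOR, series.get("__name__", "")) is not None
        and re.fullmatch(POD_SELECTOR, series.get("pod", "")) is not None
    )


def label_values(series_list, label, namespace_regex=".*"):
    """Emulate label_values(<selector>, label), optionally limited by namespace."""
    values = set()
    for series in series_list:
        if not is_keeper_series(series):
            continue
        if re.fullmatch(namespace_regex, series.get("namespace", "")) is None:
            continue
        if label in series:
            values.add(series[label])
    return sorted(values)


if __name__ == "__main__":
    print(label_values(SAMPLE_SERIES, "namespace"))
    print(label_values(SAMPLE_SERIES, "pod", namespace_regex="keeper"))
```

The last sample series shows the trade-off of the explicit pod filter: a Keeper installation whose pods do not start with chk-clickhouse-keeper would not appear in the variables.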
@Slach ah, I did miss the relabel config ... OK, with that it works, correct. However, I think this is kind of outdated, and all other projects these days just use pod and container as labels instead of pod_name and container_name. So my suggestion would be to use the updated dashboard I provided and adjust the relabel config to be more in line with other metrics. In addition, we would not "break" other setups, because the relabel config would relabel the labels of all pods that have the annotation "prometheus_io_scrape=true".
I provided the corresponding changes, plus some minor fixes to the test setup you recommended ;)
I tried it with minikube and all metrics are working now: