Wrong template variable in some Prometheus cluster rules
https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-offline.yaml#L7
https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-ha-critical.yaml#L7
https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-ha-warning.yaml#L7
All of the above reference the cluster name incorrectly by using {{ $labels.job }}. That variable is not expanded in the rendered rule file, so it shows up as a blank value when the alert fires. The rules expand correctly if it is changed to {{ .namespace }}/{{ .cluster }}, in line with the other Prometheus rules.
That's odd, because .labels is provided here:
https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/templates/prometheus-rule.yaml#L11-L29
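The two placeholders behave differently because there are two rendering passes. Helm renders the chart once at install time, and Prometheus expands annotation templates again every time an alert fires. Below is a toy Python emulation of those two passes. It is not Helm or Prometheus code. It assumes, based on the linked prometheus-rule.yaml, that the chart passes a labels map whose job entry is the literal string {{ $labels.job }}. The namespace db, the cluster name pg-main, the description text, and the helpers helm_stage and prometheus_stage are all hypothetical. Missing alert labels render as an empty string, to match the blank values described in the report.

```python
import re

# Hypothetical chart context. The labels.job value mirrors what the linked
# prometheus-rule.yaml appears to provide (an assumption): a literal
# Prometheus placeholder that Helm copies through unchanged.
HELM_CONTEXT = {
    "namespace": "db",
    "cluster": "pg-main",
    "labels": {"job": "{{ $labels.job }}"},
}

# Matches {{ expr }} and captures the trimmed expression.
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def helm_stage(text: str, ctx: dict) -> str:
    """Pass 1 (chart render time): resolve dotted paths like .labels.job.

    Anything not starting with '.' (e.g. $labels.job) is left for Prometheus.
    """
    def resolve(match: re.Match) -> str:
        expr = match.group(1)
        if not expr.startswith("."):
            return match.group(0)
        value = ctx
        for key in expr[1:].split("."):
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(resolve, text)


def prometheus_stage(text: str, alert_labels: dict) -> str:
    """Pass 2 (alert time): expand $labels.<name> from the alert's label set.

    A missing label becomes "", matching the blank values in this issue.
    """
    def expand(match: re.Match) -> str:
        expr = match.group(1)
        if expr.startswith("$labels."):
            return alert_labels.get(expr[len("$labels."):], "")
        return match.group(0)

    return PLACEHOLDER.sub(expand, text)


# Hypothetical description text; not copied from the chart.
via_labels = 'Cluster "{{ .labels.job }}" has no ready instances.'
via_values = 'Cluster "{{ .namespace }}/{{ .cluster }}" has no ready instances.'

# Simulate an alert whose series carries no labels at all.
print(prometheus_stage(helm_stage(via_labels, HELM_CONTEXT), {}))
# Cluster "" has no ready instances.
print(prometheus_stage(helm_stage(via_values, HELM_CONTEXT), {}))
# Cluster "db/pg-main" has no ready instances.
```

In this model, the first placeholder passes through Helm untouched and is only filled in at alert time, so its output depends entirely on the labels of the alerting series. The second is baked into the rule when the chart is rendered, so it does not depend on what the query returns.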
I think the problem is just with the CNPGClusterOffline query:
The count() aggregation there doesn't return any of the labels from the underlying cnpg_collector_up metric, which is why the alert description ends up with no labels. The rest of the alerts are fine.
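The label loss can be shown with a simplified model of how a PromQL aggregation builds its output series. Only labels listed in a by (...) clause survive, and the metric name is dropped as well. A bare count() therefore yields a single series with an empty label set, which leaves $labels.job with nothing to expand. This is a toy model, not Prometheus code. The sample series, their label values, and the count helper are hypothetical. The grouped call is included only for contrast.

```python
from collections import Counter

# Hypothetical cnpg_collector_up samples, one per instance.
# Label names and values are made up for illustration.
SERIES = [
    {"__name__": "cnpg_collector_up", "namespace": "db", "job": "db/pg-main", "pod": "pg-main-1"},
    {"__name__": "cnpg_collector_up", "namespace": "db", "job": "db/pg-main", "pod": "pg-main-2"},
]


def count(series: list, by: tuple = ()) -> list:
    """Simplified model of PromQL count(...) and count by (...)(...).

    Each output series keeps only the labels named in `by` (the metric name
    is dropped too), so a bare count() yields one series with no labels.
    """
    groups = Counter(
        tuple((name, s[name]) for name in by if name in s) for s in series
    )
    return [(dict(key), n) for key, n in groups.items()]


bare = count(SERIES)
grouped = count(SERIES, by=("namespace", "job"))

print(bare)     # [({}, 2)]
print(grouped)  # [({'namespace': 'db', 'job': 'db/pg-main'}, 2)]

# What an annotation referencing $labels.job would see for the bare result:
print(repr(bare[0][0].get("job", "")))  # ''
```

Under this model, any alert built on the bare count() result has no job label, which matches the empty cluster name in the alert description.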
Hi, @Wain13. I'm Dosu, and I'm helping the charts team manage their backlog. I'm marking this issue as stale.
Issue Summary
- The issue involves incorrect template variables in Prometheus cluster rules.
  - {{ $labels.job }} is used instead of {{ .namespace }}/{{ .cluster }}, causing blank alert values.
- @itay-grudev identified the issue with the CNPGClusterOffline query.
  - The count() aggregation does not return labels from the cnpg_collector_up metric.
Next Steps
- Please confirm if this issue is still relevant to the latest version of the charts repository by commenting here.
- If there is no further activity, the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!