kustomize-controller
`gotk_resource_info{ready="Unknown"}` during the reconciliation
Problem
The gotk_resource_info metric has the label ready="Unknown" for a DaemonSet that takes ~1h to roll out.
This triggers a Prometheus alert for the resource being in a non-ready state for a prolonged period, even though it is rolling out successfully. See:
status:
  conditions:
  - lastTransitionTime: "2025-05-23T14:30:17Z"
    message: Running health checks for <redacted> with a timeout of 1h0m0s
    observedGeneration: 20
    reason: Progressing
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2025-05-23T14:30:16Z"
    message: Reconciliation in progress
    observedGeneration: 20
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2025-05-23T14:30:17Z"
    message: Running health checks for revision <redacted> with a timeout of 1h0m0s
    observedGeneration: 20
    reason: Progressing
    status: Unknown
    type: Healthy
Metric:
gotk_resource_info{
... <redacted> ...
customresource_group="kustomize.toolkit.fluxcd.io",
customresource_kind="Kustomization",
customresource_version="v1",
ready="Unknown",
suspended="false",
source_name="flux-system"
... <redacted> ...
}
Expected behaviour
The metric has a label such as ready="Progressing", so that the alert can be configured to ignore resources that are still progressing.
Configuration
- Kustomization:
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: <redacted>
      namespace: <redacted>
    spec:
      interval: 10m
      serviceAccountName: kustomize-controller
      sourceRef:
        kind: GitRepository
        name: flux-system
      path: <redacted>
      prune: true
      wait: true
      suspend: false
      timeout: 60m
      dependsOn: <redacted>
- Alertmanager alert:
    - alert: FluxCDResourceNotReady
      expr: gotk_resource_info{ready!="True"} > 0
      for: 15m
You can change the metrics as you like in the kube-state-metrics config. If you prefer the reason instead of the status for ready, change the config to:
ready: [ status, conditions, "[type=Ready]", reason ]
https://github.com/fluxcd/flux2-monitoring-example/blob/4b0f96da1541309240b02a1e3e1116d93cb3e6d9/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml#L51
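As a rough sketch of how this could look, the fragment below keeps the existing ready label based on the condition status and adds the reason and the Healthy condition as extra labels. The label names ready_reason and healthy are hypothetical and not part of the linked example, and the surrounding keys of the kube-state-metrics config are omitted here.

```yaml
# Fragment of the labelsFromPath block for the Kustomization resource
# in the kube-state-metrics custom resource state config.
labelsFromPath:
  # Existing label: status of the Ready condition (True / False / Unknown).
  ready: [ status, conditions, "[type=Ready]", status ]
  # Hypothetical extra label: reason of the Ready condition (e.g. Progressing).
  ready_reason: [ status, conditions, "[type=Ready]", reason ]
  # Hypothetical extra label: status of the Healthy condition.
  healthy: [ status, conditions, "[type=Healthy]", status ]
```

Adding labels instead of replacing ready keeps existing dashboards and alerts that match on ready="True" working unchanged.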
Or you can add a new metric for healthy and compare the two.
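For illustration, an alert rule that skips resources which are still progressing might look like the sketch below. It assumes a hypothetical ready_reason label populated from the reason of the Ready condition; that label does not exist unless the kube-state-metrics config is changed to emit it. The group name is also made up.

```yaml
groups:
  - name: flux-resources  # hypothetical group name
    rules:
      - alert: FluxCDResourceNotReady
        # Fire only when the resource is not ready AND is not merely progressing.
        # ready_reason is an assumed label, see the note above.
        expr: gotk_resource_info{ready!="True", ready_reason!="Progressing"} > 0
        for: 15m
```

Because a negative matcher also matches series that lack the label, this expression behaves like the original alert until ready_reason is actually exported. It relies on a failed or timed-out health check moving the Ready condition to a reason other than Progressing, which is worth verifying against your controller version.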
Thanks a lot for a quick response. I will take a look at the provided example!