kustomize-controller `gotk_resource_info{ready="Unknown"}` during the reconciliation

Problem

The `gotk_resource_info` metric carries the label `ready="Unknown"` while a Kustomization waits for a DaemonSet that takes ~1h to roll out.

This triggers a Prometheus alert for a resource being in a non-ready state for a prolonged period of time, even though the DaemonSet is rolling out successfully. See the Kustomization status during the rollout:

```yaml
status:
  conditions:
  - lastTransitionTime: "2025-05-23T14:30:17Z"
    message: Running health checks for <redacted>
      with a timeout of 1h0m0s
    observedGeneration: 20
    reason: Progressing
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2025-05-23T14:30:16Z"
    message: Reconciliation in progress
    observedGeneration: 20
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2025-05-23T14:30:17Z"
    message: Running health checks for revision <redacted>
      with a timeout of 1h0m0s
    observedGeneration: 20
    reason: Progressing
    status: Unknown
    type: Healthy
```

Metric:

```
gotk_resource_info{
  ... <redacted> ...
  customresource_group="kustomize.toolkit.fluxcd.io",
  customresource_kind="Kustomization",
  customresource_version="v1",
  ready="Unknown",
  suspended="false",
  source_name="flux-system",
  ... <redacted> ...
}
```

Expected behaviour

The metric has a label such as `ready="Progressing"`, so that the alert can be configured to ignore resources that are still progressing.

Configuration

Kustomization:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: <redacted>
  namespace: <redacted>
spec:
  interval: 10m
  serviceAccountName: kustomize-controller
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: <redacted>
  prune: true
  wait: true
  suspend: false
  timeout: 60m
  dependsOn: <redacted>
```

Alerting rule:

```yaml
- alert: FluxCDResourceNotReady
  expr: gotk_resource_info{ready!="True"} > 0
  for: 15m
```

May 23 '25 15:05 taraspos

You can change the metrics as you like in the kube-state-metrics config. If you prefer the reason instead of the status for the `ready` label, change the config to:

```yaml
ready: [ status, conditions, "[type=Ready]", reason ]
```

https://github.com/fluxcd/flux2-monitoring-example/blob/4b0f96da1541309240b02a1e3e1116d93cb3e6d9/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml#L51
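With the `ready` label carrying the Ready condition's reason, as in the config line above, the alerting rule from the issue could skip Kustomizations that are still progressing. This is a minimal sketch, assuming kustomize-controller reports the reason `ReconciliationSucceeded` when Ready is `True` and `Progressing` while health checks run (the latter matches the status shown in the issue). The rule group name `flux` is hypothetical.

```yaml
groups:
  - name: flux  # hypothetical group name
    rules:
      - alert: FluxCDResourceNotReady
        # Assumes `ready` now holds the Ready condition's reason, not its status.
        # Successful and still-progressing Kustomizations are excluded; a rollout
        # that never finishes should still surface once spec.timeout (60m in the
        # issue) expires and the reason changes to a failure reason.
        expr: gotk_resource_info{customresource_kind="Kustomization", ready!~"ReconciliationSucceeded|Progressing"} > 0
        for: 15m
```

The selector is scoped to `customresource_kind="Kustomization"` because other Flux kinds exported through the same metric may report different reasons on success, which this regex would otherwise treat as failures.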

Alternatively, you can add a new metric for the `Healthy` condition and compare the two.
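A sketch of the second option, assuming the kube-state-metrics custom resource state format used by the linked example: an extra metric entry, here named `resource_health_info` (hypothetical), that exports the status of the Kustomization's `Healthy` condition. It would sit next to the existing `resource_info` entry in the Kustomization resource's `metrics` list.

```yaml
# Hypothetical extra entry for the Kustomization resource's `metrics` list
# in the kube-state-metrics custom resource state config.
- name: resource_health_info
  help: "The status of the Healthy condition of a Flux Kustomization."
  each:
    type: Info
    info:
      labelsFromPath:
        name: [ metadata, name ]
  labelsFromPath:
    exported_namespace: [ metadata, namespace ]
    # "True", "False" or "Unknown" ("Unknown" while health checks are running).
    healthy: [ status, conditions, "[type=Healthy]", status ]
```

With the `gotk` metric name prefix from the linked example, this would be exposed as `gotk_resource_health_info`. Assuming both metrics carry the `name` and `exported_namespace` labels, the alert expression could then drop Kustomizations whose health checks are still running, for example `gotk_resource_info{ready!="True"} unless on (customresource_kind, exported_namespace, name) gotk_resource_health_info{healthy="Unknown"}`. Including `customresource_kind` in `on (...)` keeps, say, a GitRepository and a Kustomization that share the name `flux-system` from matching each other.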

May 23 '25 15:05 stefanprodan

Thanks a lot for the quick response. I will take a look at the provided example!

May 23 '25 15:05 taraspos
