Generated Prometheus metrics output does not meet the requirements

Open kallaics opened this issue 1 year ago • 5 comments

What happened:

The KSM configuration worked well up to KSM v2.10.1. After the upgrade to v2.11.0, Prometheus reported an "invalid metric type" error. The latest version, v2.12.0, fixed the "invalid metric type" error, but the output now contains only one resource type per metric name. The deployment and configuration did not change during this period.

The issue affects the "build_info" metric name.

What you expected to happen:

Prometheus output in which the same metric name is exposed for multiple resource types.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy kube-state-metrics from the prometheus-community/kube-prometheus-stack Helm chart via FluxCD.
  2. Apply the relevant kube-state-metrics configuration, provided below in YAML format:
kube-state-metrics:
  collectors: [ ]
  extraArgs:
    - --custom-resource-state-only=true
  rbac:
    extraRules:
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs: 
          - list
          - watch
      - apiGroups:
          - source.toolkit.fluxcd.io
          - kustomize.toolkit.fluxcd.io
          - helm.toolkit.fluxcd.io
          - notification.toolkit.fluxcd.io
          - image.toolkit.fluxcd.io
        resources:
          - gitrepositories
          - buckets
          - helmrepositories
          - helmcharts
          - ocirepositories
          - kustomizations
          - helmreleases
          - alerts
          - providers
          - receivers
          - imagerepositories
          - imagepolicies
          - imageupdateautomations
        verbs: [ "list", "watch" ]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: apps
              version: v1
              kind: Deployment
            metricNamePrefix: gotk
            metrics:
              - name: "build_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      version: [metadata, labels, "app.kubernetes.io/version" ]
                      component: [metadata, labels, "app.kubernetes.io/component" ]
                      instance: [metadata, labels, "app.kubernetes.io/instance" ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1
              kind: Kustomization
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, sourceRef, name ]
          - groupVersionKind:
              group: helm.toolkit.fluxcd.io
              version: v2beta2
              kind: HelmRelease
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  released: [ status, conditions, "[type=Released]", status ]
                  suspended: [ spec, suspend ]
                  chart_name: [ spec, chart, spec, chart ]
                  chart_source_name: [ spec, chart, spec, sourceRef, name ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1
              kind: GitRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: Bucket
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  endpoint: [ spec, endpoint ]
                  bucket_name: [ spec, bucketName ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: HelmRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: HelmChart
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  chart_name: [ spec, chart ]
                  chart_version: [ spec, version ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: OCIRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1beta3
              kind: Alert
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1beta3
              kind: Provider
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1
              kind: Receiver
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  webhook_path: [ status, webhookPath ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta2
              kind: ImageRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  image: [ spec, image ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta2
              kind: ImagePolicy
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, imageRepositoryRef, name ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta1
              kind: ImageUpdateAutomation
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, sourceRef, name ]

Anything else we need to know?:

Environment:

  • kube-state-metrics version: v2.12.0
  • Kubernetes version (use kubectl version): 1.28.5
  • Cloud provider or hardware configuration: Azure Kubernetes Service
  • Other info: Deployed with Helm from kube-prometheus-stack Helm chart.

kallaics avatar Apr 08 '24 12:04 kallaics

I tested the flux2-monitoring-example and verified that we were using kube-state-metrics v2.12.0. It does not seem to resolve the issue completely, though some metrics came back. In https://github.com/fluxcd/flux2-monitoring-example/issues/32 you can see that only the "HelmRelease" metrics were returned; the metrics for the other resource kinds did not come back.

kingdonb avatar Apr 10 '24 16:04 kingdonb

I did some tests and found that it's related to the code change to the SanitizeHeaders function in #2270: https://github.com/kubernetes/kube-state-metrics/pull/2270/files#diff-60450a33adea08c953656dd1e78a80e9f3b279bbc7656dedf31fd1a0c7fc1196

The issue seems to be the help: "The current state of a GitOps Toolkit resource." message. If you make it unique (e.g. a different one for HelmRelease, Kustomization, etc.), the metrics do not get removed by the function mentioned above.

I am just not sure if that's a bug or a feature, maybe the author @rexagod knows?
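
A minimal sketch of that workaround, assuming the same Helm values layout as in the issue description. Only two of the resource kinds are shown, and the exact help wording is an illustrative assumption rather than anything taken from the original configuration; the point is only that each groupVersionKind gets its own help text.

```yaml
kube-state-metrics:
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1
              kind: Kustomization
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                # Help text made specific to this kind (wording is hypothetical).
                help: "The current state of a Flux Kustomization resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
          - groupVersionKind:
              group: helm.toolkit.fluxcd.io
              version: v2beta2
              kind: HelmRelease
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                # Differs from the Kustomization help text above (wording is hypothetical).
                help: "The current state of a Flux HelmRelease resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
```

The metric name and prefix stay the same for both kinds, so the series should still be exposed as gotk_resource_info; only the help strings differ. The remaining kinds would presumably need the same treatment.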

speer avatar Apr 16 '24 11:04 speer

/assign @CatherineF-dev
/triage accepted

logicalhan avatar Apr 18 '24 16:04 logicalhan

I can confirm. After I changed the "help" fields, the metrics appeared in Prometheus and Grafana. Thanks @speer!

kallaics avatar Apr 19 '24 18:04 kallaics

Hello, apologies for the late response. 👋🏼

Prometheus' protobuf machinery does not support all OpenMetrics types at the moment (https://github.com/kubernetes/kube-state-metrics/issues/2248). To resolve this, #2270 was merged, which implicitly converts stateset and info metrics to gauge metrics before piping them out (PTAL at these test-cases). As a result, metrics that previously appeared not to conflict can now conflict, which is why the patch had to include a deduplicating capability; the issue raised here is a side effect of that deduplication.

https://github.com/fluxcd/flux2-monitoring-example/issues/32#issuecomment-2059346695 presents a take on this that has been the implicit sentiment for such configuration scenarios: if the use case warrants different groupVersionKind definitions, they should ideally be accompanied by different help texts to indicate what changed between them.

I'd be happy to follow this up by documenting the caveat observed here for future reference.

rexagod avatar May 20 '24 09:05 rexagod