labels_allow_list and annotations_allow_list wildcards clobber resource specific configuration

Open ringerc opened this issue 1 year ago • 1 comments

What happened:

When configuring kube-state-metrics to emit kube_{resourcekindplural}_labels info-metrics, I found that a * entry in the label allow map clobbers all configuration for all other resource types.

For example (expressed in kube-state-metrics YAML config format rather than CLI format, for readability), adding the wildcard entry to the configuration below:

    labels_allow_list:
      "*":
        - some_custom_label
      nodes:
        # AWS EKS node pool
        - eks.amazonaws.com/nodegroup
        # Azure AKS node pool
        - agentpool
        # google GKE node pool
        - cloud.google.com/gke-nodepool

will cause kube_node_labels to have only label_some_custom_label.

The addition of the wildcard removes the previously present label_eks_amazonaws_com_nodegroup, label_agentpool and label_cloud_google_com_gke_nodepool.

So the net result of this configuration is equivalent to:

    labels_allow_list:
      nodes:
        - some_custom_label
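
The observed behaviour can be modelled with a short Go sketch. The `allowList` type and the `reportedLookup` function are hypothetical names made up for illustration; this is not the kube-state-metrics source, only a model of what the metrics output shows.

```go
package main

import "fmt"

// allowList maps a resource kind (or "*") to its allowed label keys.
// Hypothetical type for illustration, not taken from kube-state-metrics.
type allowList map[string][]string

// reportedLookup models the behaviour described in this issue: when a "*"
// entry exists, its labels are used for every resource kind and any
// kind-specific entry is ignored.
func reportedLookup(cfg allowList, resource string) []string {
	if wildcard, ok := cfg["*"]; ok {
		return wildcard
	}
	return cfg[resource]
}

func main() {
	cfg := allowList{
		"*": {"some_custom_label"},
		"nodes": {
			"eks.amazonaws.com/nodegroup",
			"agentpool",
			"cloud.google.com/gke-nodepool",
		},
	}
	// Only the wildcard label survives; the node pool labels are dropped.
	fmt.Println(reportedLookup(cfg, "nodes"))
}
```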

What you expected to happen:

I expected the resource-specific configuration to either append to or override the wildcard configuration, so the net effective configuration would be either:

    labels_allow_list:
      nodes:
        # labels from "*"
        - some_custom_label
        # labels from "nodes"
        - eks.amazonaws.com/nodegroup
        - agentpool
        - cloud.google.com/gke-nodepool
      otherresource:
        # labels from "*"
        - some_custom_label
      # ...

or (if node-specific overrides * rather than appending):

    labels_allow_list:
      nodes:
        # labels from "nodes"
        - eks.amazonaws.com/nodegroup
        - agentpool
        - cloud.google.com/gke-nodepool
      otherresource:
        # labels from "*"
        - some_custom_label
      # ...

How to reproduce it (as minimally and precisely as possible):

Run kube-state-metrics with a wildcard entry included in the CLI arguments:

"--metric-labels-allowlist=nodes=[kubernetes.io/arch],*=[somenonexistentlabel]"

and query its metrics endpoint over a port-forward (or use Prometheus), e.g.:

curl -sSLf1 http://127.0.0.1:8080/metrics |grep ^kube_node_labels

You will see that the node labels do not contain label_kubernetes_io_arch.

Now re-launch kube-state-metrics, but this time with CLI arguments omitting the wildcard:

"--metric-labels-allowlist=nodes=[kubernetes.io/arch]"

If you query the metrics endpoint, the label_kubernetes_io_arch label will appear on the metrics.
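
To make the comparison between the two runs less error-prone than eyeballing grep output, the check can be scripted. The following is a minimal sketch; `nodeLabelsHave` is a hypothetical helper, and it assumes the metrics text is piped in on stdin from the curl command above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// nodeLabelsHave reports whether any kube_node_labels sample in the given
// exposition text carries the named label, for example
// "label_kubernetes_io_arch". Hypothetical helper for this reproduction.
func nodeLabelsHave(metrics, label string) bool {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := sc.Text()
		// Skip "# HELP" / "# TYPE" comments and unrelated metrics.
		if !strings.HasPrefix(line, "kube_node_labels{") {
			continue
		}
		// Match the label name only at the start of a label pair.
		if strings.Contains(line, "{"+label+`="`) ||
			strings.Contains(line, ","+label+`="`) {
			return true
		}
	}
	return false
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading metrics:", err)
		os.Exit(1)
	}
	const label = "label_kubernetes_io_arch"
	fmt.Printf("%s present on kube_node_labels: %t\n",
		label, nodeLabelsHave(string(data), label))
}
```

Pipe the metrics into it, e.g. `curl -sSf http://127.0.0.1:8080/metrics | go run main.go`, once for each of the two launches described above.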

Anything else we need to know?:

This looks like it was intentional, per https://github.com/kubernetes/kube-state-metrics/blame/c864c93606db61e1c424b9313da03522f9f11adb/internal/store/builder.go#L235-L239

It was added in https://github.com/kubernetes/kube-state-metrics/commit/0b76e7d4f484e8eb7f71f06c66b679ab9e119d11#diff-a1639ee623bffb002ce1b1d3d18893f1d3ca6460a15d030cd272281f3126a7be

It just doesn't make sense, though: there's no purpose in supporting both wildcard and non-wildcard entries when the wildcard unconditionally clobbers the non-wildcards.

It should either:

  • use the resource-kind-specific config if found, and fall back to the wildcard if not (recommended); or
  • append the wildcard config to the resource-specific config and do a unique sort

The former option is preferred because it allows the config author to say "add these labels to all resource kinds, except for this specific kind where I want to leave some of them out".
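
The two options above could look roughly like this. It is a sketch only: `allowList`, `resolveFallback` and `resolveMerged` are hypothetical names, not existing kube-state-metrics functions, and a real fix would have to fit into the existing builder code.

```go
package main

import (
	"fmt"
	"sort"
)

// allowList maps a resource kind (or "*") to its allowed label keys.
// Hypothetical type for illustration only.
type allowList map[string][]string

// resolveFallback implements the first option: use the kind-specific
// entry if one exists, otherwise fall back to the wildcard entry.
func resolveFallback(cfg allowList, resource string) []string {
	if labels, ok := cfg[resource]; ok {
		return labels
	}
	return cfg["*"]
}

// resolveMerged implements the second option: combine the wildcard and
// kind-specific entries, drop duplicates, and sort the result.
func resolveMerged(cfg allowList, resource string) []string {
	seen := map[string]struct{}{}
	merged := []string{}
	for _, key := range []string{"*", resource} {
		for _, label := range cfg[key] {
			if _, dup := seen[label]; !dup {
				seen[label] = struct{}{}
				merged = append(merged, label)
			}
		}
	}
	sort.Strings(merged)
	return merged
}

func main() {
	cfg := allowList{
		"*": {"some_custom_label"},
		"nodes": {
			"eks.amazonaws.com/nodegroup",
			"agentpool",
			"cloud.google.com/gke-nodepool",
		},
	}
	fmt.Println("fallback, nodes:", resolveFallback(cfg, "nodes"))
	fmt.Println("fallback, pods: ", resolveFallback(cfg, "pods"))
	fmt.Println("merged, nodes:  ", resolveMerged(cfg, "nodes"))
}
```

With the fallback variant, a kind-specific entry fully replaces the wildcard for that kind, which is what allows leaving some wildcard labels out for one resource kind.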

Environment:

  • kube-state-metrics version: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
  • Kubernetes version (use kubectl version): server v1.29.2
  • Cloud provider or hardware configuration: Repro'd on kind, but seen on CSP-managed k8s too
  • Other info:

ringerc, Aug 29 '24 00:08

/assign @rexagod
/triage accepted

dashpole, Sep 05 '24 16:09

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot, Sep 05 '25 17:09

I worked around this by only using * for all my label enrichment. It's unfortunate but tolerably effective.

ringerc, Sep 09 '25 04:09