labels_allow_list and annotations_allow_list wildcards clobber resource-specific configuration
What happened:
When configuring kube-state-metrics to emit `kube_{resourcekindplural}_labels` info metrics, I found that a `*` entry in the labels allow-list clobbers the configuration for every other resource type.
Expressed in the kube-state-metrics YAML config format rather than the CLI format for readability, adding the wildcard entry below:
```yaml
labels_allow_list:
  "*":
    - some_custom_label
  nodes:
    # AWS EKS node pool
    - eks.amazonaws.com/nodegroup
    # Azure AKS node pool
    - agentpool
    # Google GKE node pool
    - cloud.google.com/gke-nodepool
```
will cause `kube_node_labels` to carry only `label_some_custom_label`.
Adding the wildcard removes the previously present `label_eks_amazonaws_com_nodegroup`, `label_agentpool`, and `label_cloud_google_com_gke_nodepool`.
So the net result of this configuration is equivalent to:
```yaml
labels_allow_list:
  nodes:
    - some_custom_label
```
What you expected to happen:
I expected the resource-specific configuration to either append to or override the wildcard configuration, so the net effective configuration would be either:
```yaml
labels_allow_list:
  nodes:
    # labels from "*"
    - some_custom_label
    # labels from "nodes"
    - eks.amazonaws.com/nodegroup
    - agentpool
    - cloud.google.com/gke-nodepool
  otherresource:
    # labels from "*"
    - some_custom_label
    # ...
```
or (if the resource-specific entry overrides `*` rather than appending to it):
```yaml
labels_allow_list:
  nodes:
    # labels from "nodes"
    - eks.amazonaws.com/nodegroup
    - agentpool
    - cloud.google.com/gke-nodepool
  otherresource:
    # labels from "*"
    - some_custom_label
    # ...
```
How to reproduce it (as minimally and precisely as possible):
Run kube-state-metrics with the wildcard added to its CLI arguments:

```
"--metric-labels-allowlist=nodes=[kubernetes.io/arch],*=[somenonexistentlabel]"
```
and query its metrics endpoint over a port-forward (or use Prometheus), e.g.:

```
curl -sSLf1 http://127.0.0.1:8080/metrics | grep ^kube_node_labels
```

You will note that the node labels do not contain `label_kubernetes_io_arch`.
Now relaunch kube-state-metrics, this time with the wildcard omitted from the CLI arguments:

```
"--metric-labels-allowlist=nodes=[kubernetes.io/arch]"
```

If you query the metrics endpoint again, the `label_kubernetes_io_arch` label will appear on the metrics.
Anything else we need to know?:
This looks like it was intentional, per https://github.com/kubernetes/kube-state-metrics/blame/c864c93606db61e1c424b9313da03522f9f11adb/internal/store/builder.go#L235-L239
It was added in https://github.com/kubernetes/kube-state-metrics/commit/0b76e7d4f484e8eb7f71f06c66b679ab9e119d11#diff-a1639ee623bffb002ce1b1d3d18893f1d3ca6460a15d030cd272281f3126a7be
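For reference, the behavior at that link boils down to the following (a paraphrased sketch with approximate names, not the verbatim source):

```go
package main

import "fmt"

// effectiveAllowLabels paraphrases the current builder.go behavior
// (hypothetical name, approximate logic): when a "*" key is present, its
// label list replaces the allow-list for every enabled resource,
// discarding any resource-specific entries.
func effectiveAllowLabels(allow map[string][]string, enabledResources []string) map[string][]string {
	if wildcard, ok := allow["*"]; ok {
		m := make(map[string][]string, len(enabledResources))
		for _, resource := range enabledResources {
			m[resource] = wildcard // resource-specific config is clobbered here
		}
		return m
	}
	return allow
}

func main() {
	allow := map[string][]string{
		"*":     {"some_custom_label"},
		"nodes": {"eks.amazonaws.com/nodegroup", "agentpool", "cloud.google.com/gke-nodepool"},
	}
	// Prints map[nodes:[some_custom_label] pods:[some_custom_label]]:
	// the resource-specific nodes entry is gone.
	fmt.Println(effectiveAllowLabels(allow, []string{"nodes", "pods"}))
}
```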
This doesn't make sense, though: there is no point in supporting both wildcard and per-resource entries when the wildcard unconditionally clobbers the per-resource ones.
It should either:
- use the resource-kind-specific config if found, and fall back to the wildcard if not (recommended); or
- append the wildcard config to the resource-specific config and do a unique sort
The former option is preferred because it allows the config author to say "add these labels to all resource kinds, except for this specific kind where I want to leave some of them out".
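To make the two options concrete, here is a minimal sketch of each (hypothetical names and shapes, not the project's API):

```go
package allowlist

import "sort"

// mergeAllowLabels sketches the recommended option: a resource-specific
// entry wins outright, and resources without one fall back to "*".
func mergeAllowLabels(allow map[string][]string, enabledResources []string) map[string][]string {
	wildcard, hasWildcard := allow["*"]
	m := make(map[string][]string, len(enabledResources))
	for _, resource := range enabledResources {
		if specific, ok := allow[resource]; ok {
			m[resource] = specific // resource-specific config overrides "*"
		} else if hasWildcard {
			m[resource] = wildcard // fall back to the wildcard
		}
	}
	return m
}

// appendAllowLabels sketches the alternative: wildcard labels are appended
// to each resource's own list, then deduplicated and sorted.
func appendAllowLabels(allow map[string][]string, enabledResources []string) map[string][]string {
	m := make(map[string][]string, len(enabledResources))
	for _, resource := range enabledResources {
		seen := make(map[string]bool)
		var merged []string
		for _, label := range append(append([]string{}, allow[resource]...), allow["*"]...) {
			if !seen[label] {
				seen[label] = true
				merged = append(merged, label)
			}
		}
		sort.Strings(merged)
		if len(merged) > 0 {
			m[resource] = merged
		}
	}
	return m
}
```

With `mergeAllowLabels`, explicitly listing a resource kind opts it out of the wildcard labels, which is exactly the use case described above.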
Environment:
- kube-state-metrics version: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
- Kubernetes version (use `kubectl version`): server v1.29.2
- Cloud provider or hardware configuration: repro'd on kind, but seen on CSP-managed k8s too
- Other info:
/assign @rexagod
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged.

You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
I worked around this by using only `*` for all my label enrichment. It's unfortunate, but tolerably effective.