
AWS Load Balancer Controller continues to expose high-cardinality, unbounded metrics on the Prometheus endpoint


Describe the bug
After upgrading the AWS Load Balancer Controller via Helm, I keep seeing the rest_client_request_latency_seconds histogram metric exposed on the Prometheus metrics endpoint. It carries a url label containing the full request URI, which produces roughly 900 time series. I've deleted the chart, checked the dependencies, and redeployed, but the problem didn't go away.
Related: https://github.com/kubernetes-sigs/controller-runtime/issues/1423 and https://github.com/kubernetes-sigs/controller-runtime/pull/1587

...
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.004"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.008"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.016"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.032"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.064"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.128"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.256"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.512"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="+Inf"} 1
rest_client_request_latency_seconds_sum{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 0.010152667
rest_client_request_latency_seconds_count{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.004"} 127
...

Steps to reproduce:

  • Deploy the aws-load-balancer-controller using the Helm chart with the ServiceMonitor disabled (serviceMonitor.enabled=false chart value); a minimal values sketch follows below.
  • Fetch the metrics from the exposed Prometheus endpoint (chart default, :8080/metrics).
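
For reference, a minimal values file that should reproduce this setup (the cluster name below is a placeholder, not a value from this issue; every other chart value is left at its default):

# minimal values.yaml for reproducing; clusterName is a placeholder
clusterName: my-cluster
serviceMonitor:
  enabled: false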

Expected outcome:

The rest_client_request_latency_seconds metric either not being present in the exposed metrics at all, or at least not carrying an unbounded url label.

Environment:

  • AWS Load Balancer controller: v2.4.5
  • Chart version: 1.4.6
  • EKS: 1.21.14-eks-fb459a0

Additional Context: Here is my chart values file; all other values are left at their defaults.

replicaCount: 2

image:
  repository: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller
  tag: v2.4.5
  pullPolicy: IfNotPresent

clusterName: main-eks-qa

fullnameOverride: aws-load-balancer-controller

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::############:role/aws-load-balancer-controller

podLabels:
  ######.####/instance: aws-load-balancer-controller

webhookTLS:
  caCert:
  cert:
  key:

disableIngressClassAnnotation: true

disableIngressGroupNameAnnotation: true

podDisruptionBudget:
  maxUnavailable: 1

serviceMonitor:
  enabled: false
  additionalLabels: {}
  interval: 1m

clusterSecretsPermissions:
  allowAllSecrets: false

yodaflomaster (Nov 25 '22)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Mar 07 '23)

/remove-lifecycle stale

yodaflomaster (Apr 04 '23)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Jul 03 '23)

/remove-lifecycle stale

yodaflomaster (Jul 03 '23)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Jan 23 '24)

/remove-lifecycle stale

yodaflomaster (Feb 06 '24)

I've created #3645, which will allow metrics to be dropped; once it's merged you'll be able to do the following in the Helm values to remove the rest client metrics.

serviceMonitor:
  enabled: true
  metricRelabelings:
    - sourceLabels: ["__name__"]
      regex: ^rest_client_.+
      action: drop
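
If the endpoint is scraped without the ServiceMonitor, a roughly equivalent rule can be added to the Prometheus scrape configuration itself. This is only a sketch: the job name and target address are assumptions, not values taken from the chart.

scrape_configs:
  - job_name: aws-load-balancer-controller                              # hypothetical job name
    static_configs:
      - targets: ["aws-load-balancer-controller.kube-system.svc:8080"]  # assumed service address and metrics port
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: rest_client_.+
        action: drop                                                    # drop all rest client series before ingestion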

stevehipwell (Apr 11 '24)

Delivered in v2.8.0

shraddhabang (May 20 '24)