
kube_endpoint_address duplicates with Prometheus 2.52

Open gdlx opened this issue 8 months ago • 13 comments

After upgrading to Prometheus 2.52, we started getting alerts about dropped duplicate samples.

  • The Prometheus log showed the following warning:

     scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
    
  • Setting the log level to debug showed the affected series:

    scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Duplicate sample for timestamp" series="kube_endpoint_address{namespace=\"monitoring\",endpoint=\"prometheus-operated\",ip=\"100.91.68.8\",ready=\"true\"}"
    
  • Checking that series in the kube-state-metrics output did show the following duplicates:

    kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1
    kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1
    
  • The prometheus-operated endpoint has the following subsets:

    subsets:
      - addresses:
          - ip: 100.91.43.113
            hostname: prometheus-kube-prometheus-istio-0
            nodeName: ip-100-91-48-253.eu-west-3.compute.internal
            targetRef:
              kind: Pod
              namespace: monitoring
              name: prometheus-kube-prometheus-istio-0
              uid: 1180e2a5-75e4-4098-961c-940264115438
          - ip: 100.91.68.8
            hostname: prometheus-kube-prometheus-stack-prometheus-0
            nodeName: ip-100-91-212-113.eu-west-3.compute.internal
            targetRef:
              kind: Pod
              namespace: monitoring
              name: prometheus-kube-prometheus-stack-prometheus-0
              uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
        ports:
          - name: http-web
            port: 9090
            protocol: TCP
      - addresses:
          - ip: 100.91.68.8
            hostname: prometheus-kube-prometheus-stack-prometheus-0
            nodeName: ip-100-91-212-113.eu-west-3.compute.internal
            targetRef:
              kind: Pod
              namespace: monitoring
              name: prometheus-kube-prometheus-stack-prometheus-0
              uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
        ports:
          - name: grpc
            port: 10901
            protocol: TCP
    

We can see two entries for the same IP (100.91.68.8), each on a different port. gRPC is exposed only by the Thanos sidecar container, which is enabled on only one Prometheus instance. I think there would have been no duplicates if both instances had the same config: there would then have been a single subset containing both addresses and both ports.
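To illustrate the reasoning, here is a minimal sketch of how the duplicate arises. It is a simplified model, not the actual kube-state-metrics code: the `address_series` helper is hypothetical, and it assumes one series is emitted per address in each subset, labelled only with namespace, endpoint, ip and ready.

```python
from collections import Counter

# Subsets from the Endpoints object above, reduced to the fields that matter here.
SUBSETS = [
    {
        "addresses": ["100.91.43.113", "100.91.68.8"],
        "ports": [{"name": "http-web", "port": 9090, "protocol": "TCP"}],
    },
    {
        "addresses": ["100.91.68.8"],
        "ports": [{"name": "grpc", "port": 10901, "protocol": "TCP"}],
    },
]


def address_series(namespace, endpoint, subsets):
    """Yield one label tuple per (subset, address); port information is ignored."""
    for subset in subsets:
        for ip in subset["addresses"]:
            # Assumption: every address listed under "addresses" is ready.
            yield (namespace, endpoint, ip, "true")


series = list(address_series("monitoring", "prometheus-operated", SUBSETS))

# Any label set seen more than once is a duplicate series in the same scrape.
duplicates = {labels: n for labels, n in Counter(series).items() if n > 1}
print(duplicates)
```

Under these assumptions, 100.91.68.8 is reported twice because it appears in both subsets and nothing in the label set tells the two entries apart.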

The only fix I can see would be to add a port label to the kube_endpoint_address metric. Is there something else I can do, or would this be considered a bug?
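A rough sketch of what I mean by a port label, again as a simplified model rather than real kube-state-metrics code. The `address_series_with_port` helper and the `port_name` / `port_number` label names are made up for illustration.

```python
# Subsets from the prometheus-operated Endpoints object, reduced to IPs and ports.
subsets = [
    {
        "addresses": ["100.91.43.113", "100.91.68.8"],
        "ports": [{"name": "http-web", "port": 9090}],
    },
    {
        "addresses": ["100.91.68.8"],
        "ports": [{"name": "grpc", "port": 10901}],
    },
]


def address_series_with_port(namespace, endpoint, subsets):
    """Yield one label dict per (address, port) pair within each subset."""
    for subset in subsets:
        for ip in subset["addresses"]:
            for port in subset["ports"]:
                yield {
                    "namespace": namespace,
                    "endpoint": endpoint,
                    "ip": ip,
                    "ready": "true",  # assumption: all listed addresses are ready
                    "port_name": port["name"],          # hypothetical label
                    "port_number": str(port["port"]),   # hypothetical label
                }


labelled = list(address_series_with_port("monitoring", "prometheus-operated", subsets))

# Freeze each label dict so uniqueness can be checked with a set.
unique = {tuple(sorted(labels.items())) for labels in labelled}
print(len(labelled), len(unique))
```

With the port included, the two entries for 100.91.68.8 become distinct series (http-web/9090 and grpc/10901), so no two samples in the scrape share the same label set.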

Thanks!

Environment:

  • kube-state-metrics version: 2.12.0
  • Kubernetes version: 1.28
  • Cloud provider or hardware configuration: AWS EKS
  • Other info:

gdlx, May 31 '24 13:05