Promtail: `kubernetes_sd_configs` keeps tailing all Completed pods forever, causing high CPU load
Describe the bug
Using Promtail with `kubernetes_sd_configs` to monitor pods.
The cluster frequently starts pods that run to completion (sort of like batch jobs, not persistent servers).
Promtail automatically adds targets for newly started pods.
When pods complete, the targets remain inside Promtail.
Promtail starts using more and more CPU over time.
To Reproduce
Steps to reproduce the behavior:
- Start Promtail 2.7.3, using the config attached below
- Start a lot of Pods in the k8s cluster and let them run to State=Completed
- After around 350 completed pods, Promtail's CPU usage is stuck at one full core on an 8-core machine (!)
- Observe Promtail's /metrics endpoint: it keeps showing promtail_targets_active_total 350, even though all the pods completed long ago.
Expected behavior
After a pod has completed, Promtail should read all remaining log content and then remove the pod from its watched targets. CPU usage should not increase when there are no Running pods on the node.
Screenshots, Promtail config, or terminal output
Promtail config.yaml:
scrape_configs:
  - job_name: mypods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [ "__meta_kubernetes_pod_node_name" ]
        target_label: "__host__"  # Promtail automatically excludes pods on other hosts
      - action: keep
        source_labels: [ "__meta_kubernetes_namespace" ]
        regex: 'default'
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        regex: true/(.*)
        separator: /
        source_labels:
          - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
          - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
          - __meta_kubernetes_pod_container_name
        target_label: __path__
We're seeing the same thing and were curious whether something like the following would work:
....
      - action: drop
        regex: Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase
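For reference, a minimal sketch of how that rule could be slotted into the scrape config from the report above. This assumes the standard __meta_kubernetes_pod_phase label exposed by the pod role of Kubernetes service discovery; whether dropping the target this way actually releases it (and the CPU it consumes) inside Promtail is exactly the open question in this issue.

scrape_configs:
  - job_name: mypods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop targets whose pod has already finished running, so the
      # keep/replace rules from the original config never see them.
      - action: drop
        regex: Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase
      # ...remaining keep/replace rules from the original config unchanged...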
From what I understand of Kubernetes service discovery, Kubernetes continues to list completed pods in its API until its garbage-collection limits are reached, and presumably this is the default behavior so that people can have insight into all sorts of pod statuses according to their use case.