Promtail: `kubernetes_sd_configs` keeps tailing all Completed pods forever, causing high CPU load
Describe the bug
Using Promtail with `kubernetes_sd_configs` to monitor pods.
The cluster frequently starts pods that run to completion (sort of like batch jobs, not persistent servers).
Promtail automatically adds targets for newly started pods.
When pods complete, the targets remain inside Promtail.
Promtail starts using more and more CPU over time.
To Reproduce
Steps to reproduce the behavior:
- Start Promtail 2.7.3, using the config attached below
- Start a lot of Pods in the k8s cluster and let them run to State=Completed
- After around 350 completed pods, Promtail's CPU usage is stuck at one full core on an 8-core machine (!)
- Observe Promtail's /metrics endpoint: it keeps showing promtail_targets_active_total 350, even though all the pods completed long ago.
Expected behavior
After a pod has completed, Promtail should read all remaining log content and then remove the pod from its watched targets. CPU usage should not increase when there are no Running pods on the node.
Screenshots, Promtail config, or terminal output
Promtail config.yaml:
scrape_configs:
  - job_name: mypods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [ "__meta_kubernetes_pod_node_name" ]
        target_label: "__host__"  # Promtail automatically excludes pods on other hosts
      - action: keep
        source_labels: [ "__meta_kubernetes_namespace" ]
        regex: 'default'
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        regex: true/(.*)
        separator: /
        source_labels:
          - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
          - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
          - __meta_kubernetes_pod_container_name
        target_label: __path__
We're seeing the same thing and were curious whether something like the following would work:
....
      - action: drop
        regex: Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase
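For reference, a minimal sketch of how that rule could be slotted into the scrape config from the report above. This assumes the standard __meta_kubernetes_pod_phase label exposed by the pod role of Kubernetes service discovery; whether dropping the target this way actually releases it (and the CPU it consumes) inside Promtail is exactly the open question in this issue.

scrape_configs:
  - job_name: mypods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop targets whose pod has already finished running, so the
      # keep/replace rules from the original config never see them.
      - action: drop
        regex: Succeeded|Failed|Completed
        source_labels:
          - __meta_kubernetes_pod_phase
      # ...remaining keep/replace rules from the original config unchanged...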
From what I understand of Kubernetes service discovery, Kubernetes continues to list completed pods in its API until its garbage-collection limits are reached, and presumably this is the default behavior so that people can have insight into all sorts of pod statuses according to their use case.