
Kubernetes memory leak with tail input plugin, http and es output plugins

Open • VlN9 opened this issue 2 months ago • 9 comments

Memory leak with Tail input plugin, HTTP and Elasticsearch output plugins

Description

Fluent Bit (4.0.7 via Helm chart 0.53.0) exhibits continuous RAM growth: memory consumption never stabilizes and the pod eventually OOMs. The environment is a high-load Kubernetes cluster with a large volume of logs.

Configuration

service:
  daemon: off
  flush: 1s
  log_level: info
  parsers_file: /fluent-bit/etc/parsers.conf
  storage.path: /var/log/fb-storage/
  storage.metrics: on
  storage.checksum: on
  storage.sync: normal
  storage.max_chunks_up: 64
  storage.backlog.mem_limit: 200M
  storage.delete_irrecoverable_chunks: on
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  health_check: on
  refresh_interval: 5s
parsers:
  - name: custom-tag
    format: regex
    regex: '^(?<namespace_name>[^\.]+)\.(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)\.(?<container_name>.+)'
  - name: nginx-ingress
    format: regex
    regex: '^(?<clientip>[^ ]*) - (?<client_identity>[^ ]*) \[(?<timestamp>[^\]]*)\] "(?<verb>\S+)(?: +(?<request>[^\"]*?)(?: +(?<httpversion>\S+))?)?" (?<response>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referrer>[^\"]*)" "(?<user_agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^\]]*)\] \[(?<proxy_alternative_upstream_name>[^\]]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<req_id>[^ ]*)$'
    time_key: timestamp
    time_format: "%d/%b/%Y:%H:%M:%S %z"
multiline_parsers:
  - name: multiline_json
    type: regex
    flush_timeout: 2000
    key_content: log
    rules:
      - state: start_state
        regex: '^\{.*$'
        next_state: cont
      - state: cont
        regex: '^\s+.*$'
        next_state: cont
      - state: cont
        regex: '^\}$'
        next_state: start_state
pipeline:
  inputs:
    - name: tail
      storage.type: filesystem
      path: /var/log/containers/*.log
      db: /var/log/fb-storage/flb.db
      multiline.parser: docker, cri
      tag_regex: '(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$'
      tag: kube.<namespace_name>.<pod_name>.<container_name>
      key: log
      mem_buf_limit: 150M
      buffer_chunk_size: 50M
      buffer_max_size: 150M
      refresh_interval: 60
      ignore_older: 1d
      skip_empty_lines: on
  filters:
    - name: kubernetes
      regex_parser: custom-tag
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      kube_tag_prefix: kube.
      merge_log: off
      keep_log: on
      buffer_size: 1M
      k8s-logging.parser: on
      k8s-logging.exclude: off
    - name: grep
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      exclude: log ^$
    - name: multiline
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      multiline.parser: multiline_json
      emitter_storage.type: memory
      emitter_mem_buf_limit: 100M
    - name: modify
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      add: kubernetes_cluster rke2-dc8
      alias: add_cluster_label
    - name: nest
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      operation: lift
      nested_under: kubernetes
      add_prefix: kubernetes_
    - name: modify
      match_regex: '^kube.[a-z0-9-]+\.ingres[a-z0-9-]+\.[a-z-]+$'
      copy: log nginx_parsed_log
    - name: parser
      match_regex: '^kube.[a-z0-9-]+\.ingres[a-z0-9-]+\.[a-z-]+$'
      key_name: nginx_parsed_log
      parser: nginx-ingress
      reserve_data: on
    - name: rewrite_tag
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      rule: 
        - $kubernetes_labels['fluentbit'] ^(.+)$ $TAG.elastic true
        - kubernetes_pod_name ingres-nginx-external $TAG.elastic true
      emitter_name: re_emitted
      emitter_storage.type: memory
      emitter_mem_buf_limit: 100M
    - name: grep
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.vault.+\.elastic'
      exclude: kubernetes_container_name vault.*
    - name: modify
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+\.elastic'
      rename: log message
    - name: lua
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+\.elastic'
      script: /fluent-bit/scripts/add_field.lua
      call: add_field
  outputs:
    - name: http
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      host: 10.20.0.54
      port: 9428
      compress: gzip
      uri: /insert/jsonline?_stream_fields=stream,kubernetes_pod_name,kubernetes_container_name,kubernetes_namespace_name&_msg_field=log&_time_field=date
      format: json_lines
      json_date_format: iso8601
      header:
        - AccountID 0
        - ProjectID 0
      retry_limit: 3
      storage.total_limit_size: 1GB
    - name: es
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+\.elastic'
      host: ${FB_ELASTIC_HOST}
      port: 9200
      http_user: ${FB_ELASTIC_USER}
      http_passwd: ${FB_ELASTIC_PASSWORD}
      tls: on
      tls.verify: off
      type: _doc
      logstash_prefix: fb-logs
      logstash_format: on
      suppress_type_name: on
      replace_dots: on
      generate_id: on
      retry_limit: 2
      storage.total_limit_size: 1GB
      trace_error: on

Memory Usage

(Memory usage graph attached in the original issue.)

Steps to Reproduce

  1. Deploy Fluent Bit with the configuration above.
  2. Observe memory usage over 48 hours.
  3. Memory grows linearly without a plateau.

Expected Result

RAM usage stabilizes after initial allocation.

Actual Result

RAM usage increases continuously until OOM.

Questions

  • Is there a workaround to prevent the memory leak in the tail input?
  • Or am I doing something wrong?

VlN9 • Oct 03 '25

Is it reproducible on the latest versions of the 4.0 or 4.1 series?

patrick-stephens • Oct 03 '25

@patrick-stephens, it is reproducible on versions 3.2.* and 4.0.7. I haven't tried the latest versions because 4.0.7 is the latest one available in the official Fluent Bit Helm chart: https://artifacthub.io/packages/helm/fluent/fluent-bit/0.53.0

VlN9 • Oct 06 '25

I've tried updating the app version to 4.1.0; the memory leak is still present.

VlN9 • Oct 07 '25

I'm just wondering why multiline needs to be applied twice in your config:

pipeline:
  inputs:
    - name: tail
      storage.type: filesystem
      path: /var/log/containers/*.log
      db: /var/log/fb-storage/flb.db
      multiline.parser: docker, cri
      tag_regex: '(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$'
      tag: kube.<namespace_name>.<pod_name>.<container_name>
      key: log
      mem_buf_limit: 150M
      buffer_chunk_size: 50M
      buffer_max_size: 150M
      refresh_interval: 60
      ignore_older: 1d
      skip_empty_lines: on
  filters:
    - name: kubernetes
      regex_parser: custom-tag
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      kube_tag_prefix: kube.
      merge_log: off
      keep_log: on
      buffer_size: 1M
      k8s-logging.parser: on
      k8s-logging.exclude: off
    - name: grep
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      exclude: log ^$
    - name: multiline
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      multiline.parser: multiline_json
      emitter_storage.type: memory
      emitter_mem_buf_limit: 100M

This could cause intermediate buffer state to pile up, which in turn could cause high memory usage.
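
If the second multiline stage does turn out to be necessary, one thing worth trying (a sketch only, not a confirmed fix for the leak) is backing the filter's internal emitter with the filesystem storage that the service section already configures, so its intermediate buffers spill to disk instead of accumulating in RAM. The same emitter_storage.type option exists on the rewrite_tag filter:

    - name: multiline
      match_regex: '^kube.[a-z0-9-]+\.[a-z0-9-]+\.[a-z-]+$'
      multiline.parser: multiline_json
      # buffer the emitter on disk (uses the service storage.path) instead of memory
      emitter_storage.type: filesystem
      emitter_mem_buf_limit: 100M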

cosmo0920 • Oct 21 '25

@cosmo0920 Hi there, thank you for the reply. The first multiline parser (docker, cri) removes the timestamp and the "stdout F" markers at the front of each log line (e.g. 2025-10-21T10:16:08.195218531Z stdout F). I'll try to remove them another way and come back with the result tomorrow.
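
For reference, one possible "other way" (a sketch only: the parser name cri-prefix is hypothetical, and the regex and time format mirror Fluent Bit's stock cri parser) is a plain regex parser attached through the tail input's parser option. Unlike the cri multiline parser, this does not re-join partial lines that the runtime split (logtag "P"), so it is only equivalent when each log line fits in a single runtime record:

parsers:
  - name: cri-prefix
    format: regex
    # CRI line layout: <time> <stream> <logtag> <message>
    regex: '^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$'
    time_key: time
    time_format: '%Y-%m-%dT%H:%M:%S.%L%z'
pipeline:
  inputs:
    - name: tail
      # replaces: multiline.parser: docker, cri
      parser: cri-prefix
      # remaining tail options as in the original configuration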

VlN9 • Oct 21 '25

Try removing ignore_older.
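
For clarity, that just means dropping the option from the tail input, e.g. (a sketch; all other keys unchanged):

  inputs:
    - name: tail
      storage.type: filesystem
      path: /var/log/containers/*.log
      db: /var/log/fb-storage/flb.db
      multiline.parser: docker, cri
      # ignore_older removed as suggested above: all files matching path are
      # then tracked regardless of their modification time
      mem_buf_limit: 150M
      refresh_interval: 60
      skip_empty_lines: on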

zdyj3170101136 • Oct 22 '25

Has this issue been resolved? I hit the memory leak in 4.1.1 as well: the memory usage of the pod in an EKS cluster increases until OOM.

aq2013 • Dec 01 '25

Same here.

hoebelix • Dec 02 '25

I encountered the same issue in EKS clusters. aws-for-fluent-bit version 3.0.0, which is built on Fluent Bit v4.1.1, runs into the same OOM (out-of-memory) issue.

normalzzz • Dec 04 '25