
Memory Leak running Fluent-bit v3.2.3

Open davidtilloy opened this issue 1 year ago • 1 comment

Bug Report

Describe the bug

Fluent-bit v3.2.3 steadily consumes all system memory, and the system eventually crashes once no memory is left.

To Reproduce

  • Set up your system with this configuration:
[SERVICE]
  flush            15
  daemon           Off
  log_level        info
  parsers_file     parsers.conf
  parsers_file     parsers-avx.conf
  plugins_file     plugins.conf
  http_server      Off
  http_listen      127.0.0.1
  http_port        2020
  storage.metrics  on

[INPUT]
  Name               systemd
  Tag                node.systemd.*
  Path               /var/log/journal
  DB                 /var/log/flb_systemd.db
  Mem_Buf_Limit      32MB
  Read_From_Tail     On
  Strip_Underscores  On
  Lowercase          On

[INPUT]
  Name              tail
  Tag               nginx.access
  Parser            nginx-avx
  Path              /var/log/nginx/access.log
  DB                /var/log/flb_nginx_access.db
  Mem_Buf_Limit     5MB
  Refresh_Interval  10

[INPUT]
  Name              tail
  Tag               nginx.error
  Parser            nginx_errorlog
  Path              /var/log/nginx/error.log
  DB                /var/log/flb_nginx_error.db
  Mem_Buf_Limit     5MB
  Refresh_Interval  10

[FILTER]
  Name       record_modifier
  Alias      remove_systemd_keys
  Match      node.systemd.*
  Remove_Key cursor
  Remove_Key boot_id
  Remove_Key code_line
  Remove_Key stream_id
  Remove_Key machine_id
  Remove_Key message_id
  Remove_Key realtime_timestamp
  Remove_Key monotonic_timestamp
  Remove_Key source_realtime_timestamp
  Record     log_origin systemd

[FILTER]
  Name    record_modifier
  Alias   filter_product
  Match   *
  Record  node ${HOSTNAME}
  Record  product nginx-s3

[FILTER]
  Name         modify
  Alias        final_format
  Match        *
  Rename       error      short_message
  Rename       log        short_message
  Rename       message    short_message
  Rename       priority   severity
  Add          log_origin nginx
  Add          severity   info
  Remove_regex (process|thread)_id

@INCLUDE output.conf
  • Then run fluent-bit with fluent-bit -c fluent-bit.conf
  • Wait a couple of hours (in my case, memory usage grows by about 1GB per hour)
  • Then you'll see:
$ ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
root        6626  0.1 82.4 130767788 6603148 ?   Ssl  03:26   0:47 /opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf
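
To put the `ps` row above into readable units, here is a minimal sketch (not part of the original report) that splits one `ps aux` line into columns and converts RSS and VSZ to GiB. It assumes the Linux procps layout, where both columns are reported in KiB; the helper names `parse_ps_aux_line` and `kib_to_gib` are made up for illustration.

```python
def parse_ps_aux_line(line: str) -> dict:
    """Split one `ps aux` row into named columns (Linux procps layout assumed)."""
    # The first 10 columns are whitespace separated; COMMAND may contain spaces,
    # so everything after the 10th split is kept as a single field.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError(f"not a ps aux row: {line!r}")
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu_percent": float(fields[2]),
        "mem_percent": float(fields[3]),
        "vsz_kib": int(fields[4]),   # virtual size, KiB
        "rss_kib": int(fields[5]),   # resident set size, KiB
        "command": fields[10],
    }


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


# The row reported in this issue, copied verbatim.
row = (
    "root        6626  0.1 82.4 130767788 6603148 ?   Ssl  03:26   0:47 "
    "/opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf"
)
info = parse_ps_aux_line(row)
print(f"pid {info['pid']}: RSS {kib_to_gib(info['rss_kib']):.2f} GiB, "
      f"{info['mem_percent']}% of system memory")
```

For the row above this reports a resident set of roughly 6.3 GiB, which matches the 82.4% memory share shown by `ps`.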

Reverting to v3.2.2 (same configuration, same OS) fixes the issue (around 7pm on this graph):

image

Expected behavior

Memory usage should remain stable; fluent-bit should not consume all available memory.
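
One way to make "stable" checkable is to fit a line through periodic RSS samples and look at the slope. The following is only a sketch under assumptions: the samples are `(seconds, rss_kib)` pairs collected by some external means (for example, repeated `ps` runs), and the 50 MiB/hour threshold is an arbitrary illustrative value, not something taken from fluent-bit.

```python
def growth_kib_per_hour(samples):
    """Least-squares slope of RSS over time, in KiB per hour.

    `samples` is a list of (seconds_since_start, rss_kib) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den * 3600  # slope is KiB per second; scale to per hour


def looks_like_leak(samples, threshold_kib_per_hour=50 * 1024):
    """Flag sustained growth above a (hypothetical) threshold."""
    return growth_kib_per_hour(samples) > threshold_kib_per_hour


# Synthetic data: one sample per hour, growing by 1 GiB each hour,
# roughly the rate described in this report.
leaking = [(h * 3600, 100_000 + h * 1024 * 1024) for h in range(6)]
# Synthetic data for a healthy process: flat memory usage.
stable = [(h * 3600, 100_000) for h in range(6)]

print(looks_like_leak(leaking), looks_like_leak(stable))
```

A slope-based check is less sensitive to a single noisy sample than comparing only the first and last readings, which is why it is used here.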

Your Environment

  • Version used: v3.2.3
  • Configuration: See the details in the 'how to reproduce' section
  • Environment name and version (e.g. Kubernetes? What version?): 6.8.0-1019-aws
  • Server type and version: EC2 Linux/AWS
  • Operating System and version: Ubuntu 22.04 / AWS AMI ami-048f97d041d14fd4e
  • Filters and plugins: filters: modify and record_modifier

davidtilloy avatar Dec 26 '24 11:12 davidtilloy

Hello 👋 Agree, there is some real trickery going on here that I had a lot of help with, so it would be great for us to document it with a big block comment.

nicbarker avatar Dec 24 '24 20:12 nicbarker

Courtesy of some great work from @FintasticMan https://github.com/nicbarker/clay/pull/119

image

nicbarker avatar Jan 02 '25 21:01 nicbarker

Explanation of the remaining (much smaller) CLAY macro has been added in this commit: https://github.com/nicbarker/clay/commit/a44423a1333290feb86b49d9f960fe8e41104ae1

nicbarker avatar Jan 02 '25 21:01 nicbarker

Nice!

VisenDev avatar Jan 02 '25 21:01 VisenDev