fluentd
fluentd copied to clipboard
in_exec with out_exec in one config creates a file descriptor leak
Describe the bug
Hello, I think I have found the core issue behind my problems with my fluentd configuration; could someone have a look at this, please? I have set everything up according to https://docs.fluentd.org/installation/before-install. Using the configuration and the two scripts below, I can observe the following if I run these commands for a while:
cd /proc/<fluentd PID>/
find fd | wc -l
Then I can see that the number of file descriptors rises without bound, eventually causing memory problems and a "too many open files" error once the fd limit (ulimit -n) is reached. This only occurs if both in_exec and out_exec are used: if I apply either of the following swaps, the number of file descriptors stabilizes and the problem disappears.
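As a side note, the manual check above can be wrapped in a small helper; this is just an illustrative sketch, not part of the original report:

```shell
# Illustrative helper (not from the report): print the number of open
# file descriptors for a given PID, like the manual check above.
count_fds() {
  ls "/proc/$1/fd" | wc -l
}

# Sample the fluentd worker every 10 seconds, e.g.:
#   while sleep 10; do count_fds "$FLUENTD_PID"; done
count_fds $$   # fd count of the current shell itself
```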
The file descriptor count stabilizes if I swap the match section for this:
<match hardware.memory>
@type file
append true
path {{ home }}/fluentd/test_bug2.log
add_path_suffix false
<buffer []>
@type memory
flush_mode immediate
</buffer>
<format>
@type json
</format>
</match>
or swap the source and filter sections for this:
<source>
@type sample
sample {"mem_available":"1.5","swap_total":"1.5","swap_free":"1.2","time_local":"2022-02-09T10:00:00+0100"}
tag hardware.memory
</source>
To Reproduce
Install fluentd:
gem install fluentd --version 1.14.5 --no-doc
Create a folder "fluentd" in your home folder
Replace {{ home }} or {{ user }} with your real values in every file below.
Put fluentd.conf (with the configuration below), get_memory_metrics.sh, and psql_memory.sh into {{ home }}/fluentd/
Run chmod +x on get_memory_metrics.sh and psql_memory.sh
Run {{ home }}/bin/fluentd -c {{ home }}/fluentd/fluentd.conf -vv
Now start checking the file descriptors in /proc/<fluentd PID>/
with:
find fd | wc -l
Keep checking it for some time; you'll see that the count of file descriptors never stops rising.
Hopefully I did not miss anything.
Expected behavior
The number of file descriptors does not rise indefinitely and does not cause memory leaks.
Your Environment
- Fluentd version: 1.14.5 (installed as gem)
- TD Agent version: -
- Operating system: Rocky Linux 8
- Kernel version: 4.18.0-348.12.2.el8_5.x86_64
- Ruby version: 2.5.9
Your Configuration
<system>
process_name fluentd_monitor
</system>
<source>
@type exec
tag hardware.memory
run_interval 10s
command {{ home }}/fluentd/get_memory_metrics.sh
<parse>
@type json
types mem_available:float,swap_total:float,swap_free:float
</parse>
</source>
<filter hardware.memory>
@type record_transformer
enable_ruby
<record>
time_local ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
</record>
</filter>
<match hardware.memory>
@type exec
command {{ home }}/fluentd/psql_memory.sh first_argument second_argument
<format>
@type csv
force_quotes false
delimiter |
fields time_local,mem_available,swap_total,swap_free
</format>
<buffer time>
@type file
path {{ home }}/fluentd/hardware.memory.buff
timekey 50s
timekey_wait 0s
flush_mode lazy
total_limit_size 200MB
retry_type periodic # exponential backoff is broken https://github.com/fluent/fluentd/issues/3609
retry_max_times 0 # need to go to secondary as soon as possible https://github.com/fluent/fluentd/issues/3610
</buffer>
<secondary>
@type secondary_file
directory {{ home }}/fluentd/secondary/
basename dump.${chunk_id}
</secondary>
</match>
Your Error Log
No errors in fluentd.log when using -vv
Additional context
get_memory_metrics.sh
#!/bin/bash
mem_info=$(awk '{ \
if (/MemAvailable:/) {mem_available=$2}; \
if (/SwapTotal:/) {swap_total=$2}; \
if (/SwapFree:/) {swap_free=$2}; \
if (mem_available && swap_total && swap_free) \
{printf "%.3f %.3f %.3f", mem_available/1024/1024, swap_total/1024/1024, swap_free/1024/1024;exit} \
}' /proc/meminfo)
read -r mem_available swap_total swap_free<<<"${mem_info}"
printf '{"mem_available":"%s","swap_total":"%s","swap_free":"%s"}' \
"$mem_available" \
"$swap_total" \
"$swap_free"
psql_memory.sh
#!/bin/sh
echo $1 $2 $3
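For context, the out_exec documentation says the path of the flushed buffer chunk file is passed to the command as the last argument, after any arguments given in the config. A sketch of what psql_memory.sh therefore receives (handler name and chunk path are hypothetical, for illustration only):

```shell
# Hypothetical handler illustrating the argument order out_exec uses:
# the configured arguments come first, then the chunk file path.
handle_chunk() {
  first_arg=$1    # "first_argument" from the match section
  second_arg=$2   # "second_argument" from the match section
  chunk_path=$3   # buffer chunk file path appended by out_exec
  printf 'chunk file: %s\n' "$chunk_path"
}

handle_chunk first_argument second_argument /tmp/example.chunk
# prints "chunk file: /tmp/example.chunk"
```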
It seems that it is reproducible with a simpler in_exec-only example. It may also be reproducible on the out_exec side.
<system>
process_name fluentd_monitor
</system>
<source>
@type exec
tag hardware.memory
run_interval 10s
command PATH_TO_COMMAND
<parse>
@type json
types mem_available:float,swap_total:float,swap_free:float
</parse>
</source>
<match hardware.memory>
@type stdout
</match>
I fixed get_memory_metrics.sh in my example, which did not work; the printf part was missing. I am going to try your example, but I was not able to reproduce this with only in_exec or only out_exec in the config; both had to be present.
Yeah, I can confirm that your example causes this too. So it's not the combination of both that causes this.
EDIT: wrong, it was too soon to judge
I am sorry, it was too soon to judge; it has stabilized at around 174 FDs. So my first thought still applies: I observe that only the combination of both in_exec and out_exec causes the unbounded rise of FDs.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
Bump.
I checked it but can't reproduce it. I modified your config a little because it didn't work properly for me:
fluent.conf
<system>
process_name fluentd_monitor
</system>
<source>
@type exec
tag hardware.memory
run_interval 10s
command /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/get_memory_metrics.sh
<parse>
@type json
types mem_available:float,swap_total:float,swap_free:float
</parse>
</source>
<filter hardware.memory>
@type record_transformer
enable_ruby
<record>
time_local ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
</record>
</filter>
<match hardware.memory>
@type exec
command /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/psql_memory.sh
<format>
@type csv
force_quotes false
delimiter |
fields time_local,mem_available,swap_total,swap_free
</format>
<buffer time>
@type file
path /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/hardware.memory.buff
timekey 50s
timekey_wait 0s
flush_mode lazy
total_limit_size 200MB
retry_type periodic # exponential backoff is broken https://github.com/fluent/fluentd/issues/3609
retry_max_times 0 # need to go to secondary as soon as possible https://github.com/fluent/fluentd/issues/3610
</buffer>
</match>
get_memory_metrics.sh:
#!/bin/bash
mem_info=$(awk '{ \
if (/MemAvailable:/) {mem_available=$2}; \
if (/SwapTotal:/) {swap_total=$2}; \
if (/SwapFree:/) {swap_free=$2}; \
if (mem_available && swap_total && swap_free) \
{printf "%.3f %.3f %.3f", mem_available/1024/1024, swap_total/1024/1024, swap_free/1024/1024;exit} \
}' /proc/meminfo)
read -r mem_available swap_total swap_free<<<"${mem_info}"
printf '{"mem_available":"%s","swap_total":"%s","swap_free":"%s"}' \
"$mem_available" \
"$swap_total" \
"$swap_free"
(<<< doesn't work for me, so I modified the shebang to use bash explicitly.)
psql_memory.sh:
#!/bin/sh
cat $1 >> /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/memusage.txt
- The argument of the command is the path of a buffer chunk file.
- Dump the contents to a file to confirm that the command is called as expected.
Environment:
- Ubuntu 22.04
- Ruby 3.0.2 & 2.7.5
- Fluentd 1.15.1
Results:
$ sudo find /proc/2591/fd | wc -l
13
$ cat /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/memusage.txt
2022-07-28T21:43:28+0900|22.107|15.999|15.983
2022-07-28T21:43:38+0900|22.107|15.999|15.983
2022-07-28T21:43:48+0900|22.107|15.999|15.983
...
The fd count is always 13.
It might depend on installed gems (though this is unlikely); please try updating them.
(<<< doesn't work for me so I modified the shebang to use bash explicitly.)
I don't see any difference in the script you posted. There is a bash shebang in my version of the script as well.
The only difference I see is that you are missing the secondary section. Could you try it with that? The {{ home }} placeholder was meant to be replaced according to your environment.
I will try again later and post if I still observe the problem.
Also, did you try sudo find /proc/2591/fd | wc -l about 15 minutes after fluentd was started, or only at the beginning?
The only difference I see is that you are missing the secondary section. Could you try it with that?
I also tried the secondary section, but got almost the same result.
Also, did you try sudo find /proc/2591/fd | wc -l about 15 minutes after fluentd was started, or only at the beginning?
I tried it every 1 sec for 2 hours.
I still see the problem even with version 1.15.1. After roughly 10 minutes there are 90 FDs in my case, after 20 minutes 160, after 30 minutes 233, and so on...
- install vagrant https://www.vagrantup.com/downloads
- create a Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
config.vm.box = "rockylinux/8"
config.vm.box_check_update = true
config.vm.hostname = "rocky8"
config.vm.network "private_network", ip: "192.168.56.121"
config.vm.define "rocky8"
config.vm.provider "virtualbox" do |vb|
vb.name = "rocky8"
vb.gui = false
vb.memory = 1024
vb.cpus = 1
vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 1000 ]
end
config.vm.network :forwarded_port, guest: 22, host: 4444, id: 'ssh'
config.ssh.extra_args = ["-L", "8000:localhost:8000"]
end
- vagrant up
- vagrant ssh
- sudo dnf -y module enable ruby:3.0
- sudo dnf install gcc make cmake ruby ruby-devel
- gem install fluentd --version 1.15.1 --no-doc
- cd ~
- mkdir fluentd
- cd fluentd
- create fluentd.conf, get_memory_metrics.sh, psql_memory.sh from my first post above
- switch {{ home }} to /home/vagrant in all files from my first post above
- chmod +x on both .sh files
- cd ~/fluentd
- ~/bin/fluentd -c fluentd.conf
- start another shell (vagrant ssh from vagrant folder again)
- ps aux | grep fluentd
- take PID of worker:fluentd_monitor
- cd /proc/<PID>
- find fd | wc -l
- observe the ever-rising number of fds
I use vagrant in this example, but I have also seen this on a rented KVM cloud instance.
Thanks, it reproduced on Rocky Linux 8:
[vagrant@localhost ~]$ ls -l /proc/5844/fd/
total 0
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 0 -> /dev/pts/1
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 1 -> /dev/pts/1
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 10 -> 'pipe:[58579]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 12 -> 'pipe:[58126]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 13 -> 'pipe:[58620]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 14 -> 'pipe:[58589]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 15 -> 'pipe:[58600]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 16 -> 'pipe:[58608]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 17 -> 'pipe:[58273]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 18 -> 'pipe:[58630]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 19 -> 'pipe:[58184]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 2 -> /dev/pts/1
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 20 -> 'pipe:[58642]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:20 21 -> 'pipe:[58654]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 22 -> 'pipe:[58674]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 23 -> 'pipe:[58336]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 24 -> 'pipe:[58697]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 25 -> 'pipe:[58679]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 26 -> 'pipe:[58297]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 27 -> 'pipe:[58729]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 28 -> 'pipe:[58739]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 29 -> 'pipe:[58817]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 3 -> 'anon_inode:[eventfd]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:22 30 -> 'pipe:[58346]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:21 31 -> 'pipe:[58322]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:22 32 -> 'pipe:[58776]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:22 33 -> 'pipe:[58784]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:22 34 -> 'pipe:[58793]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:23 36 -> 'pipe:[58826]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:23 37 -> 'pipe:[58835]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:22 38 -> 'pipe:[58802]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:23 39 -> 'pipe:[58846]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 4 -> 'anon_inode:[eventfd]'
lr-x------. 1 vagrant vagrant 64 Jul 29 12:23 40 -> 'pipe:[58852]'
l-wx------. 1 vagrant vagrant 64 Jul 29 12:23 41 -> /vagrant/3627-exec-memleak/hardware.memory.buff/buffer.b5e4f0b73baa41208d74aab41afc22048.log.meta
lr-x------. 1 vagrant vagrant 64 Jul 29 12:23 43 -> 'pipe:[58861]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 5 -> 'anon_inode:[eventpoll]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 6 -> 'anon_inode:[eventpoll]'
lrwx------. 1 vagrant vagrant 64 Jul 29 12:20 8 -> /vagrant/3627-exec-memleak/hardware.memory.buff/buffer.b5e4f0b73baa41208d74aab41afc22048.log
l-wx------. 1 vagrant vagrant 64 Jul 29 12:20 9 -> 'pipe:[58532]'
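Since nearly all the leaked entries in the listing above are 'pipe:[...]' symlinks, one way to track just those is to count only pipe fds; a small sketch (not from the thread, names illustrative):

```shell
# Count only pipe file descriptors for a PID, since the leaked entries
# in the listing above are all 'pipe:[...]' symlinks.
count_pipe_fds() {
  ls -l "/proc/$1/fd" 2>/dev/null | grep -c "pipe:"
}

count_pipe_fds $$   # pipe fd count of the current shell
```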
With #3844 the file descriptor leak seems fixed, but the following objects still increase linearly:
Although the file descriptor leak is resolved, the problem described above is not yet resolved. Reopening this.
Hi, any luck on this one?
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days
I think this is still not solved completely...