File watchers might not be handled properly causing gradual increase in CPU/Memory usage
### Describe the bug
Fluentd's tail plugin was outputting `If you keep getting this message, please restart Fluentd`. After coming across https://github.com/fluent/fluentd/issues/3614, we implemented the workaround suggested there:
- changed `follow_inodes` to `true`
- set `rotate_wait` to `0`
Since then we are no longer seeing the original `If you keep getting this message, please restart Fluentd`, but we are still seeing lots of `Skip update_watcher because watcher has been already updated by other inotify event`.
This is paired with a pattern of leaking memory and gradually increasing CPU usage until a restart occurs.
To mitigate this I added `pos_file_compaction_interval 20m` as suggested here, but this had no effect on the resource usage.
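Since `pos_file_compaction_interval` didn't help, it may be worth checking whether compaction is actually shrinking the pos file. As far as I understand, in_tail marks entries for unwatched files with the position `ffffffffffffffff`, and compaction should purge them; a rough check:

```shell
# Entry count of the pos file, and how many entries are marked
# unwatched (position ffffffffffffffff); compaction should remove these.
POS=/var/log/fluentd-containers.log.pos
wc -l "$POS"
grep -c 'ffffffffffffffff' "$POS"
```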
Related to https://github.com/fluent/fluentd/issues/3614. More specifically https://github.com/fluent/fluentd/issues/3614#issuecomment-1871484810
The suspicion is that some watchers are not handled properly and thus leak, increasing CPU/memory consumption until the next restart.
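One way to check this suspicion on a running instance is to sample the worker's open file descriptors and inotify watches from `/proc` over time. This is a sketch; the `pgrep` pattern is an assumption and should be adjusted to match the actual worker process:

```shell
# Track the fluentd worker's open file descriptors and inotify watches.
# Counts that keep growing across log rotations point to leaked watchers.
PID=$(pgrep -f fluentd | head -n 1)
while sleep 60; do
  FDS=$(ls "/proc/$PID/fd" | wc -l)
  WATCHES=$(cat /proc/"$PID"/fdinfo/* 2>/dev/null | grep -c '^inotify')
  echo "$(date -Is) fds=$FDS inotify_watches=$WATCHES"
done
```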
### To Reproduce
Deploy fluentd (version v1.16.3-debian-forward-1.0) as a DaemonSet in a dynamic Kubernetes cluster consisting of 50-100 nodes. The fluentd config is shown below under Your Configuration.
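While reproducing, the gradual growth can be observed by sampling the pods' resource usage over time. This assumes metrics-server is installed; the namespace and label selector are assumptions, adjust them to your deployment:

```shell
# Sample per-container CPU/memory of the fluentd pods every minute.
while sleep 60; do
  kubectl top pod --containers -n logging -l app=fluentd
done
```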
### Expected behavior
CPU / memory usage should stay stable.
### Your Environment
- Fluentd version: [v1.16.3-debian-forward-1.0](https://github.com/fluent/fluentd-kubernetes-daemonset#:~:text=debian%2Dcloudwatch%2D1-,Forward,-docker%20pull%20fluent)
### Your Configuration
```
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  rotate_wait 0
  exclude_path ["/var/log/containers/fluentd*.log", "/var/log/containers/*kube-system*.log", "/var/log/containers/*calico-system*.log", "/var/log/containers/prometheus-node-exporter*.log", "/var/log/containers/opentelemetry-agent*.log"]
  pos_file_compaction_interval 20m
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    <pattern>
      format /^(?<time>.+?) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/
      time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
    </pattern>
  </parse>
  emit_unmatched_lines true
</source>
```
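As an aside, when iterating on these parameters the configuration can be validated without restarting the DaemonSet; fluentd has a dry-run mode (the config path below is an assumption):

```shell
# Parse and validate the configuration without starting workers.
fluentd --dry-run -c /fluentd/etc/fluent.conf
```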
### Your Error Log
```shell
Skip update_watcher because watcher has been already updated by other inotify event
```
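To correlate this message with the resource growth, a rough per-hour count from fluentd's own log can help. The log path and the `YYYY-MM-DD HH:MM:SS` timestamp layout at the start of each line are assumptions:

```shell
# Count occurrences of the skip message per hour.
grep 'Skip update_watcher' /var/log/fluentd/fluentd.log \
  | awk '{ print $1, substr($2, 1, 2) ":00" }' | sort | uniq -c
```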
### Additional context
https://github.com/fluent/fluentd/issues/3614
Thanks for your report!
> Fluentd's tail plugin was outputting `If you keep getting this message, please restart Fluentd`. After coming across https://github.com/fluent/fluentd/issues/3614, we implemented the workaround suggested there:
>
> - changed `follow_inodes` to `true`
> - set `rotate_wait` to `0`
So, `follow_inodes false` has a similar issue.
Could you please report the `follow_inodes false` problem as a new issue?
@daipom In this case I had `follow_inodes true`. Do you want me to open a new issue just for tracking?
@uristernik
Wasn't there a problem with `follow_inodes false` as well?
I'd like to sort out the `follow_inodes false` problem and the `follow_inodes true` problem separately.
I'd like to know if there is any difference between `follow_inodes false` and `follow_inodes true`: for example, whether the same resource leakage occurs with `follow_inodes false`.
If there is no particular difference, we are fine with this for now. Thanks!