File watchers might not be handled properly causing gradual increase in CPU/Memory usage

Open uristernik opened this issue 5 months ago • 3 comments

Describe the bug

The Fluentd tail plugin was logging "If you keep getting this message, please restart Fluentd". After coming across https://github.com/fluent/fluentd/issues/3614, we implemented the workaround suggested there:

  • changed follow_inodes to true
  • set rotate_wait to 0

Since then we no longer see the original "If you keep getting this message, please restart Fluentd" message, but we still see many "Skip update_watcher because watcher has been already updated by other inotify event" messages. This is paired with a pattern of leaking memory and a gradual increase in CPU usage until a restart occurs.

To mitigate this I added pos_file_compaction_interval 20m as suggested here, but this had no effect on resource usage.
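To check whether compaction is actually pruning the position file, one option is to summarize its entries between compaction runs. The following is a minimal sketch, not part of the original report. It assumes the pos_file holds one tab-separated line per file (path, hexadecimal offset, hexadecimal inode) and that an offset of ffffffffffffffff marks an entry that in_tail has stopped watching. The function names are hypothetical.

```python
import os

# Assumption: in_tail marks entries it no longer watches with this offset.
UNWATCHED = 0xFFFFFFFFFFFFFFFF


def parse_pos_lines(lines):
    """Yield (path, offset, inode) tuples from tab-separated pos_file lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        path, offset_hex, inode_hex = parts
        try:
            yield path, int(offset_hex, 16), int(inode_hex, 16)
        except ValueError:
            continue  # skip lines whose numeric fields are not hexadecimal


def summarize(entries, exists=os.path.exists):
    """Count entries that are live, stale (path is gone), or marked unwatched."""
    counts = {"live": 0, "stale": 0, "unwatched": 0}
    for path, offset, _inode in entries:
        if offset == UNWATCHED:
            counts["unwatched"] += 1
        elif exists(path):
            counts["live"] += 1
        else:
            counts["stale"] += 1
    return counts


# Example usage on a node (path taken from the configuration in this issue):
#   with open("/var/log/fluentd-containers.log.pos") as f:
#       print(summarize(parse_pos_lines(f)))
```

If the stale or unwatched counts keep growing across several 20m intervals, that would suggest compaction is not removing the entries. Note that with follow_inodes true a path that still exists may belong to a different inode, so the live count is only an approximation.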

Related to https://github.com/fluent/fluentd/issues/3614. More specifically https://github.com/fluent/fluentd/issues/3614#issuecomment-1871484810

The suspicion is that some watchers are not handled properly, so they leak and increase CPU/memory consumption until the next restart.
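One way to test this suspicion would be to sample the number of inotify watches held by the Fluentd process over time and compare it with the number of files currently matched by the tail path. The sketch below is illustrative only. It assumes a Linux node where /proc/<pid>/fdinfo/<fd> lists one "inotify wd:" line per watch, and that in_tail's watchers are inotify-backed in this setup. The function names are hypothetical.

```python
import os


def count_inotify_watches(fdinfo_text):
    """Count 'inotify wd:' lines in the contents of one /proc/<pid>/fdinfo/<fd> file."""
    return sum(
        1 for line in fdinfo_text.splitlines() if line.startswith("inotify wd:")
    )


def total_watches(pid, proc_root="/proc"):
    """Sum inotify watches over every file descriptor of the given process."""
    fdinfo_dir = os.path.join(proc_root, str(pid), "fdinfo")
    total = 0
    for name in os.listdir(fdinfo_dir):
        try:
            with open(os.path.join(fdinfo_dir, name)) as f:
                total += count_inotify_watches(f.read())
        except OSError:
            continue  # descriptor was closed between listdir and open
    return total


# Example usage (the PID of the Fluentd worker must be supplied by the caller):
#   print(total_watches(12345))
```

A watch count that rises steadily while the number of container log files stays roughly constant would be consistent with leaked watchers. A flat count would point elsewhere.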

To Reproduce

Deploy fluentd (version v1.16.3-debian-forward-1.0) as a daemonset in a dynamic Kubernetes cluster. The cluster consists of 50-100 nodes. The fluentd config is listed under "Your Configuration".

Expected behavior

CPU / Memory should stay stable.

Your Environment

- Fluentd version: [v1.16.3-debian-forward-1.0](https://github.com/fluent/fluentd-kubernetes-daemonset#:~:text=debian%2Dcloudwatch%2D1-,Forward,-docker%20pull%20fluent)

Your Configuration

```
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  rotate_wait 0
  exclude_path ["/var/log/containers/fluentd*.log", "/var/log/containers/*kube-system*.log", "/var/log/containers/*calico-system*.log", "/var/log/containers/prometheus-node-exporter*.log", "/var/log/containers/opentelemetry-agent*.log"]
  pos_file_compaction_interval 20m
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    <pattern>
      format /^(?<time>.+?) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/
      time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
    </pattern>
  </parse>
  emit_unmatched_lines true
</source>
```


Your Error Log

```shell
Skip update_watcher because watcher has been already updated by other inotify event
```

Additional context

https://github.com/fluent/fluentd/issues/3614

uristernik, Jan 15 '24 09:01