in_tail: read_bytes_limit_per_second doesn't work over file rotations

Open kdomanski opened this issue 2 years ago • 1 comments

Describe the bug

If a log file fills up quickly enough, it can be rotated several times per second. The throughput counter, however, lives in IOHandler, which is re-created on every file rotation. As a consequence, read_bytes_limit_per_second is not respected when the log source is spammy enough.
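
To illustrate the mechanism described above, here is a toy model in Python. It is not Fluentd's actual code: `ToyIOHandler` and `bytes_read_in_one_second` are hypothetical names, and the only assumption taken from this report is that the byte counter lives in an object that is re-created on each rotation.

```python
class ToyIOHandler:
    """Toy stand-in for a per-file reader; NOT Fluentd's real IOHandler."""

    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.bytes_read = 0  # per-handler counter, always starts at zero

    def read(self, available):
        # Read only as much as is left of this handler's own budget.
        allowed = max(0, self.limit - self.bytes_read)
        n = min(available, allowed)
        self.bytes_read += n
        return n


def bytes_read_in_one_second(rotations, limit, bytes_per_file):
    """Simulate one second in which the file is rotated `rotations` times."""
    total = 0
    for _ in range(rotations + 1):
        handler = ToyIOHandler(limit)  # re-created on every rotation
        total += handler.read(bytes_per_file)
    return total


no_rotation = bytes_read_in_one_second(0, 100_000, 1_000_000)
five_rotations = bytes_read_in_one_second(5, 100_000, 1_000_000)
print(no_rotation, five_rotations)
```

In this model, every rotation hands the reader a fresh budget, so the effective throughput scales with the rotation rate rather than with the configured limit.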

To Reproduce

I deployed the following pod in a K8s cluster:

apiVersion: v1
kind: Pod
metadata:
  name: logflooder
  namespace: default
spec:
  containers:
  - image: ubuntu:bionic
    command: ["bash"]
    args: ["-c", "while true; do cat /etc/passwd; done"]
    imagePullPolicy: IfNotPresent
    name: fluentd
    resources:
      limits:
        cpu: "5"
        memory: 400Mi
      requests:
        cpu: "5"
        memory: 400Mi

The log throughput seemed to be effectively constrained by the CPU limit rather than by the value of read_bytes_limit_per_second. The "detected rotation of" message appears several times per second.

Expected behavior

I'd expect the total log throughput to be bound by read_bytes_limit_per_second regardless of file rotations.
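
As a rough sketch of the expected behavior, the toy Python model below keeps the per-second budget in an object that outlives the reader, so re-creating the reader on rotation does not reset it. `SharedByteBudget` and `RotatingReader` are hypothetical names, and this is not a description of how Fluentd implements or would fix the limit.

```python
class SharedByteBudget:
    """Hypothetical per-second byte budget owned outside the reader."""

    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.window_start = 0.0
        self.used = 0

    def take(self, wanted, now):
        # Start a fresh budget once a full second has passed.
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.used = 0
        granted = min(wanted, self.limit - self.used)
        self.used += granted
        return granted


class RotatingReader:
    """Toy reader that is re-created on rotation but borrows a shared budget."""

    def __init__(self, budget):
        self.budget = budget

    def read(self, available, now):
        return self.budget.take(available, now)


budget = SharedByteBudget(100_000)
# Six readers within the same second (i.e. five rotations).
same_second = sum(
    RotatingReader(budget).read(1_000_000, now=0.5) for _ in range(6)
)
# A reader created in a later window gets a fresh budget.
next_second = RotatingReader(budget).read(1_000_000, now=1.6)
print(same_second, next_second)
```

Here the total read within one window stays at the configured limit no matter how many readers are created during it.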

Your Environment

- Fluentd version: 1.14.2
- Operating system: Amazon Linux 2
- Kernel version: 4.14.252-195.483.amzn2.x86_64

Your Configuration

...
<source>
    @type tail
    @id in_tail_container_logs
    path "/var/log/containers/*.log"
    pos_file "/var/log/fluentd-containers.log.pos"
    read_bytes_limit_per_second 100k
    tag "kubernetes.*"
    exclude_path ["/var/log/containers/fluentd-*"]
    read_from_head true
    <parse>
      @type "json"
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      unmatched_lines
      time_type string
    </parse>
</source>
...

kdomanski Dec 01 '21 09:12

Are you seeing the same thing I'm seeing here? I'm sending a constant 200 logs per second to this service. Fluentd + Splunk HEC output seems to stop periodically (I think it's in_tail rather than the output, as the output logs show no errors).

Screenshot from 2021-12-09 13-00-23

I've tried:

- removing read_bytes_limit_per_second
- revving to 1.14.3
- setting follow_inodes
- rewriting the fluentd-hec plugin to use Splunk ack

Nothing seems to fix this.

tehlers320 Dec 09 '21 19:12