
Fluentd Forwarder pods getting flooded with warning messages when used in "Forwarder-Aggregator" mode.

anonymous-trader opened this issue 2 years ago · 1 comment

Describe the bug

I am using the Forwarder-Aggregator architecture of Fluentd to send my application logs to Splunk, with Fluentd running in both the Forwarder and the Aggregator tier. Forwarder --> deployed as a DaemonSet; it collects logs from each node and sends them to the Aggregator. Aggregator --> deployed as a Deployment; it receives logs from the Forwarders and sends them to Splunk.
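For context, a minimal sketch of what the Aggregator side of such a setup could look like. The Aggregator configuration is not included in this report, so the listen port, the tag pattern, and the splunk_hec output (from fluent-plugin-splunk-hec) with its placeholder endpoint values are assumptions for illustration only:

<source>
  @type forward                   # receives forwarded chunks from the Forwarder DaemonSet
  port 24224                      # assumed default forward port
  bind 0.0.0.0
</source>

<match **>
  @type splunk_hec                # assumes fluent-plugin-splunk-hec is installed on the Aggregator
  hec_host splunk.example.com     # placeholder Splunk HEC endpoint
  hec_port 8088
  hec_token YOUR_HEC_TOKEN        # placeholder token
</match>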

When the Aggregator shuts down for some reason, I start getting the warning message below in the Forwarder, and it never stops unless we kill the Forwarder pod.

2023-02-15 13:57:29 +0000 [warn]: #0 ack in response and chunk id in sent data are different chunk_id="5f4bd754fe2XXXXXX3a872fa077f94" ack="X0vXJzXXXXXXXXHR243iw==\n"

Fluentd Version : fluent/fluentd:v1.15.1-debian-1.1

Grafana graphs show that both the Forwarder and the Aggregator have enough CPU and memory at the time these warning messages start appearing in the Forwarder.

One more point I would like to add: we see these warnings only if we set "flush_thread_count" to a value other than 1 in the forward block given below; a single-threaded variant is sketched right after this paragraph.
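As a point of comparison, a minimal sketch of the buffer block with a single flush thread, which, per the observation above, does not produce the warning. All other parameters are kept as in the configuration below; this is only an illustration of the reported behavior, not a confirmed fix:

  <buffer>
    @type file
    path /var/fluentd_out/buffer_1
    flush_interval 10s
    flush_thread_count 1          # with a single flush thread the warnings reportedly do not appear
  </buffer>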

To Reproduce

Run Fluentd in Forwarder-Aggregator mode and then shut down the Aggregator.

Expected behavior

The Forwarder should stop generating the warning messages mentioned above once the Aggregator has recovered from the shutdown.

Your Environment

- Fluentd version:  fluent/fluentd:v1.15.1-debian-1.1
- TD Agent version: 
- Operating system:
- Kernel version:

Your Configuration

Worker-0 of Forwarder:

<match **>
  @type forward
  send_timeout 60s
  recover_wait 10s
  hard_timeout 60s
  require_ack_response true       # ack verification enabled; the reported warning is about a chunk id / ack mismatch
  heartbeat_type tcp
  keepalive true
  keepalive_timeout 30s
  <service_discovery>
    @type srv
    service tcp
    proto tcp
    interval 20
    weight 60
    hostname fluentd-aggregator-a
  </service_discovery>
  <service_discovery>
    @type srv
    service tcp
    proto tcp
    interval 20
    hostname fluentd-aggregator-b
    weight 60
  </service_discovery>
  <buffer>
    @type file
    total_limit_size 5GB
    path /var/fluentd_out/buffer_1
    queue_limit_length 100000
    chunk_limit_size 30000000
    overflow_action drop_oldest_chunk
    chunk_full_threshold 0.008
    flush_interval 10s
    flush_thread_count 8          # warnings are seen only when this is set to a value other than 1
    retry_forever false
    retry_timeout 1h
  </buffer>
  <secondary>
    @type file
    path /var/fluentd_out/out_plugin_secondary_1
  </secondary>
</match>

Your Error Log

2023-02-15 13:57:29 +0000 [warn]: #0 ack in response and chunk id in sent data are different chunk_id="5f4bd754fe2XXXXXX3a872fa077f94" ack="X0vXJzXXXXXXXXHR243iw==\n"

Additional context

No response

anonymous-trader · Feb 15 '23

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 7 days.

github-actions[bot] · Apr 10 '23