Fluentd Forwarder pods getting flooded with warning messages when used in "Forwarder-Aggregator" mode.
Describe the bug
I am using the Forwarder-Aggregator architecture of Fluentd to send my application logs to Splunk, with Fluentd running as both the Forwarder and the Aggregator:
- Forwarder: deployed as a DaemonSet; collects logs from each node and sends them to the Aggregator.
- Aggregator: deployed as a Deployment; receives logs from the Forwarders and sends them to Splunk.
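For context, the Aggregator side receives logs with Fluentd's standard forward input and ships them to Splunk. A minimal sketch of that side (assuming the fluent-plugin-splunk-hec output; the host and token below are placeholders, not our real values):

<source>
  @type forward
  port 24224          # default forward port
  bind 0.0.0.0
</source>

<match **>
  @type splunk_hec                 # fluent-plugin-splunk-hec (assumed output plugin)
  hec_host splunk.example.com      # placeholder
  hec_port 8088
  hec_token YOUR_HEC_TOKEN         # placeholder
</match>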
When the Aggregator shuts down for some reason, the Forwarder starts emitting the warning message below, and it never stops until we kill the Forwarder pod:
2023-02-15 13:57:29 +0000 [warn]: #0 ack in response and chunk id in sent data are different chunk_id="5f4bd754fe2XXXXXX3a872fa077f94" ack="X0vXJzXXXXXXXXHR243iw==\n"
Fluentd version: fluent/fluentd:v1.15.1-debian-1.1
Grafana graphs show that both the Forwarder and the Aggregator have enough CPU and memory when these warning messages start appearing in the Forwarder.
One more point I would like to add: we see these warnings only if we set "flush_thread_count" to a value other than 1 in the forward block given below.
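As a temporary workaround, forcing a single flush thread makes the warnings disappear. A sketch of the only change relative to the buffer section in the full config below:

<buffer>
  @type file
  path /var/fluentd_out/buffer_1
  flush_interval 10s
  flush_thread_count 1  # warnings appear only when this is set above 1
</buffer>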
To Reproduce
Run Fluentd in Forwarder-Aggregator mode and then shut down the Aggregator.
Expected behavior
The Forwarder should stop generating the mentioned warning messages once the Aggregator has recovered from the shutdown.
Your Environment
- Fluentd version: fluent/fluentd:v1.15.1-debian-1.1
- TD Agent version:
- Operating system:
- Kernel version:
Your Configuration
Worker-0 of Forwarder:
<match **>
  @type forward
  send_timeout 60s
  recover_wait 10s
  hard_timeout 60s
  require_ack_response true
  heartbeat_type tcp
  keepalive true
  keepalive_timeout 30s
  <service_discovery>
    @type srv
    service tcp
    proto tcp
    interval 20
    weight 60
    hostname fluentd-aggregator-a
  </service_discovery>
  <service_discovery>
    @type srv
    service tcp
    proto tcp
    interval 20
    hostname fluentd-aggregator-b
    weight 60
  </service_discovery>
  <buffer>
    @type file
    total_limit_size 5GB
    path /var/fluentd_out/buffer_1
    queue_limit_length 100000
    chunk_limit_size 30000000
    overflow_action drop_oldest_chunk
    chunk_full_threshold 0.008
    flush_interval 10s
    flush_thread_count 8
    retry_forever false
    retry_timeout 1h
  </buffer>
  <secondary>
    @type file
    path /var/fluentd_out/out_plugin_secondary_1
  </secondary>
</match>
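(Side note: the equivalent fallback written with Fluentd's dedicated secondary_file type would look like the sketch below; the directory is a placeholder and the intent is the same as the <secondary> block above.)

<secondary>
  @type secondary_file
  directory /var/fluentd_out/out_plugin_secondary_1  # placeholder path
  basename dump.${chunk_id}
</secondary>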
Your Error Log
2023-02-15 13:57:29 +0000 [warn]: #0 ack in response and chunk id in sent data are different chunk_id="5f4bd754fe2XXXXXX3a872fa077f94" ack="X0vXJzXXXXXXXXHR243iw==\n"
Additional context
No response