fluentd
Losing logs from fluent-bit to fluentd cluster during brief outages on the fluentd cluster
Describe the bug
Hi,
In our log shipping solution we run fluent-bit on client VMs and send logs to a 3-node fluentd cluster.
If I stop all 3 nodes in the fluentd cluster at the same time, for example for 2 minutes, and then restart fluentd on all 3 nodes, the shipped logs are missing about 60 seconds of data from the 2-minute offline period.
To Reproduce
To test, write a simple log that prints the date every second:
while sleep 1; do date; done > /tmp/test.log
1. On the fluentd cluster, stop all 3 nodes at the same time.
2. Decompress the latest test log file and note the timestamp of the last log line.
3. Wait 2 minutes, then restart the fluentd cluster.
4. Wait for a new log file from the client to appear, then decompress it and read the first few lines.
If the buffers worked as we expect there should be no lost data, but every time there is lost data:
tail -5 ie1-abc01b-nxt.nxt.test-test_20230323_02a.log
Thu Mar 23 15:56:51 UTC 2023
Thu Mar 23 15:56:52 UTC 2023
Thu Mar 23 15:56:53 UTC 2023
Thu Mar 23 15:56:54 UTC 2023
Thu Mar 23 15:56:55 UTC 2023
head ie1-abc01b-nxt.nxt.test-test_20230323_03a.log
Thu Mar 23 15:57:57 UTC 2023
Thu Mar 23 15:57:58 UTC 2023
Thu Mar 23 15:57:59 UTC 2023
Thu Mar 23 15:58:00 UTC 2023
Thu Mar 23 15:58:01 UTC 2023
In the above example we have lost just over 1 minute of data (15:56:55 to 15:57:57).
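The size of the gap can be measured directly from the two timestamps above; a minimal sketch using GNU date (the timestamps are copied from the example output, and the 30s threshold matches the configured flush interval):

```shell
# Measure the gap between the last line shipped before the outage and the
# first line shipped after the restart. With a 30s flush interval, a gap
# much larger than 30s means records were dropped rather than merely delayed.
last="Thu Mar 23 15:56:55 UTC 2023"    # last line of the _02a log file
first="Thu Mar 23 15:57:57 UTC 2023"   # first line of the _03a log file
gap=$(( $(date -ud "$first" +%s) - $(date -ud "$last" +%s) ))
echo "missing ${gap}s of logs"
```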
Expected behavior
If the buffers worked as we expect, there should be no lost data.
Your Environment
- Fluentd version: 4.4.2
- TD Agent version: 2.0.6
- Operating system: CentOS 7
- Kernel version: 3.10.0-1160.83.1.el7.x86_64
Your Configuration
client fluent-bit configuration:
logship-fluent-bit.conf
[SERVICE]
# Flush
# =====
# set an interval of seconds before to flush records to a destination
flush 30
[INPUT]
name tail
path /tmp/test.log
path_key log_file
tag i2.2y.default.sgb.${HOSTNAME}.<filename>
tag_regex (\/.*\/)(?<filename>.+)
Storage.type memory
DB /var/log/logship/buffer/tail-0.db
DB.locking true
DB.journal_mode WAL
[OUTPUT]
Name forward
Match *
Host ie1-logship-nxt.nxt.endpoint
Port 80
Compress gzip
And on the fluentd cluster, the config:
<system>
workers 1
rpc_endpoint 0.0.0.0:24724
</system>
<source>
@type forward
port 24224
@id forward
</source>
<match i2**>
@type file
@id file
compress gzip
path /data/${tag[1]}/%Y/%m/%d/${tag[2]}/${tag[3]}/${tag[4]}.${tag[5]}.${tag[6]}-${tag[7]}_%Y%m%d_03a
append
<buffer tag,time>
@type memory
flush_thread_count 8
chunk_limit_size 8M
queue_limit_length 64
retry_max_interval 30
retry_max_times 1000
flush_mode interval
flush_interval 30s
</buffer>
<format>
@type single_value
message_key log
</format>
</match>
<source>
@type monitor_agent
bind 0.0.0.0
port 24220
@id monitor_agent
</source>
Your Error Log
No errors in the logs, just missing data.
Additional context
No response