Fluentd fails to recover if metadata/chunk contains no data.
Somehow I ended up with this in the file buffer:
$ ls -la /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log*
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log.meta
This is probably caused by ungraceful shutdown.
After a restart, the following exception is then raised over and over:
2017-11-30 15:27:25 +0000 [error]: #0 unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method `<' for nil:NilClass"
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `block in enqueue_thread_run'
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:429:in `block in enqueue_all'
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `each'
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `enqueue_all'
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `enqueue_thread_run'
2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
fluentd version: 0.14.24
This may be somewhat related to https://github.com/fluent/fluentd/issues/1760
"This is probably caused by ungraceful shutdown."
Does this mean you performed the following steps?
- stop fluentd
- install new fluentd
- start fluentd again
The exact steps are:
- In packer, I have the following steps to get fluentd into the AMI:
  - curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
  - sudo /opt/td-agent/embedded/bin/fluent-gem install fluentd
  - install config
- After that, when the EC2 instance boots, fluentd starts and then restarts a few times while its config is being finalized (S3 access parameters).
- With this procedure, fluentd sometimes (not always) gets stuck with the exception I mentioned above. When I look at the buffer dir, I see zero-length files that seem to correspond to the exception.
But from an overall perspective, it seems the buffer plugin should ignore or remove chunks it cannot read, for whatever reason, instead of looping on them forever. Files or the filesystem may get corrupted for various reasons, and it would be nice to recover from that automatically, without manual intervention.
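To illustrate the behavior I am asking for, here is a minimal sketch in Python (not Fluentd's actual Ruby code). It assumes each chunk file may have a sidecar file with a .meta suffix, as in the listing above, and that a chunk is unusable when either file is zero bytes long. The function name split_resumable_chunks is made up for this example.

```python
import os


def split_resumable_chunks(buffer_dir):
    """Return (resumable, skipped) lists of chunk paths found in buffer_dir.

    Hypothetical helper, not part of Fluentd. A chunk is skipped when the
    chunk file is zero bytes, or when its .meta sidecar exists but is
    zero bytes. Nothing is deleted here; the caller decides what to do.
    """
    resumable, skipped = [], []
    for name in sorted(os.listdir(buffer_dir)):
        if name.endswith(".meta"):
            continue  # sidecars are inspected together with their chunk
        chunk = os.path.join(buffer_dir, name)
        if not os.path.isfile(chunk):
            continue
        meta = chunk + ".meta"
        chunk_empty = os.path.getsize(chunk) == 0
        meta_empty = os.path.isfile(meta) and os.path.getsize(meta) == 0
        if chunk_empty or meta_empty:
            skipped.append(chunk)
        else:
            resumable.append(chunk)
    return resumable, skipped
```

With something like this at startup, the skipped list could be logged as a warning and left out of the queue, so one bad chunk does not block flushing of the others.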
Thanks!
Thanks for writing detailed steps. I'm not sure what generates the zero-size files, but they should be ignored on load / read.
I will be attending CNCon/KubeCon this week, so I will write a patch after I finish my work there.
Hello, I have encountered this problem too, in fluentd v1.10. Has this bug been fixed?
@dbdd4us If you hit this problem during a fluentd restart, it should be fixed in v1.1.1: https://www.fluentd.org/blog/fluentd-v1.1.1-has-been-released
I am using fluentd (td-agent) 1.3.3
Have seen this problem with empty meta/buffer files for a while, and while it no longer crashes fluentd I recently saw an instance of fluentd using up around 1.5 million inodes with empty meta files. This almost completely filled up the disk's inode pool, and it was unusable until I manually deleted all the zero byte meta files.
Any idea what might be causing this? Anything I can do to get fluentd to clean up empty meta files on its own?
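For reference, the manual cleanup can be scripted. The following is only a sketch of a workaround, not a Fluentd feature: it removes zero-byte files ending in .meta from a buffer directory. The function name, the min_age_seconds guard (meant to avoid touching a file fluentd may have just created), and the dry_run default are all assumptions made for this example. Stopping fluentd before running it is probably the safer choice.

```python
import os
import time


def remove_empty_meta_files(buffer_dir, min_age_seconds=3600, dry_run=True):
    """Delete zero-byte *.meta files in buffer_dir; return the affected paths.

    Hypothetical helper, not part of Fluentd. Files modified more recently
    than min_age_seconds ago are left alone. With dry_run=True nothing is
    deleted and the returned list only shows what would be removed.
    """
    now = time.time()
    affected = []
    # os.scandir avoids building one huge list when there are millions of files
    with os.scandir(buffer_dir) as entries:
        for entry in entries:
            if not entry.name.endswith(".meta"):
                continue
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            if st.st_size == 0 and now - st.st_mtime >= min_age_seconds:
                if not dry_run:
                    os.remove(entry.path)
                affected.append(entry.path)
    return sorted(affected)
```

A first pass with the defaults, for example remove_empty_meta_files("/var/log/td-agent/buffer"), only reports candidates. Passing dry_run=False performs the deletion.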