Fluentd fails to recover if metadata/chunk contains no data.

Somehow I ended up with this in the file buffer:

$ ls -la /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log*
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log.meta
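To check whether a buffer directory contains such zero-length chunk/meta files, a small helper along these lines could be used. It is a sketch, not part of fluentd: `find_empty_buffer_files` is a hypothetical name, and both the default directory `/var/log/td-agent/buffer` and the `buffer.*` name pattern are taken from the listing above, so adjust them if your buffer `path` differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fluentd/td-agent): list zero-length files
# in a file-buffer directory. Default path and name pattern are assumptions
# taken from the ls listing in this issue.
find_empty_buffer_files() {
  local dir="${1:-/var/log/td-agent/buffer}"
  # -maxdepth 1: buffer files sit directly in the directory
  # -size 0: only files with no content at all
  find "$dir" -maxdepth 1 -type f -name 'buffer.*' -size 0 -print
}
```

Usage: `find_empty_buffer_files /var/log/td-agent/buffer` prints one path per empty file; a pair like the one above shows up as both the `.log` chunk and its `.log.meta` file.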

This is probably caused by ungraceful shutdown.

After restarting, this exception is raised over and over, forever:

2017-11-30 15:27:25 +0000 [error]: #0 unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method `<' for nil:NilClass"
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `block in enqueue_thread_run'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:429:in `block in enqueue_all'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `each'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `enqueue_all'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `enqueue_thread_run'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

fluentd version: 0.14.24

This may be somewhat related to https://github.com/fluent/fluentd/issues/1760

Nov 30 '17 18:11 mar-kolya

> This is probably caused by ungraceful shutdown.

Does this mean you did the following steps?

1. stop fluentd
2. install new fluentd
3. start fluentd again

Dec 01 '17 12:12 repeatedly

The exact steps are:

In Packer, we have the following steps to get fluentd into the AMI:
- curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
- sudo /opt/td-agent/embedded/bin/fluent-gem install fluentd
- install the config
After that, when the EC2 instance boots, fluentd starts and restarts a few times while the config is being finalized (S3 access parameters).
With this procedure, sometimes (not always) fluentd gets stuck with the exception mentioned above, and when I look at the buffer directory I see zero-length files that seem to correspond to the exception.

But from an overall perspective, it seems the buffer plugin should ignore or remove chunks it cannot read, for whatever reason, instead of looping on them forever. Files or the filesystem may get corrupted for various reasons, and it would be nice to recover from that automatically, without manual intervention.

Thanks!

Dec 01 '17 14:12 mar-kolya

Thanks for writing detailed steps. I'm not sure what generates zero-size files, but they should be ignored on load/read.

I'm attending CNCon/KubeCon this week, so I will write a patch after finishing my work at CNCon/KubeCon.

Dec 03 '17 13:12 repeatedly

Hello, I have encountered this problem too in fluentd v1.10. Is this bug fixed?

Mar 14 '18 09:03 metayd

@dbdd4us If you hit this problem during a fluentd restart, it should be fixed in v1.1.1: https://www.fluentd.org/blog/fluentd-v1.1.1-has-been-released

Mar 14 '18 15:03 repeatedly
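Whether an installation is at or above v1.1.1 can be checked by comparing version strings. The sketch below relies on GNU `sort -V` ordering; `version_at_least` is a hypothetical helper, and the usage line assumes `fluentd --version` prints `fluentd <version>`, which is worth confirming on your own install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is >= version $2.
# GNU `sort -V` sorts version strings in ascending order, so if the wanted
# version comes out first (or both are equal), the installed one is not older.
version_at_least() {
  local have="$1" want="$2"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

Usage (assumed output format): `version_at_least "$(fluentd --version | awk '{print $2}')" 1.1.1 && echo "fluentd >= 1.1.1"`.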

I am using fluentd (td-agent) 1.3.3

I have seen this problem with empty meta/buffer files for a while. It no longer crashes fluentd, but I recently saw an instance of fluentd using up around 1.5 million inodes with empty meta files. This almost completely exhausted the disk's inode pool, and the disk was unusable until I manually deleted all the zero-byte meta files.

Any idea what might be causing this? Anything I can do to get fluentd to clean up empty meta files on its own?

Aug 20 '19 21:08 bhperry
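Until fluentd cleans these up on its own, the manual deletion described above can be scripted. The helper below is a hypothetical sketch, not a fluentd feature: it removes each zero-byte `*.meta` file in the buffer directory together with its chunk file (the same name without `.meta`) only when that chunk is missing or also empty, and it keeps any chunk that still holds data. Running it only while td-agent is stopped is a precaution assumed here, so fluentd is not writing to the buffer at the same time.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper (not part of fluentd/td-agent).
# Assumption: run only while td-agent is stopped.
clean_empty_meta() {
  local dir="${1:-/var/log/td-agent/buffer}"
  local meta chunk
  # Visit every zero-byte *.meta file directly inside the buffer directory.
  while IFS= read -r -d '' meta; do
    chunk="${meta%.meta}"   # buffer.bXXX.log.meta -> buffer.bXXX.log
    if [ -s "$chunk" ]; then
      # The chunk still holds data: keep both files and report them.
      echo "skipped, chunk not empty: $chunk" >&2
      continue
    fi
    # Chunk is missing or empty: remove the meta file and the empty chunk.
    rm -f -- "$meta" "$chunk"
    echo "removed: $meta"
  done < <(find "$dir" -maxdepth 1 -type f -name '*.meta' -size 0 -print0)
}
```

Usage: stop td-agent (for example `sudo systemctl stop td-agent`), run `clean_empty_meta /var/log/td-agent/buffer` as a user that can write to the buffer directory, then start td-agent again. Skipped chunks are listed on stderr so they can be inspected by hand rather than silently deleted.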