
Fluentd fails to recover if metadata/chunk contains no data.

Open mar-kolya opened this issue 8 years ago • 6 comments

Somehow I ended up with this in the file buffer:

$ ls -la /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log*
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log
-rw-r--r-- 1 td-agent td-agent 0 Nov 30 15:22 /var/log/td-agent/buffer/buffer.b55f34d13a918cfd38c26929392511b73.log.meta

This is probably caused by an ungraceful shutdown.

After restarting, this causes the following exception to be raised over and over:

2017-11-30 15:27:25 +0000 [error]: #0 unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method `<' for nil:NilClass"
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `block in enqueue_thread_run'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:429:in `block in enqueue_all'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `each'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/buffer.rb:424:in `enqueue_all'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin/output.rb:1277:in `enqueue_thread_run'
  2017-11-30 15:27:25 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-0.14.24/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

fluentd version: 0.14.24

This may be somewhat related to https://github.com/fluent/fluentd/issues/1760

mar-kolya, Nov 30 '17 18:11

> This is probably caused by an ungraceful shutdown.

Does this mean you did the following steps?

  1. stop fluentd
  2. install new fluentd
  3. start fluentd again

repeatedly, Dec 01 '17 12:12

The exact steps are:

  • In Packer, I have the following steps to bake fluentd into the AMI:
    • curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
    • sudo /opt/td-agent/embedded/bin/fluent-gem install fluentd
    • install the config
  • After that, when the EC2 instance boots, fluentd starts and is restarted a few times while the config is being finalized (S3 access parameters).
  • With this procedure, sometimes (not always) fluentd gets stuck with the exception mentioned above, and when I look at the buffer dir I see zero-length files that seem to correspond to the exception.

But from an overall perspective, it seems like the buffer plugin should ignore or remove chunks it cannot read, for whatever reason, instead of looping on them forever. Files or the filesystem may get corrupted for various reasons, and it would be nice to recover from that automatically, without manual intervention.

Thanks!

mar-kolya, Dec 01 '17 14:12

Thanks for the detailed steps. I'm not sure what generates zero-size files, but they should be ignored on load / read.

I'm attending CNCon/KubeCon this week, so I will write a patch after I finish my work at CNCon/KubeCon.

repeatedly, Dec 03 '17 13:12
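The "ignore on load / read" idea could look roughly like the following Ruby sketch. It assumes the file buffer layout from the `ls` listing in the report (a chunk file such as `buffer.b<chunk id>.log` with a `.log.meta` file next to it). `partition_resumable_chunks` is a hypothetical helper, not fluentd's actual resume code; it only illustrates how a loader could set aside chunks whose metadata is missing or whose files are zero bytes, instead of retrying them forever.

```ruby
# Hypothetical helper (not part of fluentd's API): split chunk files in a
# file-buffer directory into ones that look loadable and ones to skip.
def partition_resumable_chunks(dir)
  loadable = []
  skipped = []
  # "buffer.*.log" matches chunk data files only; "*.log.meta" names do not match.
  Dir.glob(File.join(dir, "buffer.*.log")).sort.each do |chunk_path|
    meta_path = "#{chunk_path}.meta"
    # Skip the chunk when its metadata file is missing, or when either file
    # is zero bytes and there is nothing to deserialize.
    if File.exist?(meta_path) && File.size(chunk_path) > 0 && File.size(meta_path) > 0
      loadable << chunk_path
    else
      skipped << chunk_path
    end
  end
  [loadable, skipped]
end
```

A loader built this way would log the skipped paths and resume only the loadable ones; whether to delete the skipped files or merely leave them alone is a separate policy decision.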

Hello, I have encountered this problem too in fluentd v1.10. Is this bug fixed?

metayd, Mar 14 '18 09:03

@dbdd4us If you hit this problem during a fluentd restart, it should be fixed in v1.1.1: https://www.fluentd.org/blog/fluentd-v1.1.1-has-been-released

repeatedly, Mar 14 '18 15:03

I am using fluentd (td-agent) 1.3.3.

I have seen this problem with empty meta/buffer files for a while. It no longer crashes fluentd, but I recently saw an instance of fluentd use up around 1.5 million inodes with empty meta files. This almost completely exhausted the disk's inode pool, and the disk was unusable until I manually deleted all the zero-byte meta files.

Any idea what might be causing this? Is there anything I can do to get fluentd to clean up empty meta files on its own?

bhperry, Aug 20 '19 21:08
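Until fluentd cleans these up itself, a periodic job outside fluentd could remove them. Below is a minimal Ruby sketch, assuming the same buffer file naming as in the original report and assuming that files untouched for `min_age` seconds are no longer being written. `remove_empty_meta_files` and its `min_age`, `dry_run`, and `now` parameters are hypothetical, not fluentd options. It only deletes zero-byte `.meta` files whose chunk file is missing or also zero bytes, so a chunk that still holds data is never removed. It does not address whatever creates the empty files in the first place.

```ruby
# Hypothetical cleanup helper (not a fluentd feature). Finds zero-byte "*.meta"
# files older than min_age seconds whose chunk file is missing or empty.
# Returns the paths it removed, or would remove when dry_run is true.
def remove_empty_meta_files(dir, min_age: 600, dry_run: true, now: Time.now)
  removed = []
  Dir.glob(File.join(dir, "*.meta")).sort.each do |meta_path|
    next unless File.size(meta_path).zero?
    # Leave recently modified files alone; fluentd may still be writing them.
    next if now - File.mtime(meta_path) < min_age
    # chomp instead of delete_suffix so this also runs on Ruby 2.4 (td-agent 3).
    chunk_path = meta_path.chomp(".meta")
    # Never touch a chunk that still holds data.
    next if File.exist?(chunk_path) && File.size(chunk_path) > 0
    targets = [meta_path]
    targets << chunk_path if File.exist?(chunk_path)
    targets.each { |path| File.delete(path) } unless dry_run
    removed.concat(targets)
  end
  removed
end
```

Running it with the default `dry_run: true` first, e.g. `remove_empty_meta_files("/var/log/td-agent/buffer")`, returns the paths it would delete without touching anything.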