fluent-plugin-forest
Buffered files not flushed when fluentd is restarted after a failure
Forest creates a new output plugin for each tag / path it sees. When using the file buffer, buffered data is not flushed to the output plugin on startup (if fluentd failed previously), because the output plugins are not initialized until an event matching that tag arrives.
To solve this problem, forest should: 1) scan all the configured buffer_paths, 2) list the buffer files stored in those paths, and 3) regenerate the list of output plugins based on the tags found in the filenames of those buffer files (e.g., by calling plant(tag)).
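For illustration, a minimal sketch of those three steps, assuming (hypothetically) that all child buffers live under one known directory and that chunk filenames embed the tag as `<prefix>.<tag>.b<chunk_id>.log`; `plant(tag)` stands for forest's existing per-tag instantiation mentioned above, and the pattern would need to be adapted to the real `buffer_path` layout:

```ruby
require "set"

# Hedged sketch: discover tags from leftover file-buffer chunks and re-plant
# an output for each one, so the child's own buffer recovery can run.
# The directory and the filename pattern are assumptions; adapt them to the
# buffer_path actually configured for the children.
def tags_from_leftover_chunks(buffer_dir)
  tags = Set.new
  Dir.glob(File.join(buffer_dir, "*.log")).each do |path|
    # assumed chunk name: <prefix>.<tag>.b<hex_chunk_id>.log (tag with no dots)
    if (m = File.basename(path).match(/\.(?<tag>[^.]+)\.[bq][0-9a-f]+\.log\z/))
      tags << m[:tag]
    end
  end
  tags
end

def replant_leftovers(buffer_dir)
  # plant(tag) is the existing forest helper referred to above
  tags_from_leftover_chunks(buffer_dir).each { |tag| plant(tag) }
end
```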
I've observed this behavior with this plugin and it's causing a tremendous amount of log loss. I'm actually using it on an aggregator host, and I've watched files get left behind because the td-agent subprocess died.
@garthgoodson interesting solution, but how would forest know where the buffer paths of its child plants are?
I'm curious @pitr @tagomoris - what is it about the fluent-plugin-forest startup procedure that makes it skip the initialization process it normally carries out? I would presume this could easily be solved by having the plugin start up in the exact same fashion.
Forest "plants" sub-plugins as it sees new records, so it doesn't know what records it will see (or in the case of this bug, NOT see) in initialization phase.
This is definitely a difficult problem, but it is important to flush buffers that were left behind by a previous process (and whose plugins have not been instantiated yet). I have some ideas to solve this problem (none of them easy):
- Fluentd core provides an initialization step that we can use to scan leftover buffer files
  - it requires too big a cost to pay in the normal start process of the forest plugin
  - what should we do when flushes in the start-up sequence keep failing after a long time?
  - Fluentd now has a `--without-source` option to flush leftover buffers
- We add hook points to scan-and-flush buffers in the start-up sequence of the plugin, so we can pay whatever heavy cost is needed to scan the `buffer_path` to be flushed
  - it requires too big a cost to pay in the normal start process of the forest plugin
- We use some storage to hold metadata about the buffers to be flushed
  - this requires many more dependencies: that is not acceptable for usual cases...
  - Fluentd could provide a KVS for plugins that is persisted over the process lifecycle
  - of course, that is very difficult to implement

I'm wondering which way is best. What do you think about these solutions?
It seems the second solution is the only one that doesn't require a change to fluentd. It's also the one I'd prefer.
One concern I'd like to bring up: during initialization, the forest plugin should only initialize sub-plugins that did NOT flush properly, not every sub-plugin it ever saw.
Also, we need to be careful with situations where there was a config change and a sub-plugin that didn't have its buffer flushed is no longer defined.
We solved this in our code by adding a buffer path to the configuration that we scan on startup. Without this we could not use the plugin. It would be unacceptable to effectively drop buffered data. I'm not sure why this cost would be high.
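For illustration only (this is not Garth's patch, just a sketch of the shape such a change could take, using an invented `scan_buffer_path` parameter and the `tags_from_leftover_chunks` helper sketched earlier):

```ruby
# Hypothetical sketch: the parameter name and wiring are invented here, not
# taken from Garth's code or from fluent-plugin-forest itself.
module ScanOnStart
  def configure(conf)
    super
    @scan_buffer_path = conf["scan_buffer_path"]  # e.g. "/var/log/td-agent/buffer/forest"
  end

  def start
    super
    return unless @scan_buffer_path
    # tags_from_leftover_chunks is the helper sketched earlier in this thread
    tags_from_leftover_chunks(@scan_buffer_path).each { |tag| plant(tag) }
  end
end

# Intended to be prepended into the forest output class so that `super`
# reaches the original configure/start implementations.
```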
I'm guessing that while the plugin is scanning the buffer_path, it cannot receive events. Is this correct? If so, say you had 50000 8 MB files left over because a planted plugin crashed and forest re-planted it, or because td-agent as a whole crashed: it could take a long time to drain off those files, all the while blocking the handoff from forest to the planted output.
I think this, as an option, could be acceptable with a big warning sign on it:
- We add hook points to scan-and-flush buffers in the start-up sequence of the plugin, so we can pay whatever heavy cost is needed to scan the `buffer_path` to be flushed
@garthgoodson would you mind sharing a diff?
I think one can minimize the time. Basically, the scan should just cache the filenames locally (which should be pretty fast, since no data is read, only file/dir metadata); once that is done, the data transfer for those files can begin and new data can be collected.
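A rough sketch of that "collect names first, drain later" idea, reusing the hypothetical helpers from earlier: the filename scan runs synchronously and cheaply, while the re-planting (and therefore the flushing) happens on a background thread so new events are not blocked.

```ruby
# Hedged sketch of "scan fast, drain later": snapshot the chunk filenames
# synchronously (metadata only), then re-plant on a background thread so
# incoming events keep flowing while the old chunks are worked off.
def replant_leftovers_async(buffer_dir)
  leftover_tags = tags_from_leftover_chunks(buffer_dir).to_a  # fast: no chunk data is read

  Thread.new do
    leftover_tags.each do |tag|
      begin
        plant(tag)  # each child's own buffer resume logic then drains its chunks
      rescue => e
        warn "forest: failed to re-plant #{tag} for leftover buffers: #{e}"
      end
    end
  end
end
```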
I am running into this problem as well. Has anyone come up with a good solution?
Hmm, I've got an idea: implement a `buffer_directory_path` option in fluent-plugin-forest, and have the forest plugin generate the `buffer_path` parameter automatically when plugin instances start up.
This feature would let us find and recover all buffer files at the time fluentd starts.
But I think this feature may seem too magical for many users. What do you all think?
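A sketch of how that might look (the `buffer_directory_path` parameter is hypothetical and does not exist in the released plugin; the tag escaping and the chunk-name suffix are assumptions): forest derives each child's `buffer_path` from one shared directory, which also makes start-up discovery a simple directory listing.

```ruby
require "uri"

# Hypothetical buffer_directory_path behaviour: forest owns one directory and
# derives each child's buffer_path under it, so recovery at startup is just a
# directory listing. The naming scheme here is an assumption for illustration.
def buffer_path_for(buffer_directory_path, tag)
  # escape the tag so it is safe as a single path component
  File.join(buffer_directory_path, URI.encode_www_form_component(tag))
end

def tags_with_leftover_buffers(buffer_directory_path)
  Dir.glob(File.join(buffer_directory_path, "*.log")).map { |path|
    # assumed chunk name: <escaped_tag>.<b_or_q><chunk_id>.log
    URI.decode_www_form_component(File.basename(path).sub(/\.[bq][0-9a-f]+\.log\z/, ""))
  }.uniq
end
```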
@tagomoris - another possibility: the 'forest' plugin could implement one disk-based buffer for all events. When any "child" blocks, forest blocks all events to all children.
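Very roughly, the trade-off this implies can be illustrated like so (a toy sketch, not fluentd buffer code): a single queue in front of all children, drained in order, so back-pressure from any one child stops delivery for every tag.

```ruby
# Illustrative only: one shared queue in front of every child. An entry is
# only removed after its child accepts it, so a single blocked child stalls
# the whole queue, the trade-off this suggestion accepts in exchange for one
# recoverable buffer on disk.
class SharedForestBuffer
  def initialize(children)   # children: { tag => callable }
    @children = children
    @queue = []              # stands in for the single on-disk buffer
  end

  def push(tag, record)
    @queue << [tag, record]
  end

  def flush
    until @queue.empty?
      tag, record = @queue.first
      @children.fetch(tag).call(record)  # failure here stops every tag behind it
      @queue.shift                       # only dequeue after successful delivery
    end
  end
end
```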
Is this being worked on currently?
This problem is too hard to solve because of the design of Fluentd's buffer API. I'm trying to solve it in the brand-new buffer API design of Fluentd v0.14.
How about this? Have Forest create its own status file which records every tree it plants. At startup, it looks for this file and, if it finds it, recreates every tree listed there. That way, detecting and flushing any buffers becomes the responsibility of the other plugins.
The downside is, of course, that your number of trees now only ever goes up.
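A minimal sketch of that status-file approach (the file location and the one-tag-per-line format are assumptions): append each newly planted tag, then replay the file at startup so every previously seen tag gets its output, and therefore its buffer recovery, back.

```ruby
require "set"

# Minimal sketch of the status-file idea; the path and the one-tag-per-line
# format are assumptions for illustration.
PLANTED_TAGS_FILE = "/var/log/td-agent/forest-planted-tags.txt"  # hypothetical location

def record_planted(tag)
  File.open(PLANTED_TAGS_FILE, "a") { |f| f.puts(tag) }
end

def replant_recorded_tags
  return unless File.exist?(PLANTED_TAGS_FILE)
  Set.new(File.readlines(PLANTED_TAGS_FILE, chomp: true)).each do |tag|
    plant(tag)  # each recreated child then finds and flushes its own leftover buffer
  end
end
```

As noted, the recorded list only ever grows; pruning tags whose buffers have fully drained would need extra bookkeeping.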