fluentd
Check for file type before (re)tailing
Is your feature request related to a problem? Please describe.
We've encountered a very weird file type change during the log rotation process. Sometimes Tomcat's catalina.out file changes from the 'text' file type to the 'data' file type after rotation. We still haven't figured out exactly why this happens and are still investigating.
However, our td-agent process then tries to resume tailing catalina.out, and since it's no longer a text file, it goes into a weird state and stops ingesting all log entries. This affects ingestion of every other configured log file too. It stops completely after an unknown amount of time: memory usage increases, CPU is maxed out at 100%, and the process appears stuck.
Describe the solution you'd like
Have fluentd/td-agent check that the file type is "text" before resuming ingestion of log entries.
Describe alternatives you've considered
Since we don't know exactly why catalina.out gets turned into a binary 'data' file during log rotation, we thought about putting a script in place to:
- Stop td-agent
- Stop Tomcat
- mv/rm catalina.out
- touch catalina.out
- Restart Tomcat
- Restart td-agent
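The workaround steps above could be scripted roughly like this. It is a Ruby sketch that assumes systemd with units named 'tomcat' and 'td-agent' and a log at /opt/tomcat/logs/catalina.out; all three names are hypothetical placeholders, not taken from the report. The suspect file is moved aside rather than deleted so it can still be inspected.

```ruby
require "fileutils"

# Hypothetical placeholders: adjust to the actual install.
TOMCAT_UNIT   = "tomcat"
TD_AGENT_UNIT = "td-agent"
CATALINA_OUT  = "/opt/tomcat/logs/catalina.out"

# Move the suspect file aside (kept for investigation) and create an empty one.
def recreate_log(path)
  if File.exist?(path)
    FileUtils.mv(path, "#{path}.#{Time.now.strftime('%Y%m%d%H%M%S')}")
  end
  FileUtils.touch(path)
end

# Runs the steps in the order listed above. td-agent is stopped first and
# started last, so it never tails the file while it is being replaced.
# With dry_run: true, commands are only printed and the file is untouched.
def reset_catalina_out(path: CATALINA_OUT, dry_run: false)
  steps = [
    ["systemctl", "stop", TD_AGENT_UNIT],
    ["systemctl", "stop", TOMCAT_UNIT],
    :recreate_log,
    ["systemctl", "start", TOMCAT_UNIT],
    ["systemctl", "start", TD_AGENT_UNIT],
  ]
  steps.each do |step|
    if step == :recreate_log
      recreate_log(path) unless dry_run
    elsif dry_run
      puts step.join(" ")
    else
      system(*step, exception: true) # raises if a command fails
    end
  end
  steps
end
```

Running reset_catalina_out(dry_run: true) first prints the systemctl commands without touching anything, which is a cheap way to check the unit names.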
Additional context
For now, we've excluded the problematic file from our sources, and the problem seems to be resolved.
Perhaps even just logging a warning if the type of file being tailed isn't text?
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
Still an issue. Even just a simple file type check and a warning log message would be nice.
What exactly do you mean by "simple file type check"?
On most systems, files do not have any metadata that says "I'm text data". So Fluentd cannot tell whether a file contains text data or binary data without actually reading the file and guessing the type.
So I'm unable to think of any way to implement this reliably. At best, it will be unreliable/noisy guesswork.
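To make "reading the file and guessing" concrete, here is a minimal Ruby sketch of such a guess. The looks_like_text? helper, the 8000-byte sample, and the 10% threshold are assumptions for illustration, not part of Fluentd. It treats any NUL byte as a sign of binary data (similar to the check Git uses to decide whether a file is binary) and otherwise counts unusual control bytes.

```ruby
# Hypothetical helper, not Fluentd code: guesses whether a file holds text
# by sampling its first bytes. Like file(1), it is only a best guess.
def looks_like_text?(path, sample_size: 8000)
  # read(n) returns nil at EOF, so an empty file yields "".
  sample = File.open(path, "rb") { |f| f.read(sample_size) } || ""
  return true if sample.empty?

  # A NUL byte is a strong (but not conclusive) sign of binary data.
  return false if sample.include?("\x00".b)

  # Count control bytes other than tab, LF, FF, CR and ESC (ANSI colors),
  # plus DEL. Bytes >= 0x80 are ignored so UTF-8 multibyte text passes.
  allowed = [0x09, 0x0A, 0x0C, 0x0D, 0x1B]
  suspicious = sample.each_byte.count do |b|
    (b < 0x20 && !allowed.include?(b)) || b == 0x7F
  end
  suspicious.fdiv(sample.bytesize) < 0.1 # assumed 10% threshold
end
```

Even in the best case, a false result would only justify logging a warning; it cannot prove the file is binary.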
I'm unfamiliar with Ruby, but is it possible to make underlying OS/system calls to check the file type? I know this isn't the best way to check file types, but it's something.
For example, on Linux you can use 'file' (not sure about other POSIX systems):
# file test.file
test.file: data
(dd if=/dev/urandom of=./test.file bs=1b count=5)
# file ansible_facts.txt
ansible_facts.txt: ASCII text, with very long lines
No, 'file' basically reads a few bytes from the target file and outputs its "best guess". It's essentially unreliable. For example, 'file' will happily classify UTF-16 text as binary "data" (UTF-16 is very common on Windows):
$ cat test.txt
あいうえお
$ iconv -t utf16be test.txt > test.utf16.txt
$ file test.utf16.txt
test.utf16.txt: data
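To see why byte-sniffing misfires here, consider a small Ruby illustration (not Fluentd code): plain ASCII text encoded as UTF-16LE carries a 0x00 byte for every character, so a common "contains NUL means binary" rule flags it immediately.

```ruby
# Plain text encoded as UTF-16LE, as many Windows programs write it.
utf16 = "hello world\n".encode(Encoding::UTF_16LE)

# Inspect the raw bytes, the way a byte-sniffing heuristic would.
raw = utf16.b
puts raw.bytes.first(6).inspect # => [104, 0, 101, 0, 108, 0]
puts raw.include?("\x00".b)     # => true, so a NUL check calls it binary
```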
This implies that we'd get bug reports like "Fluentd complains that text files are binary" if we implemented that kind of check.
So I'm not convinced that the best move here is to tweak the logic in Fluentd. If a flaky log producer dumps a large blob of binary data into a log file, I think we should fix that service!