logstash-input-file
Files on NFS volume vs sincedb
My logs are on an NFS volume. They are parsed correctly, but after a reboot they are parsed again.
The reason is the sincedb format. This file identifies each processed file by its major+minor+inode. The minor device number is not the same each time the same NFS volume is mounted on the same NFS client, so the old files are seen as new files after a reboot.
Why not identify processed files by their full path?
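To make the difference concrete, here is a minimal Python sketch of the two ways of identifying a file. It is not the plugin's code; `inode_key` and `path_key` are hypothetical names used only for illustration. The first builds a key from the inode and the major/minor device numbers reported by `stat`, the second uses the resolved path.

```python
import os


def inode_key(path):
    """Key in the style described above: (inode, major, minor).

    Illustration only; this is not how the plugin is implemented.
    """
    st = os.stat(path)
    return (st.st_ino, os.major(st.st_dev), os.minor(st.st_dev))


def path_key(path):
    """Hypothetical alternative: identify the file by its resolved path."""
    return os.path.realpath(path)


# If the minor number changes across a remount, the inode-based keys differ
# even though inode and major are unchanged, so the file looks new.
key_before_remount = (1234, 0, 24)  # made-up values
key_after_remount = (1234, 0, 25)   # made-up values
file_looks_new = key_before_remount != key_after_remount
```

A path-based key stays the same across remounts as long as the mount point does not move, but it cannot follow a file that is renamed, which is presumably why inode tracking is used in the first place.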
This is indeed also necessary when using rsync, as mentioned in pending PR https://github.com/jordansissel/ruby-filewatch/pull/34
I haven't confirmed if this is still a bug, but I agree about the problem. Logstash should somehow detect that the file being watched is on a remote filesystem or allow users to explicitly follow files by path name (not implicit inode tracking).
I can confirm this is still an issue. I was bulk importing from an NFS volume, and after a reboot/remount, files that had already been processed were processed again. I use a fixed sincedb file and noticed that the minor value had changed from 24 to 25.
I think we can close this - read more.
Except, we have not done the fingerprinting bit yet.
Also showing in https://discuss.elastic.co/t/logstash-cant-read-some-files/143847
I wrote a Ruby script to help me deal with this problem.
This script modifies the inode info (which may change after a remount) in the sincedb file when using Logstash's logstash-input-file plugin on NFS. https://gist.github.com/zhenchuan/10bd5eafb6c4058a83c17e053278d889
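For readers who want the general idea without opening the gist, here is a rough Python sketch of one way such a fix-up could work. It is not the linked script. It assumes each sincedb line is space-separated and starts with inode, major, minor and byte offset (any later fields are left untouched), and `remap_minor` is a hypothetical helper name.

```python
def remap_minor(lines, old_minor, new_minor):
    """Return sincedb lines with the minor device number replaced.

    Assumption: fields are space-separated in the order
    inode, major, minor, position, followed by optional extra fields.
    """
    out = []
    for line in lines:
        # Split off only the first three fields; keep the rest verbatim.
        fields = line.rstrip("\n").split(" ", 3)
        if len(fields) == 4 and fields[2] == str(old_minor):
            fields[2] = str(new_minor)
        out.append(" ".join(fields))
    return out


# Example with made-up entries: only the entry with minor 24 is rewritten.
updated = remap_minor(["1234 0 24 5000", "99 0 7 10"], 24, 25)
```

Something like this would presumably have to run while Logstash is stopped, since the plugin rewrites the sincedb file itself.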