logstash-input-file
logstash-input-file copied to clipboard
Logratation causes that file is read duplicately
Hi,
I just upgraded my ELK cluster from 2.x version on Debian 8 to 6.x version on Debian 9. Some basic information about my setup. I have rsyslog that receives data and write them into the files. Logstash with input plugin file reads files using wildcard path and after some processing it sends them to Elasticsearch nodes. Files are rotated by logrotate with nocopy/nocopytruncate options. So inodes of file is kept, files are only renamed (from test.log.1 -> test.log.2, test.log -> test.log.1, etc.) and new file is created (test.log).
On Debian 8 with ELK v2.x everything worked just fine. After upgrading to Debian 9 with ELK v6.x I encountered a problems with duplicately processed files after logrotation takes place. I was trying to replicate it to have some debug informations. The issue doesn't occur everytime in my testing environment.
In this particular case I was able to replicate the issue with last file beeing processed duplicately by input plugin file. For example, if I have: test.log test.log.1 test.log.2 Then during logrotation test.log.2 is renamed to test.log.3, test.log.1 is renamed to test.log.2, test.log is renamed to test.log.1 and new file test.log is created. Logstash watches all the files and trackes them in sincedb file. Sometimes after logrotation, logstash picks the last file, test.log.3 and reads it again.
In tested case that is stated below the problem was with the file test.log.17 that was processed again after logrotation as test.log.18.
Logrotate config:
/var/ls/remotelogs/test/*.log
{
hourly
rotate 24
maxage 26
nocopy
nocopytruncate
create 640 root logstash
nocompress
nodelaycompress
missingok
notifempty
sharedscripts
postrotate
invoke-rc.d rsyslog rotate > /dev/null
endscript
}
Logstash input configuration (ignore_older and sincedb_clean_after are kept in default settings what should be OK):
input {
file {
path => [ "/var/ls/remotelogs/test/test.log*" ]
codec => "plain"
type => "test"
discover_interval => 15
sincedb_path => "/var/ls/sincedb/monitored/test.touch"
sincedb_write_interval => 1
start_position => "beginning"
stat_interval => 2
}
}
Sincedb file before final logratation (problematic file on the top, with inode number 14, named test.log.17):
14 0 65035 122278 1543852831.1367729 /var/ls/remotelogs/test/test.log.17
12 0 65035 41873 1543852831.340915 /var/ls/remotelogs/test/test.log.16
13 0 65035 25786 1543852831.340071 /var/ls/remotelogs/test/test.log.15
15 0 65035 28589 1543852831.339257 /var/ls/remotelogs/test/test.log.14
16 0 65035 1529487 1543852831.341586 /var/ls/remotelogs/test/test.log.13
17 0 65035 5708664 1543852831.324611 /var/ls/remotelogs/test/test.log.12
18 0 65035 1828226 1543852831.282213 /var/ls/remotelogs/test/test.log.11
19 0 65035 1249527 1543852831.336627 /var/ls/remotelogs/test/test.log.10
20 0 65035 2587309 1543852831.33844 /var/ls/remotelogs/test/test.log.9
21 0 65035 5794124 1543852831.337893 /var/ls/remotelogs/test/test.log.8
22 0 65035 17483839 1543852831.340473 /var/ls/remotelogs/test/test.log.7
23 0 65035 23506699 1543852831.3483942 /var/ls/remotelogs/test/test.log.6
24 0 65035 7435294 1543852831.338851 /var/ls/remotelogs/test/test.log.5
25 0 65035 7865804 1543852831.33967 /var/ls/remotelogs/test/test.log.4
26 0 65035 13471598 1543852831.3373759 /var/ls/remotelogs/test/test.log.3
27 0 65035 3561704 1543852831.34687 /var/ls/remotelogs/test/test.log.2
28 0 65035 22756747 1543852831.3479078 /var/ls/remotelogs/test/test.log.1
29 0 65035 54957474 1543852999.732651 /var/ls/remotelogs/test/test.log
ls -li:
14 -rw-r----- 1 root logstash 122278 Dec 3 15:45 test.log.17
files from /var/ls/remotelogs/test/ that are opened by logstash process:
lr-x------ 1 logstash logstash 64 Dec 3 17:00 96 -> /var/ls/remotelogs/test/test.log
Just after logrotation, see the sincedb file (the file with inode 14 is still there, but with old filename test.log.17. test.log.17 is there twice, as test.log.16 that was renamed to test.log.17 with inode number 12 and test.log.17 that has inode number 14, but now is named test.log.18 - this change stays unnoticed by logstash input filter):
14 0 65035 122278 1543852831.1367729 /var/ls/remotelogs/test/test.log.17
12 0 65035 41873 1543853007.849294 /var/ls/remotelogs/test/test.log.17
13 0 65035 25786 1543853007.8600118 /var/ls/remotelogs/test/test.log.16
15 0 65035 28589 1543853007.860524 /var/ls/remotelogs/test/test.log.15
16 0 65035 1529487 1543853007.8608618 /var/ls/remotelogs/test/test.log.14
17 0 65035 5708664 1543853007.8611422 /var/ls/remotelogs/test/test.log.13
18 0 65035 1828226 1543853007.862993 /var/ls/remotelogs/test/test.log.12
19 0 65035 1249527 1543853007.863268 /var/ls/remotelogs/test/test.log.11
20 0 65035 2587309 1543853007.863535 /var/ls/remotelogs/test/test.log.10
21 0 65035 5794124 1543853007.863792 /var/ls/remotelogs/test/test.log.9
22 0 65035 17483839 1543853007.8640602 /var/ls/remotelogs/test/test.log.8
23 0 65035 23506699 1543853007.86431 /var/ls/remotelogs/test/test.log.7
24 0 65035 7435294 1543853007.8645642 /var/ls/remotelogs/test/test.log.6
25 0 65035 7865804 1543853007.864824 /var/ls/remotelogs/test/test.log.5
26 0 65035 13471598 1543853007.8650851 /var/ls/remotelogs/test/test.log.4
27 0 65035 3561704 1543853007.865335 /var/ls/remotelogs/test/test.log.3
28 0 65035 22756747 1543853007.865582 /var/ls/remotelogs/test/test.log.2
29 0 65035 55107447 1543853007.871568 /var/ls/remotelogs/test/test.log.1
30 0 65035 54944 1543853007.99283 /var/ls/remotelogs/test/test.lo
ls-li shows that file is really renamed and logrotation was as expected (without inode change):
11 drwx------ 2 root root 16384 Dec 3 13:12 lost+found
30 -rw-r----- 1 root logstash 85681 Dec 3 17:03 test.log
29 -rw-r----- 1 root logstash 55108493 Dec 3 17:03 test.log.1
20 -rw-r----- 1 root logstash 2587309 Dec 3 15:58 test.log.10
19 -rw-r----- 1 root logstash 1249527 Dec 3 15:57 test.log.11
18 -rw-r----- 1 root logstash 1828226 Dec 3 15:56 test.log.12
17 -rw-r----- 1 root logstash 5708664 Dec 3 15:55 test.log.13
16 -rw-r----- 1 root logstash 1529487 Dec 3 15:52 test.log.14
15 -rw-r----- 1 root logstash 28589 Dec 3 15:50 test.log.15
13 -rw-r----- 1 root logstash 25786 Dec 3 15:48 test.log.16
12 -rw-r----- 1 root logstash 41873 Dec 3 15:47 test.log.17
14 -rw-r----- 1 root logstash 122278 Dec 3 15:45 test.log.18
28 -rw-r----- 1 root logstash 22756747 Dec 3 16:42 test.log.2
27 -rw-r----- 1 root logstash 3561704 Dec 3 16:32 test.log.3
26 -rw-r----- 1 root logstash 13471598 Dec 3 16:31 test.log.4
25 -rw-r----- 1 root logstash 7865804 Dec 3 16:25 test.log.5
24 -rw-r----- 1 root logstash 7435294 Dec 3 16:21 test.log.6
23 -rw-r----- 1 root logstash 23506699 Dec 3 16:18 test.log.7
22 -rw-r----- 1 root logstash 17483839 Dec 3 16:08 test.log.8
21 -rw-r----- 1 root logstash 5794124 Dec 3 16:01 test.log.9
After few seconds, filename is updated in sincedb file is updated (notice that offset counter was also updated as the file was processed again):
14 0 65035 98121 1543853016.239249 /var/ls/remotelogs/test/test.log.18
12 0 65035 41873 1543853007.849294 /var/ls/remotelogs/test/test.log.17
13 0 65035 25786 1543853007.8600118 /var/ls/remotelogs/test/test.log.16
15 0 65035 28589 1543853007.860524 /var/ls/remotelogs/test/test.log.15
16 0 65035 1529487 1543853007.8608618 /var/ls/remotelogs/test/test.log.14
17 0 65035 5708664 1543853007.8611422 /var/ls/remotelogs/test/test.log.13
18 0 65035 1828226 1543853007.862993 /var/ls/remotelogs/test/test.log.12
19 0 65035 1249527 1543853007.863268 /var/ls/remotelogs/test/test.log.11
20 0 65035 2587309 1543853007.863535 /var/ls/remotelogs/test/test.log.10
21 0 65035 5794124 1543853007.863792 /var/ls/remotelogs/test/test.log.9
22 0 65035 17483839 1543853007.8640602 /var/ls/remotelogs/test/test.log.8
23 0 65035 23506699 1543853007.86431 /var/ls/remotelogs/test/test.log.7
24 0 65035 7435294 1543853007.8645642 /var/ls/remotelogs/test/test.log.6
25 0 65035 7865804 1543853007.864824 /var/ls/remotelogs/test/test.log.5
26 0 65035 13471598 1543853007.8650851 /var/ls/remotelogs/test/test.log.4
27 0 65035 3561704 1543853007.865335 /var/ls/remotelogs/test/test.log.3
28 0 65035 22756747 1543853007.865582 /var/ls/remotelogs/test/test.log.2
29 0 65035 55108493 1543853010.088067 /var/ls/remotelogs/test/test.log.1
30 0 65035 262418 1543853014.146191 /var/ls/remotelogs/test/test.log
Process is now reading three files (test.log and test.log.1 is expected, but also test.log.18 is beeing read):
lr-x------ 1 logstash logstash 64 Dec 3 17:03 113 -> /var/ls/remotelogs/test/test.log.1
lr-x------ 1 logstash logstash 64 Dec 3 17:03 114 -> /var/ls/remotelogs/test/test.log.18
lr-x------ 1 logstash logstash 64 Dec 3 17:00 96 -> /var/ls/remotelogs/test/test.log
Stat on test.log.18 file:
File: test.log.18
Size: 122278 Blocks: 240 IO Block: 4096 regular file
Device: fe0bh/65035d Inode: 14 Links: 1
Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 999/logstash)
Access: 2018-12-03 15:37:59.878413469 +0100
Modify: 2018-12-03 15:45:03.793572841 +0100
Change: 2018-12-03 17:03:25.975984162 +0100
Birth: -
Basic version info of current setup:
- Logstash version: 6.4.2-1
- Logrotate version: 3.11.0-0.1
- OS: Debian 9.6
Please try the latest version - changelog
bin/logstash-plugin install logstash-input-file --version 4.1.8
Is this fixed in the latest release? Because i currently face a similarly problem. Here is my environment. On server A i use filebeat with file output and rotation. Then i transfer these files to server B (logstash server) with sftp every 10 minutes. The inodes of the transferred files remain the same. Now the problem is that every time the new files were transferred to logstash server and logstash read them, it seems that it always reads all files again and put them into elasticsearch. Lets say i have 1000 entries in filebeat, filebeat.1, filebeat.2, ...
. Then on first read i have 1000
entries in elasticsearch. On second read it's 1800, 2100, 2600, ...
even though the number of entries in the logfiles only increase by 10 or 20.