logstash-filter-grok

grok should ignore tilde backup files when processing patterns_dir

Open · jordansissel opened this issue 10 years ago · 3 comments

(This issue was originally filed by @mrec at https://github.com/elastic/logstash/issues/2271)


(This comes from the discussion of #2244)

When testing a config that uses grok and custom patterns, a user will often edit pattern definition files in patterns_dir between run attempts. Many (most?) Linux-ey text editors create backup files, named as the original filename plus a ~ suffix, in the same location as the original; even though they aren't hidden, these are often invisible by default in file browsers. When dealing with multiple pattern definition files, and especially when renaming them, it's easy to end up with a lot of these tilde files lying around after a while.

grok currently reads everything in patterns_dir, including any tilde backups. It quite reasonably doesn't define the order in which it reads them, and it doesn't warn if e.g. the definition of MYPATTERN in a stale patterns~ or previousfilename~ backup file overrides the definition of MYPATTERN in patterns. Hilarity ensues. Also hair-tearing, teeth-gnashing, bad language and various other undesirable outcomes.

I propose that grok should ignore any files in patterns_dir ending in a ~. There may be other things it'd be beneficial to blacklist too, but this seems like a good start.

jordansissel, May 17 '15 23:05
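The failure mode described in the report (every file in patterns_dir is read, and a later definition of the same name silently replaces an earlier one) can be modeled in a few lines, together with the proposed "skip names ending in ~" rule. This is a minimal Python sketch, not the plugin's actual Ruby loader; the helper names `parse_pattern_file`, `load_patterns` and `pattern_files` are hypothetical. It assumes the usual grok pattern-file layout of one `NAME REGEX` pair per line with `#` comments. Unlike grok, which per the report makes no promise about read order, the sketch sorts file names so the outcome is reproducible.

```python
import os
import tempfile


def parse_pattern_file(path):
    """Parse grok-style 'NAME REGEX' lines into a dict, skipping blanks and comments."""
    patterns = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                patterns[parts[0]] = parts[1]
    return patterns


def load_patterns(paths):
    """Merge files in the given order; a later definition silently replaces an earlier one."""
    merged = {}
    for path in paths:
        merged.update(parse_pattern_file(path))
    return merged


def pattern_files(patterns_dir, skip_tilde=True):
    """List regular files in patterns_dir in sorted order, optionally skipping '~' backups."""
    selected = []
    for name in sorted(os.listdir(patterns_dir)):
        path = os.path.join(patterns_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        if skip_tilde and name.endswith("~"):
            continue  # editor backup file
        selected.append(path)
    return selected


with tempfile.TemporaryDirectory() as patterns_dir:
    files = {
        "patterns": "MYPATTERN \\d+\n",
        "patterns~": "MYPATTERN [a-z]+\n",  # stale backup left behind by an editor
    }
    for name, body in files.items():
        with open(os.path.join(patterns_dir, name), "w", encoding="utf-8") as fh:
            fh.write(body)

    everything = load_patterns(pattern_files(patterns_dir, skip_tilde=False))
    filtered = load_patterns(pattern_files(patterns_dir))

print(everything["MYPATTERN"])  # [a-z]+  (the backup sorts last and wins)
print(filtered["MYPATTERN"])    # \d+     (the backup is never read)
```

With an unspecified read order, the first result could just as well come out either way, which is exactly what makes the stale backup so hard to spot.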

> Many (most?) Linux-ey text editors create backup files, named as the original filename plus a ~ suffix

The last time I researched this, I found that vim, emacs, nano, and several other editors all use different backup file naming schemes.

I am not in favor of hardcoded blacklisting of file names. Something similar to how .gitignore works would be preferable because it would be user-controllable.

jordansissel, Aug 07 '15 21:08
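A user-controllable, .gitignore-like list could be as simple as one glob per line matched against bare file names. The sketch below is only an approximation of .gitignore: Python's `fnmatch` has no `!` negation, no directory anchoring and no `**`. Where the list would live (a file in patterns_dir, a plugin option) is left open, and the line format and the helper names `parse_ignore_rules` and `is_ignored` are hypothetical. The example globs cover `~` backups, vim-style `.name.swp` swap files and `.bak` copies.

```python
import fnmatch


def parse_ignore_rules(text):
    """Collect one glob per line, skipping blank lines and lines starting with '#'."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def is_ignored(filename, rules):
    """True if the bare file name matches any glob (case-sensitive)."""
    return any(fnmatch.fnmatchcase(filename, rule) for rule in rules)


# Hypothetical user-maintained ignore list; the format is an assumption.
rules = parse_ignore_rules("""
# editor leftovers
*~
.*.swp
*.bak
""")

names = ["patterns", "patterns~", ".patterns.swp", "custom.bak", "extra"]
kept = [name for name in names if not is_ignored(name, rules)]
print(kept)  # ['patterns', 'extra']
```

Because `#` starts a comment in this format, a glob beginning with `#` cannot be written here; .gitignore solves that with a `\#` escape, which this sketch does not implement.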

Perhaps the fundamental problem here is that Logstash isn't transparent about what it's doing. Git's behavior is easily observable with e.g. git status, while for Logstash you have to increase the log level to get any kind of clue.

Another difference compared to Git and other version control tools is that in those cases different people might want to ignore different files, so there is clear value in making it configurable. Here, not as much.

I don't mind making this configurable, but the defaults should be sane so that casual users won't fall into this trap. Otherwise it's all just a waste of time.

magnusbaeck, Aug 08 '15 15:08
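The two points in the comment above, transparency and sane defaults, can be combined in one loader: skip `~` files unless told otherwise, log every file it reads or skips, and warn when a pattern name is redefined with a different regex. This is an illustrative Python sketch in which the standard `logging` module stands in for Logstash's own logger. The function name `load_patterns_dir`, the `DEFAULT_IGNORE_GLOBS` value and the `ignore_globs` parameter are assumptions, not options the grok plugin actually has.

```python
import fnmatch
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("patterns_dir")

# Hypothetical default: skip '~' backups unless the user configures otherwise.
DEFAULT_IGNORE_GLOBS = ("*~",)


def load_patterns_dir(patterns_dir, ignore_globs=None):
    """Load 'NAME REGEX' files, logging what is read, skipped and overridden.

    ignore_globs=None applies the defaults; an explicit list (even an empty
    one) replaces them.
    """
    globs = DEFAULT_IGNORE_GLOBS if ignore_globs is None else tuple(ignore_globs)
    merged, origin, conflicts = {}, {}, []
    for file_name in sorted(os.listdir(patterns_dir)):
        path = os.path.join(patterns_dir, file_name)
        if not os.path.isfile(path):
            continue
        if any(fnmatch.fnmatchcase(file_name, g) for g in globs):
            log.info("skipping %s (matches ignore list)", path)
            continue
        log.info("loading %s", path)
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                parts = raw.strip().split(None, 1)
                if len(parts) != 2 or parts[0].startswith("#"):
                    continue  # blank line, comment, or malformed entry
                pattern_name, regex = parts
                if pattern_name in merged and merged[pattern_name] != regex:
                    log.warning("%s from %s overrides the definition in %s",
                                pattern_name, path, origin[pattern_name])
                    conflicts.append((pattern_name, origin[pattern_name], path))
                merged[pattern_name] = regex
                origin[pattern_name] = path
    return merged, conflicts


with tempfile.TemporaryDirectory() as patterns_dir:
    for fname, body in (("patterns", "MYPATTERN \\d+\n"),
                        ("patterns~", "MYPATTERN [a-z]+\n")):
        with open(os.path.join(patterns_dir, fname), "w", encoding="utf-8") as fh:
            fh.write(body)
    # Defaults: the backup is skipped and the skip is logged.
    default_result, default_conflicts = load_patterns_dir(patterns_dir)
    # Explicit empty list: the backup is read and the override is warned about.
    opted_out, opted_out_conflicts = load_patterns_dir(patterns_dir, ignore_globs=[])
```

Treating `None` and `[]` differently lets a casual user get the safe default without configuring anything, while still allowing an explicit opt-out; an alternative design would have user globs extend the defaults rather than replace them.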

I opened PR #63, which may solve this issue without breaking compatibility.

breml, Nov 25 '15 12:11