remote_syslog2

Improperly tagged log file due to race condition

Open Bowbaq opened this issue 8 years ago • 6 comments

Given a configuration like this:

files:
  - path: /path/to/log/specific.log
    name: fancy_tag
  - path: /path/to/log/*.log

If /path/to/log/specific.log does not exist on startup, the following race condition can happen: [image: race condition]

The /path/to/log/specific.log file ends up matching the catch-all glob, and the tag defaults to the filename (i.e. specific). On subsequent iterations of globFiles, the file is marked as already being tailed, so the tag is never updated.

I'm not entirely sure what the correct behavior should be. @snorecone thoughts?

Bowbaq • Nov 01 '16 21:11
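The original diagram is not available here, so the following is a minimal sketch of one reading of the race described above, not remote_syslog2's actual code. The `entry` type, `poll`, and `racyLister` are hypothetical names; `racyLister` stands in for the filesystem and makes the file appear between the lookup for the explicit path and the lookup for the catch-all glob.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// entry mirrors one item under `files:`; an empty tag means "use the file name".
type entry struct {
	pattern string
	tag     string
}

// defaultTag derives a tag from the file name without its extension.
func defaultTag(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// poll visits each entry in order and records a tag for any match that is not
// tailed yet. list stands in for a filesystem glob so the race can be replayed.
func poll(entries []entry, list func(pattern string) []string, tailed map[string]string) {
	for _, e := range entries {
		for _, path := range list(e.pattern) {
			if _, ok := tailed[path]; ok {
				continue // already tailed: the tag is never revisited
			}
			tag := e.tag
			if tag == "" {
				tag = defaultTag(path)
			}
			tailed[path] = tag
		}
	}
}

// racyLister returns a lister in which specific.log is created right after the
// very first lookup, i.e. between the explicit entry and the catch-all glob.
func racyLister() func(pattern string) []string {
	calls := 0
	return func(pattern string) []string {
		calls++
		if calls == 1 {
			return nil // the file does not exist yet
		}
		const file = "/path/to/log/specific.log"
		if ok, _ := filepath.Match(pattern, file); ok {
			return []string{file}
		}
		return nil
	}
}

func main() {
	entries := []entry{
		{pattern: "/path/to/log/specific.log", tag: "fancy_tag"},
		{pattern: "/path/to/log/*.log"},
	}
	tailed := map[string]string{}
	list := racyLister()
	poll(entries, list, tailed) // the file appears mid-poll and is caught by the glob
	poll(entries, list, tailed) // later polls skip it, so the tag stays wrong
	fmt.Println(tailed["/path/to/log/specific.log"]) // prints "specific"
}
```

The second call to `poll` does see the explicit entry match, but the "already tailed" check wins, which is why the tag is stuck.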

Thanks for this @Bowbaq !

I think the globs should be resolved and de-duplicated on startup. #11 is the next issue I had planned to address, which would make the glob behavior more robust. I think solving this issue should be part of that. If you feel so inclined as to give it a shot, that would be great!

snorecone • Nov 05 '16 16:11

If globs are only resolved on startup, how does that work when a file gets created at a matching location after the daemon starts? It seems like it would get ignored, which reduces the usefulness of globs quite drastically.

Bowbaq • Nov 08 '16 00:11

@snorecone polling should take care of that ^, yeah?

johlym • Nov 08 '16 00:11

Sorry, I wasn't very clear. I mean to say the globs should be resolved and de-duplicated on startup and every run of the file poller. If an explicit path is given, that tag should take precedence over any other tag given for a glob pattern.

snorecone • Nov 08 '16 00:11

The problem I'm running into is between two glob patterns though. One of them is more "specific" than the other, but I think that'd be hard to determine programmatically.

One possible solution is to say that globs that appear earlier in the config file have precedence over globs appearing later in the config file.

Another problem is that even if the tag was properly resolved on the next poll, you'd get ~1 poll period's worth of logs tagged with the wrong tag, and the rest with the good tag. A possible solution is to re-send whatever was previously mis-tagged (this would have a small overhead of duplicated logs).

Bowbaq • Nov 08 '16 00:11
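The "earlier in the config file wins" rule for overlapping globs could look roughly like the sketch below. This is an illustration only: the `entry` type, `firstMatchTag`, and the `app-*.log` pattern are hypothetical, and matching uses the standard library's `filepath.Match` rather than whatever remote_syslog2 does internally.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// entry mirrors one item under `files:`; an empty tag means "use the file name".
type entry struct {
	pattern string
	tag     string
}

// firstMatchTag returns the tag of the first entry, in config order, whose
// pattern matches path. The boolean is false when no entry matches.
func firstMatchTag(entries []entry, path string) (string, bool) {
	for _, e := range entries {
		matched, err := filepath.Match(e.pattern, path)
		if err != nil || !matched {
			continue // malformed patterns are simply skipped in this sketch
		}
		if e.tag != "" {
			return e.tag, true
		}
		base := filepath.Base(path)
		return strings.TrimSuffix(base, filepath.Ext(base)), true
	}
	return "", false
}

func main() {
	// Both patterns match app-web.log; the earlier, narrower one is listed first.
	entries := []entry{
		{pattern: "/path/to/log/app-*.log", tag: "app"},
		{pattern: "/path/to/log/*.log"},
	}
	tag, ok := firstMatchTag(entries, "/path/to/log/app-web.log")
	fmt.Println(tag, ok) // prints "app true"
	tag, ok = firstMatchTag(entries, "/path/to/log/other.log")
	fmt.Println(tag, ok) // prints "other true"
}
```

This sidesteps having to compute which glob is more "specific": the config author expresses it through ordering. It does not address logs already sent under the wrong tag.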

@Bowbaq if you could define what the specificity would be for file globs when determining the tag, this would be totally fixable without the problem of:

> Another problem is that even if the tag was properly resolved on the next poll, you'd get ~1 poll period's worth of logs tagged with the wrong tag, and the rest with the good tag. A possible solution is to re-send whatever was previously mis-tagged (this would have a small overhead of duplicated logs).

When I say:

> I mean to say the globs should be resolved and de-duplicated on startup and every run of the file poller. If an explicit path is given, that tag should take precedence over any other tag given for a glob pattern.

I mean that instead of iterating through the file globs once, it should be done twice: the first time to de-duplicate and determine tags, and the second time to start watching.

snorecone • Nov 23 '16 19:11
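One way to read the two-pass idea is sketched below, under stated assumptions: `entry`, `tagFor`, `firstPass`, and `secondPass` are hypothetical names, and `start` is a placeholder for the real tailer. The tag is derived from the config and the path alone (an exact path entry wins, otherwise the first matching pattern), so it no longer depends on which glob happened to discover the file first.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// entry mirrors one item under `files:`; an empty tag means "use the file name".
type entry struct {
	pattern string
	tag     string
}

// tagFor picks a tag from the config alone: an entry whose pattern equals the
// path exactly wins; otherwise the first matching pattern in config order wins.
func tagFor(entries []entry, path string) string {
	chosen := -1
	for i, e := range entries {
		if e.pattern == path {
			chosen = i
			break
		}
		if ok, _ := filepath.Match(e.pattern, path); ok && chosen < 0 {
			chosen = i
		}
	}
	if chosen >= 0 && entries[chosen].tag != "" {
		return entries[chosen].tag
	}
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// firstPass expands every pattern, de-duplicates the results, and assigns tags.
func firstPass(entries []entry) (map[string]string, error) {
	tags := map[string]string{}
	for _, e := range entries {
		paths, err := filepath.Glob(e.pattern)
		if err != nil {
			return nil, err
		}
		for _, p := range paths {
			tags[p] = tagFor(entries, p) // same answer whichever entry found p
		}
	}
	return tags, nil
}

// secondPass starts tailing resolved files that are not tailed yet.
func secondPass(tags, tailed map[string]string, start func(path, tag string)) {
	for p, tag := range tags {
		if _, ok := tailed[p]; ok {
			continue
		}
		tailed[p] = tag
		start(p, tag)
	}
}

func main() {
	entries := []entry{
		{pattern: "/path/to/log/*.log"},
		{pattern: "/path/to/log/specific.log", tag: "fancy_tag"},
	}
	tags, err := firstPass(entries)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	secondPass(tags, map[string]string{}, func(path, tag string) {
		fmt.Printf("tailing %s as %s\n", path, tag)
	})
}
```

Running both passes on startup and on every poll would mean a file created mid-poll is still tagged `fancy_tag`, because the catch-all glob finding it first no longer decides the tag.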