Missing directories and directory globs

Open psycofdj opened this issue 4 years ago • 4 comments

Hello

I'm facing the same problem described in issue #81. In my use case, I'm monitoring logs stored in dynamically generated paths (dynaFile output of rsyslog) and the list of possible directories is not known in advance.

Q1: Is there any chance that grok_exporter will overcome the current implementation limitation regarding missing directories in the future?

Q2: Would you consider supporting globs in the directory part of the input paths?

Thanks. I also like your project, keep up the good work!

psycofdj avatar Mar 24 '20 05:03 psycofdj

Hi,

sorry for the late reply. Let me start by explaining the reason for the current limitations. grok_exporter on Linux uses the inotify system call to monitor log files. The manpage of inotify says:

"Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created."

For example, if grok_exporter watches logfiles in /a/b/c/*.log, it creates one inotify watch for each logfile (to get notified when new log lines are written or when the log file is truncated), and additionally it creates one inotify watch for the directory /a/b/c/ (to get notified when a new log file is created, or an existing one is moved or deleted). That's why grok_exporter supports wildcards only on a file level and why it cannot handle removing the log directory.

Step 1 towards a more generic implementation would be to handle removing the log directory. In the example, that would mean grok_exporter creates an additional inotify watch for directory /a/b/ to get notified when /a/b/c/ is deleted, moved, or created. We could implement this recursively, creating directory watches for /, /a/, /a/b/, and /a/b/c/.

Step 2 would be to support wildcards for directories, like /a/b/*/*.log. Then we would need to watch all directories matching /a/b/*/. While this is simple in many cases, there are corner cases that make this difficult and error prone. For example, I imagine it would be difficult to handle a small script with the following three commands: mv /a/b/c /a/b/d; mkdir /a/b/c; mv /a/b/d/*.log /a/b/c/. Moreover, directories /a/b/c/ and /a/b/d/ might be mount points on different file systems, or they might be symbolic links.

Step 3 would be to support sub-directory trees with arbitrary depth, like /a/b/**/*.log meaning all files matching *.log in any subdirectory tree within /a/b/. This is even harder, because it should support users moving directories and files up and down the tree.

As for the future plans: I think step 1 will be supported soon, and step 2 with limitations later.

fstab avatar Apr 11 '20 21:04 fstab

Thank you for the clear and very detailed answer; I now see the challenge of such a feature.

While I completely understand the benefits of using inotify at the file level, I wonder what the problems would be with a much more naive approach for directories: a background goroutine that periodically resolves the glob and creates new handles for discovered files (and/or leaf directories)?

psycofdj avatar Apr 12 '20 06:04 psycofdj

Thanks @fstab for your detailed answer.

In our project we have a similar "problem" and would be happy about a solution for "Step 2".

On several management servers we are facing the following use case (customer-specific folders with several logs):

/a/customer-1/a.log
/a/customer-1/b.log 
/a/customer-2/a.log
/a/customer-2/b.log

We have the advantage that our configuration management tool can create the configuration in such detail, but a "glob" would simplify the configuration and the underlying code of our configuration management solution...

conturNDE avatar Aug 10 '20 09:08 conturNDE

I have a similar case to @conturNDE's, and I also believe globbing of directories would be quite useful, as would removing the failure on missing directories.

gdsotirov avatar Jun 10 '22 06:06 gdsotirov