tapasco Reduce usage of iNotify to avoid running out of resources

Right now TaPaSCo creates so many iNotify instances that on some hosts it fails because the available iNotify resources run out.

May 17 '19 08:05 jahofmann

@jahofmann: Can you provide a reproducer for this issue?

Jun 10 '20 15:06 sommerlukas

I did some research into this issue. To summarize my findings:

- A simple compose job (arraysum example) uses ca. 350 inotify watches. While this is far from the upper limit (500k for most Linux distributions), most of those watches are redundant and could be removed.
- tapasco uses deeply nested directory structures. The number of watches could be reduced by watching only the most deeply nested directory. For example, if the file of interest is a/b/c/d/file.txt, it is enough to watch d; directories a, b and c do not need to be watched.
- Log files are watched by a separate class (MultiFileWatcher), which is polling-based and not inotify-based.
- When I completely disabled the inotify subsystem of tapasco, the above-mentioned compose job still finished successfully.
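
To illustrate the second point, here is a minimal sketch in Python (not tapasco's actual code; the function names and the example paths are made up for illustration). It compares how many directories get a watch when every ancestor is watched with how many are needed when only the directory that directly contains each file is watched. It assumes those directories already exist and are not replaced while being watched.

```python
from pathlib import PurePosixPath


def all_ancestor_watches(files):
    """Every directory on the path to each file (the redundant strategy)."""
    dirs = set()
    for f in files:
        # .parents yields a/b/c/d, a/b/c, a/b, a and finally "." (or "/");
        # the last one has an empty name and is skipped.
        dirs.update(p for p in PurePosixPath(f).parents if p.name)
    return dirs


def minimal_watches(files):
    """Only the directory that directly contains each file of interest."""
    return {PurePosixPath(f).parent for f in files}


# Hypothetical files of interest, chosen only for this example.
files = ["a/b/c/d/file.txt", "a/b/c/d/other.txt", "a/b/x/log.txt"]

print(len(all_ancestor_watches(files)), "watches when watching all ancestors")
print(len(minimal_watches(files)), "watches when watching only the parents")
```

An inotify watch on a directory reports events for the entries directly inside it, which is why the parent directory alone is enough for a known file.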

The last point seems to indicate that there are use cases for which inotify watches are not needed at all.

Can anybody help me understand for which use cases they actually are needed? Or more specifically, what are the files that need to be watched for tapasco to execute correctly?

Jul 08 '20 16:07 mhrtmnn

IIRC, the iNotify watches are mainly used to avoid excessive, repeated traversal of directories and to keep cached views of the file system up to date. @jkorinth: Do you still remember why the iNotify watches were introduced? Do you expect performance to drop significantly if we remove them completely?
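
As a rough sketch of that idea in Python (the class `CachedDirView` and its methods are hypothetical and not taken from tapasco): a cached view of a directory tree is only re-scanned after something has marked it as stale. With iNotify, a change event would call `invalidate()`; without it, the caller has to decide when to re-scan.

```python
import os


class CachedDirView:
    """Cached list of files below a root directory (illustrative only)."""

    def __init__(self, root):
        self.root = root
        self._files = None  # None means "stale, needs a re-scan"
        self.scans = 0      # counts full traversals, for demonstration

    def invalidate(self):
        """Mark the cache as stale, e.g. from a file-system change event."""
        self._files = None

    def files(self):
        """Return the cached listing, traversing the tree only if stale."""
        if self._files is None:
            self.scans += 1
            self._files = sorted(
                os.path.relpath(os.path.join(d, name), self.root)
                for d, _, names in os.walk(self.root)
                for name in names
            )
        return self._files
```

Repeated calls to `files()` cost one traversal in total until `invalidate()` is called, which is the saving the watches would provide.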

Jul 16 '20 10:07 sommerlukas

I assume they are created by DirectoryWatchers? They were useful for the now defunct GUI (iTaPaSCo, never really worked the way I wanted anyway - guess I'm the wrong guy for GUIs :-D), but as far as I recall they were primarily for SLURM batches on the HRZ Lichtenberg cluster:

I faintly remember having issues when trying to launch a few hundred jobs at once, though I do not recall the details. I think the whole scanning traversals took significant time, and time was limited on the high memory nodes back then. Anyway, I think you should never notice that on workstations. And for the cluster I think you'd be better off with having some sort of binary index file instead (I think there even was an "algun dia" for that once).

Jul 17 '20 14:07 jkorinth
