tapasco
Reduce usage of iNotify to avoid running out of resources
Right now Tapasco uses so many inotify instances that it fails on some hosts once the available inotify resources are exhausted.
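To see how close a given host is to running out, the per-user inotify limits can be read from procfs. This is a minimal sketch, assuming a Linux host that exposes the usual files under `/proc/sys/fs/inotify`. The helper name `read_inotify_limits` is made up for illustration and is not part of tapasco.

```python
from pathlib import Path

# Standard Linux location of the per-user inotify limits.
INOTIFY_PROC_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(proc_dir=INOTIFY_PROC_DIR):
    """Return the inotify limits found in proc_dir as a dict of ints.

    Hypothetical helper for illustration; entries that do not exist
    (e.g. on non-Linux hosts) are simply skipped.
    """
    limits = {}
    for name in ("max_user_instances", "max_user_watches", "max_queued_events"):
        entry = Path(proc_dir) / name
        if entry.is_file():
            limits[name] = int(entry.read_text().strip())
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"{name}: {value}")
```

The actual values vary by kernel version and distribution, so it is worth comparing the output on a failing host with that of a working one.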
@jahofmann: Can you provide a reproducer for this issue?
I did some research into this issue, to summarize my findings:
- A simple compose job (the arraysum example) uses ca. 350 inotify watches. While this is far from the upper limit (500k for most Linux distributions), most of those watches are redundant and could be removed.
- tapasco uses deeply nested directory structures. The number of watches could be reduced by watching only the most deeply nested directory (e.g., if the file of interest is in `a/b/c/d/file.txt`, it is enough to watch `d`. Directories `a`, `b` and `c` do not need to be watched in this case).
- Log files are watched by a separate class (`MultiFileWatcher`), which is polling-based and not inotify-based.
- When I completely disabled the inotify subsystem of tapasco, the above-mentioned compose job still finished successfully.
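The second point can be sketched as follows. This is only an illustration of the idea, not tapasco code: the function names `directories_to_watch` and `naive_watch_set` are hypothetical, and it assumes the directories already exist when the watches are registered.

```python
from pathlib import PurePosixPath


def directories_to_watch(files_of_interest):
    """Return only the direct parent directory of each file of interest."""
    return {str(PurePosixPath(f).parent) for f in files_of_interest}


def naive_watch_set(files_of_interest):
    """Return every ancestor directory, i.e. one watch per nesting level."""
    watched = set()
    for f in files_of_interest:
        for ancestor in PurePosixPath(f).parents:
            # For relative paths the last ancestor is ".", which is skipped.
            if str(ancestor) != ".":
                watched.add(str(ancestor))
    return watched


if __name__ == "__main__":
    files = ["a/b/c/d/file.txt", "a/b/c/d/other.txt"]
    print(sorted(naive_watch_set(files)))       # four directories
    print(sorted(directories_to_watch(files)))  # only a/b/c/d
```

For the example path this reduces four watches to one. The saving grows with the nesting depth and with the number of files that share a parent directory.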
The last point seems to indicate that there are use cases for which inotify watches are not needed at all.
Can anybody help me understand for which use cases they actually are needed? Or more specifically, what are the files that need to be watched for tapasco to execute correctly?
IIRC, the iNotifys are mainly used to avoid excessive, repeated traversal of directories and to keep cached views of the file system up to date. @jkorinth: Do you still remember why the iNotifys were introduced? Do you expect the performance to drop significantly if we remove them completely?
I assume they are created by `DirectoryWatcher`s? They were useful for the now defunct GUI (`iTaPaSCo`, never really worked the way I wanted anyway - guess I'm the wrong guy for GUIs :-D), but as far as I recall they were primarily for SLURM batches on the HRZ Lichtenberg cluster:
I faintly remember having issues when trying to launch a few hundred jobs at once, though I do not recall the details. I think the whole scanning traversals took significant time, and time was limited on the high memory nodes back then. Anyway, I think you should never notice that on workstations. And for the cluster I think you'd be better off with having some sort of binary index file instead (I think there even was an "algun dia" for that once).
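A rough sketch of what such an index file could look like, purely as an illustration: the tree is scanned once, the result is stored in a binary file, and later jobs load that file instead of traversing the directories again. The names `scan_tree` and `load_or_build_index` and the use of `pickle` are assumptions for this example, not anything tapasco provides.

```python
import os
import pickle


def scan_tree(root):
    """Walk root once and return a sorted list of relative file paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def load_or_build_index(root, index_path):
    """Load the cached file list from index_path, or scan root and store it.

    Hypothetical helper: the index is never invalidated here, so it goes
    stale as soon as files are added or removed below root.
    """
    if os.path.isfile(index_path):
        with open(index_path, "rb") as fh:
            return pickle.load(fh)  # only load index files you trust
    index = scan_tree(root)
    with open(index_path, "wb") as fh:
        pickle.dump(index, fh)
    return index
```

With a few hundred jobs starting at once, only the first one would pay for the traversal. The open question is when to rebuild the index, which this sketch deliberately leaves out.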