
java.io.IOException: User limit of inotify instances reached or too many open files

luuloi opened this issue 8 years ago · 8 comments

I have a simple bpipe pipeline, as follows:

run {
    "%_R*.fastq.gz" * [ fastqc + trim_fastq + alignment + [ flagstat_lane, BaseRecalibrator + PrintReads ] + move_lane_bam ] +
    "%.move" * [ merge_lane_bam + [ RealignerTargetCreator + IndelRealigner, bigwig, samstat, qualimap, qualimap_dedup ] + HaplotypeCaller + make_join_calling_file ] +
    [ GenotypeGVCFs + VariantRecalibrator_SNP + ApplyRecalibration_SNP + VariantRecalibrator_indel + ApplyRecalibration_indel ]
}

with 14 input samples. When I run this pipeline, I always get the following error:

java.io.IOException: User limit of inotify instances reached or too many open files
    at bpipe.OutputDirectoryWatcher.<init>(OutputDirectoryWatcher.groovy:74)
    at bpipe.OutputDirectoryWatcher.getDirectoryWatcher(OutputDirectoryWatcher.groovy:225)
    at bpipe.OutputDirectoryWatcher$getDirectoryWatcher.call(Unknown Source)
    at bpipe.PipelineContext.outputTo(PipelineContext.groovy:2113)
    at bpipe.PipelineContext$_getOutput_closure4.doCall(PipelineContext.groovy:410)
    at bpipe.PipelineOutput.setDir(PipelineOutput.groovy:228)
    at bpipe.PipelineOutput.setProperty(PipelineOutput.groovy)
    at stage_WGS10X$_run_closure3.doCall(stage.WGS10X.bpipe:22)
    at bpipe.PipelineStage.invokeMethod(PipelineStage.groovy)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy:388)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy)
    at bpipe.ParamsBinding.withLocalVariables(Runner.groovy:1001)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy:387)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy)
    at bpipe.PipelineStage.runBody(PipelineStage.groovy:375)
    at bpipe.PipelineStage.run(PipelineStage.groovy:248)
    at bpipe.PipelineCategory$_plus_closure3.doCall(PipelineCategory.groovy:171)
    at bpipe.PipelineStage.invokeMethod(PipelineStage.groovy)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy:388)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy)
    at bpipe.ParamsBinding.withLocalVariables(Runner.groovy:1001)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy:387)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy)
    at bpipe.PipelineStage.runBody(PipelineStage.groovy:375)
    at bpipe.PipelineStage.run(PipelineStage.groovy:248)
    at bpipe.PipelineCategory$_plus_closure4.doCall(PipelineCategory.groovy:192)
    at bpipe.PipelineStage.invokeMethod(PipelineStage.groovy)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy:388)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy)
    at bpipe.ParamsBinding.withLocalVariables(Runner.groovy:1001)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy:387)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy)
    at bpipe.PipelineStage.runBody(PipelineStage.groovy:375)
    at bpipe.PipelineStage.run(PipelineStage.groovy:248)
    at bpipe.PipelineCategory$_plus_closure3.doCall(PipelineCategory.groovy:151)
    at bpipe.PipelineStage.invokeMethod(PipelineStage.groovy)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy:388)
    at bpipe.PipelineStage$_runBody_closure3$_closure9.doCall(PipelineStage.groovy)
    at bpipe.ParamsBinding.withLocalVariables(Runner.groovy:1001)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy:387)
    at bpipe.PipelineStage$_runBody_closure3.doCall(PipelineStage.groovy)
    at bpipe.PipelineStage.runBody(PipelineStage.groovy:375)
    at bpipe.PipelineStage.run(PipelineStage.groovy:248)
    at bpipe.Pipeline.runSegment(Pipeline.groovy:538)
    at bpipe.Pipeline$runSegment$16.call(Unknown Source)
    at bpipe.PipelineCategory$_splitOnMap_closure11$_closure38.doCall(PipelineCategory.groovy:527)
    at bpipe.PipelineCategory$_splitOnMap_closure11$_closure38.doCall(PipelineCategory.groovy)

Any help, suggestions, or advice would be appreciated. Thank you.

luuloi avatar May 19 '17 00:05 luuloi

This is an unfortunate side effect of recent changes that use native file system events (inotify) to detect when files are created or modified.

You can turn off use of inotify with:

usePollerFileWatcher=true

in bpipe.config (or ~/.bpipeconfig), and it should make this error go away. However, it also means your Bpipe process uses more resources and is less efficient, so it may be worth exploring whether you can have the system limit for inotify instances raised (see here for an example).
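For reference, on Linux the limit in question is a sysctl. The commands below are a generic sketch of how to inspect and raise it; the value 1024 and the drop-in filename are examples, not values specific to Bpipe:

```shell
# Check the current per-user limit on inotify instances
# (128 is a common distribution default)
cat /proc/sys/fs/inotify/max_user_instances

# To raise it on the running system (requires root), one would use:
#   sudo sysctl -w fs.inotify.max_user_instances=1024
#
# To persist the change across reboots:
#   echo 'fs.inotify.max_user_instances = 1024' | sudo tee /etc/sysctl.d/90-inotify.conf
#   sudo sysctl --system
```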

ssadedin avatar May 19 '17 09:05 ssadedin

Also, one aspect of this is that Bpipe registers one inotify instance per output directory. So if your pipeline creates a very large number of directories, that alone can trigger the problem. If the directories aren't essential to your pipeline, you could possibly avoid the issue by creating fewer of them.

ssadedin avatar May 19 '17 09:05 ssadedin

> it may be worth exploring if you can have the system limit for inotify instances raised

This is complicated by the fact that inotify doesn't work on shared filesystems (e.g. Lustre or NFS). We have been recommending usePollerFileWatcher=true.

rxk900 avatar May 19 '17 10:05 rxk900

Thank you for all useful advice!

luuloi avatar May 22 '17 01:05 luuloi

Hi @ssadedin and @luuloi , some further detail on the environment underlying 'bpipe' here. bpipe is running in a virtual machine which is an NFS client. Bpipe is polling for files in this client's NFS mount. The NFS export on the server, which serves out the files that bpipe is polling for, is a Lustre mount that is being re-shared over NFS. Neither NFS nor Lustre support the inotify API.

Consequently, it's highly unlikely that the inotify API will work properly in this use case. Polling is the only way to discover new files.

It is possible that the inotify APIs succeed in the sense of returning 0 and not setting errno. I haven't tested this case. In other words, bpipe may have to deal with the possibility that the inotify API appears to work, but there are never any events.
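To make the distinction concrete, here is a minimal sketch of what polling-based discovery amounts to: list the directory and diff modification times. The class and method names are illustrative only, not Bpipe's actual implementation of the poller watcher:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PollingWatcher {
    // Hypothetical polling directory watcher: the general strategy that
    // usePollerFileWatcher=true selects instead of inotify.
    private final Path dir;
    private final Map<Path, FileTime> seen = new HashMap<>();

    public PollingWatcher(Path dir) {
        this.dir = dir;
    }

    // Lists the directory and returns entries that are new, or whose
    // modification time changed, since the previous poll. This works on
    // NFS and Lustre because it relies only on ordinary directory listings,
    // not on kernel notification APIs.
    public List<Path> poll() throws IOException {
        List<Path> changed = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                FileTime mtime = Files.getLastModifiedTime(p);
                FileTime previous = seen.put(p, mtime);
                if (previous == null || !previous.equals(mtime)) {
                    changed.add(p);
                }
            }
        }
        return changed;
    }
}
```

A production poller would also need to handle deletions and the coarse mtime resolution of some network filesystems, but the core loop is just list-and-diff, which is why it costs more CPU than event-driven watching.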

MattSANU avatar May 22 '17 05:05 MattSANU

The immediate problem in this case was that on the machine running bpipe, /proc/sys/fs/inotify/max_user_instances had value 128, which is not sufficient in this use case. That limit has now been increased to 1024. But as I was saying above, I believe that permitting bpipe to create additional inotify instances was pointless, because none of its inotify instances is ever going to work properly in this use case.

MattSANU avatar May 22 '17 05:05 MattSANU

To make inotify work properly in this use case, there would have to be some sort of integration between Lustre change logs, NFS leases, and the NFS client's inotify watches, across a minimum of three different machines (Lustre MDT, Lustre client+NFS server, and NFS client). None of this exists right now.

If bpipe is receiving notifications in this use case, I suppose it must have fallen back to polling, despite being configured to use the 'native' watcher rather than the 'poller' watcher. Is this possible?

MattSANU avatar May 22 '17 06:05 MattSANU

@MattSANU - I'm not entirely sure, but it is possible that Java itself falls back to a manual (polling) approach. Bpipe first tries to create a high-sensitivity notification and then falls back to low sensitivity, and I think Java could be emulating either of these by polling. You can see the code/logic here; the associated log messages will tell you what's going on if you look in your log files:

https://github.com/ssadedin/bpipe/blob/master/src/main/groovy/bpipe/OutputDirectoryWatcher.groovy#L194
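The registration pattern described above can be sketched as follows. This is illustrative only (see OutputDirectoryWatcher.groovy, linked above, for Bpipe's actual logic); the relevant point for this issue is that on platforms where the JDK has no native event source, WatchService is itself emulated by an internal polling thread, and the non-standard SensitivityWatchEventModifier merely tunes that emulation's poll period:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import com.sun.nio.file.SensitivityWatchEventModifier;

public class NativeWatchSketch {
    // Hypothetical helper mirroring the try-high-sensitivity-then-fall-back
    // registration pattern; names are illustrative, not Bpipe's API.
    static WatchKey register(Path dir, WatchService watcher) throws IOException {
        WatchEvent.Kind<?>[] kinds = {
            StandardWatchEventKinds.ENTRY_CREATE,
            StandardWatchEventKinds.ENTRY_MODIFY
        };
        try {
            // First attempt: request high-sensitivity (frequent) events.
            // On Linux the inotify backend simply ignores this modifier.
            return dir.register(watcher, kinds, SensitivityWatchEventModifier.HIGH);
        } catch (UnsupportedOperationException e) {
            // Fallback: plain registration with default sensitivity.
            return dir.register(watcher, kinds);
        }
    }
}
```

Either way, a successful registration does not guarantee events will ever arrive on NFS or Lustre mounts, which matches the behaviour speculated about earlier in this thread.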

ssadedin avatar Jun 10 '17 05:06 ssadedin