modules
[FEATURE] Disabling JVM HotSpot performance data (hsperfdata) in modules for Java tools
Is your feature request related to a problem? Please describe
I've encountered a problem with the JVM's HotSpot performance-data files (hsperfdata) when multiple GATK processes run on the same node in Singularity containers (details in nf-core/sarek issue #1030). There is also a recent Sarek issue with SIGBUS errors related to hsperfdata (nf-core/sarek issue #1024).
Describe the solution you'd like
I'd like to propose turning off HotSpot's performance data using -XX:-UsePerfData in the --java-options passed to GATK.
This should have two effects: it should eliminate a class of bugs related to the JVM and hsperfdata, and it should stabilise nf-core Singularity modules in rare, hard-to-debug situations.
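A minimal sketch of the proposal, written in Python purely for illustration: it builds the argument list for a GATK4 call with -XX:-UsePerfData appended to whatever is passed through --java-options. The helper name `gatk_command`, the default `-Xmx4g` heap size and the example tool arguments are hypothetical; in an nf-core module the same flag would simply be added to the `gatk --java-options "..."` string in the process script.

```python
import shlex

PERFDATA_FLAG = "-XX:-UsePerfData"


def gatk_command(tool, tool_args, java_opts=("-Xmx4g",)):
    """Build a GATK4 argv whose JVM runs without hsperfdata (hypothetical helper)."""
    opts = list(java_opts)
    # -XX:-UsePerfData disables the perf-data counters, so no
    # /tmp/hsperfdata_<user>/<pid> file is created for this JVM.
    if PERFDATA_FLAG not in opts:
        opts.append(PERFDATA_FLAG)
    # The gatk launcher expects all JVM flags as ONE string after --java-options.
    return ["gatk", "--java-options", " ".join(opts), tool, *tool_args]


cmd = gatk_command("HaplotypeCaller", ["-R", "ref.fa", "-I", "in.bam", "-O", "out.vcf.gz"])
print(shlex.join(cmd))
# prints: gatk --java-options '-Xmx4g -XX:-UsePerfData' HaplotypeCaller -R ref.fa -I in.bam -O out.vcf.gz
```

Swapping the appended flag for -XX:+PerfDisableSharedMem would give the other alternative listed below: the counters are kept in ordinary process memory rather than in a shared file under /tmp.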
Describe alternatives you've considered
The HotSpot JVM hard-codes the location of its hsperfdata files to /tmp; it ignores the --tmp-dir flag passed to GATK.
As far as I can tell, turning hsperfdata off has no negative side effects beyond preventing the use of jstat and certain Java debuggers, which don't seem to be used in nf-core. A detailed blog post by Evan Jones describes an improvement in Java GC efficiency from turning this system off.
Alternatives would include preventing Singularity from mounting the host's /tmp into the container (I'm not certain how this might be achieved within nf-core), or using -XX:+PerfDisableSharedMem.
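Because these files always land in /tmp, a quick diagnostic can show whether a node (or a container sharing the host's /tmp) is accumulating them. This is a sketch that assumes the usual Linux layout, in which each JVM with perf data enabled creates a file named after its PID inside /tmp/hsperfdata_<user>/; the function names `find_hsperfdata` and `demo_scan` are hypothetical.

```python
import tempfile
from pathlib import Path


def find_hsperfdata(tmp_root="/tmp"):
    """Map each hsperfdata_<user> directory under tmp_root to the PID files it holds."""
    found = {}
    root = Path(tmp_root)
    if not root.is_dir():
        return found
    for entry in root.glob("hsperfdata_*"):
        if not entry.is_dir():
            continue
        user = entry.name[len("hsperfdata_"):]
        try:
            # Each file is named after the PID of the JVM that created it
            # (stale files can remain if a JVM was killed before cleaning up).
            found[user] = sorted((p.name for p in entry.iterdir() if p.name.isdigit()), key=int)
        except PermissionError:
            found[user] = None  # directory exists but cannot be listed
    return found


def demo_scan():
    """Build a fake /tmp layout in a temporary directory and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp, "hsperfdata_alice")
        fake.mkdir()
        for name in ("123", "456", "notes.txt"):
            (fake / name).touch()
        return find_hsperfdata(tmp)
```

Calling `find_hsperfdata()` with no argument inspects the real /tmp; running it inside and outside a container is one way to check whether the two share the same directory.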
Additional context
I'm currently trialling nf-core/sarek with the -XX:-UsePerfData Java option on ~100 human WGS samples and will report back on stability.
Disabling hsperfdata patches these errors out for GATK, but other Java applications (such as the Picard commands run in nf-core/raredisease) also trigger this behaviour. In my experience, -XX:-UsePerfData has been stable across ~200 runs of Sarek.
OK, so Picard should be patched as well; I'll do that in a separate PR.
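Picard differs from GATK here: nf-core modules typically call the bioconda `picard` launcher script and pass the heap size as a bare flag (for example `picard -Xmx4g MarkDuplicates ...`) rather than through --java-options. Assuming that launcher also forwards `-XX:` flags placed before the subcommand to the JVM (worth checking against the wrapper shipped in the container), the patch amounts to inserting -XX:-UsePerfData next to the heap flag. The helper `launcher_command` below is hypothetical.

```python
PERFDATA_FLAG = "-XX:-UsePerfData"


def launcher_command(launcher, subcommand, args, mem="-Xmx4g"):
    """Build argv for a bioconda-style Java launcher script (hypothetical helper).

    Assumption: the launcher (e.g. `picard`) forwards -Xm* and -XX* flags that
    appear before the subcommand to the JVM. Check the wrapper in your container.
    """
    jvm_flags = [mem, PERFDATA_FLAG]
    return [launcher, *jvm_flags, subcommand, *args]


argv = launcher_command(
    "picard",
    "MarkDuplicates",
    ["--INPUT", "in.bam", "--OUTPUT", "md.bam", "--METRICS_FILE", "md.metrics.txt"],
)
```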
It may also be an issue for FastQC. It's happening to others, so the patches are incredibly useful (https://github.com/nf-core/sarek/issues/1030), but I'm wondering whether this is worth raising with the Nextflow developers, as it seems to be a common issue.
I changed the name of the issue and kept it open so that we can track other Java tools. All gatk4 modules have been patched (cf. #3844), and there is a PR in Sarek: https://github.com/nf-core/sarek/pull/1240
Great, thanks!
I'm trying out setting the _JAVA_OPTS environment variable for fastqc, which seems promising so far.
(Sent by email in reply to Maxime U Garcia's comment of Mon, 18 Sept 2023, 5:10 pm.)
I attempted to set JAVA_TOOLS_OPTIONS and JAVA_OPTS in fgbio processes, but this did not resolve the issue. Fortunately, fgbio accepts -XX:-UsePerfData passed directly on its command line.
For completeness, you may need to set '_JAVA_OPTIONS' as well as 'JAVA_TOOL_OPTIONS' and 'JAVA_OPTS'; https://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts has more details on this.
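The environment-variable route can be sketched as follows, using Python's `subprocess` for illustration. The HotSpot JVM itself reads JAVA_TOOL_OPTIONS (and the HotSpot-specific _JAVA_OPTIONS); JAVA_OPTS is only a convention used by some launcher scripts and is not read by the JVM directly. The helper names `env_without_perfdata` and `run_without_perfdata` are hypothetical.

```python
import os
import subprocess

PERFDATA_FLAG = "-XX:-UsePerfData"


def env_without_perfdata(base=None):
    """Return a copy of the environment with -XX:-UsePerfData added for the JVM."""
    env = dict(os.environ if base is None else base)
    for var in ("JAVA_TOOL_OPTIONS", "_JAVA_OPTIONS"):
        # Naive whitespace split: fine for simple flags, not for quoted values.
        current = env.get(var, "").split()
        if PERFDATA_FLAG not in current:
            current.append(PERFDATA_FLAG)
        env[var] = " ".join(current)
    return env


def run_without_perfdata(cmd, **kwargs):
    """Run a Java-based command with hsperfdata disabled via the environment."""
    return subprocess.run(cmd, env=env_without_perfdata(), **kwargs)
```

For example, `run_without_perfdata(["java", "-version"], capture_output=True, text=True).stderr` should contain a line beginning `Picked up JAVA_TOOL_OPTIONS:`, which is how the JVM reports that it read the variable; if that line is missing for a given tool, the tool's launcher is probably not starting a JVM that inherits the environment.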
@lfearnley Is this still an open issue, or do we need to add this to the documentation somewhere (looking at @mashehu for that if that's the case)?