SIGSEGV when running gatk/4.6.0.0

Open emwjacobson opened this issue 1 year ago • 6 comments

* Opening on behalf of a user on an HPC cluster; my knowledge in this field is a bit limited.

Affected tool(s) or class(es)

gatk HaplotypeCaller

Affected version(s)

Latest 4.6.0.0 release

Description

When running the command given under "Steps to reproduce", the program crashes about 16 hours into the run. Below is the start of the Java error report file:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f06ed243291, pid=1058615, tid=1058616
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0xcf291]  __memset_avx2_erms+0x11
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /bigdata/ramadugulab/luy/SNPcallingBreeding/core.1058615)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam -O AlignedCalToCcl_Scaffolds.vcf.gz -ERC GVCF

Host: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 64 cores, 20G, Rocky Linux release 8.8 (Green Obsidian)
Time: Sat Sep 28 04:11:19 2024 PDT elapsed time: 58592.788414 seconds (0d 16h 16m 32s)

---------------  T H R E A D  ---------------

Current thread (0x00007f06e4025b70):  JavaThread "main" [_thread_in_native, id=1058616, stack(0x00007f06edc7a000,0x00007f06edd7b000)]

Stack: [0x00007f06edc7a000,0x00007f06edd7b000],  sp=0x00007f06edbe6458,  free space=18014398509481393k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xcf291]  __memset_avx2_erms+0x11
C  [libgkl_pairhmm_omp5311772482084658743.so+0x1500f]  Java_com_intel_gkl_pairhmm_IntelPairHmm_computeLikelihoodsNative._omp_fn.0+0xcf

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 8942  com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoodsNative([Ljava/lang/Object;[Ljava/lang/Object;[D)V (0 bytes) @ 0x00007f06d563401c [0x00007f06d5633fa0+0x000000000000007c]
J 10003 c2 com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoods([Lorg/broadinstitute/gatk/nativebindings/pairhmm/ReadDataHolder;[Lorg/broadinstitute/gatk/nativebindings/pairhmm/HaplotypeDataHolder;[D)V (119 bytes) @ 0x00007f06d5bff3e0 [0x00007f06d5bff3a0+0x0000000000000040]
J 6781 c2 org.broadinstitute.hellbender.utils.pairhmm.VectorLoglessPairHMM.computeLog10Likelihoods(Lorg/broadinstitute/hellbender/utils/genotyper/LikelihoodMatrix;Ljava/util/List;Lorg/broadinstitute/hellbender/utils/pairhmm/PairHMMInputScoreImputator;)V (450 bytes) @ 0x00007f06d54f8cc8 [0x00007f06d54f8a00+0x00000000000002c8]
J 10022 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(Lorg/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyResultSet;Lorg/broadinstitute/hellbender/utils/genotyper/SampleList;Ljava/util/Map;Z)Lorg/broadinstitute/hellbender/utils/genotyper/AlleleLikelihoods; (25 bytes) @ 0x00007f06d5c0cb30 [0x00007f06d5c0b540+0x00000000000015f0]
J 9971 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(Lorg/broadinstitute/hellbender/engine/AssemblyRegion;Lorg/broadinstitute/hellbender/engine/FeatureContext;Lorg/broadinstitute/hellbender/engine/ReferenceContext;)Ljava/util/List; (2286 bytes) @ 0x00007f06d5bdef08 [0x00007f06d5bdcd60+0x00000000000021a8]
J 10571% c2 org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(Lorg/broadinstitute/hellbender/engine/MultiIntervalLocalReadShard;Lorg/broadinstitute/hellbender/engine/ReferenceDataSource;Lorg/broadinstitute/hellbender/engine/FeatureManager;)V (154 bytes) @ 0x00007f06d5c8e5c0 [0x00007f06d5c8dd20+0x00000000000008a0]
j  org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse()V+83
j  org.broadinstitute.hellbender.engine.GATKTool.doWork()Ljava/lang/Object;+19
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool()Ljava/lang/Object;+34
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs()Ljava/lang/Object;+225
j  org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain([Ljava/lang/String;)Ljava/lang/Object;+14
j  org.broadinstitute.hellbender.Main.runCommandLineProgram(Lorg/broadinstitute/hellbender/cmdline/CommandLineProgram;[Ljava/lang/String;)Ljava/lang/Object;+20
j  org.broadinstitute.hellbender.Main.mainEntry([Ljava/lang/String;)V+22
j  org.broadinstitute.hellbender.Main.main([Ljava/lang/String;)V+8
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f06edc39d00

Register to memory mapping:

RAX=0x0 is NULL
RBX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RCX=0x0000000000028318 is an unknown value
RDX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RSP=0x00007f06edbe6458 points into unknown readable memory: 0x00007f0673c89bc4 | c4 9b c8 73 06 7f 00 00
RBP=0x00007f06edd78f50 is pointing into the stack for thread: 0x00007f06e4025b70
RSI=0x0 is NULL
RDI=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
R8 =0x0000000000004f9a is an unknown value
R9 =0x0000000000000001 is an unknown value
R10=0x00000000000000c3 is an unknown value
R11=0x00007f06e47c9840 points into unknown readable memory: 0x4141474141414143 | 43 41 41 41 41 47 41 41
R12=0x00007f06edc119e0 points into unknown readable memory: 0x0000000000000000 | 00 00 00 00 00 00 00 00
R13=0x00007f06edbe96c0 points into unknown readable memory: 0x00007f06e4f65c50 | 50 5c f6 e4 06 7f 00 00
R14=0x0000000000028318 is an unknown value
R15=0x0000000000005063 is an unknown value


Registers:
RAX=0x0000000000000000, RBX=0x00007f06edc39d00, RCX=0x0000000000028318, RDX=0x00007f06edc39d00
RSP=0x00007f06edbe6458, RBP=0x00007f06edd78f50, RSI=0x0000000000000000, RDI=0x00007f06edc39d00
R8 =0x0000000000004f9a, R9 =0x0000000000000001, R10=0x00000000000000c3, R11=0x00007f06e47c9840
R12=0x00007f06edc119e0, R13=0x00007f06edbe96c0, R14=0x0000000000028318, R15=0x0000000000005063
RIP=0x00007f06ed243291, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000007
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f06edbe6458)
0x00007f06edbe6458:   00007f0673c89bc4 7b8f04462509c62f
0x00007f06edbe6468:   8010180048120140 0000c12912a02890
0x00007f06edbe6478:   0460229080441000 ffffffffffffffff
0x00007f06edbe6488:   4a03ed807b023001 3040120080800100
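
A quick sanity check on the thread section of the report, using only the addresses it prints. The sketch below is my own reading, not a confirmed diagnosis: the function name `stack_summary` is made up for illustration, and the formula for the "free space" figure (an unsigned 64-bit subtraction divided by 1024) is an assumption that happens to reproduce the printed number. If the reading is right, the stack pointer and the faulting address both lie below the low bound of the "main" thread stack, which would be consistent with native code running off the end of the stack.

```python
# All addresses are copied verbatim from the hs_err report above.
STACK_LOW = 0x00007F06EDC7A000         # low bound of the "main" thread stack
STACK_HIGH = 0x00007F06EDD7B000        # high bound of the "main" thread stack
SP = 0x00007F06EDBE6458                # stack pointer at the time of the crash
SI_ADDR = 0x00007F06EDC39D00           # faulting address (si_addr)
REPORTED_FREE_KB = 18014398509481393   # "free space" value printed by the JVM


def stack_summary():
    """Compare sp and si_addr against the reported stack bounds (hypothetical helper)."""
    # ASSUMPTION: the JVM derives "free space" from (sp - low) as an unsigned
    # 64-bit value divided by 1024, so a negative difference wraps around.
    wrapped_free_kb = ((SP - STACK_LOW) % 2**64) // 1024
    return {
        "stack_size_bytes": STACK_HIGH - STACK_LOW,
        "sp_below_low_bytes": STACK_LOW - SP,
        "fault_below_low_bytes": STACK_LOW - SI_ADDR,
        "wrapped_free_kb": wrapped_free_kb,
        "matches_report": wrapped_free_kb == REPORTED_FREE_KB,
    }


if __name__ == "__main__":
    for name, value in stack_summary().items():
        print(f"{name}: {value}")
```

Positive values for `sp_below_low_bytes` and `fault_below_low_bytes` mean the address is outside the stack range on the low side. This only restates what the report already contains; whether it points at the actual cause is for someone who knows the native PairHMM code to judge.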

Steps to reproduce

The command that was run:

gatk  HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam \
    -O AlignedCalToCcl_Scaffolds.vcf.gz \
    -ERC GVCF

The job was submitted to an HPC cluster using Slurm. It was tested on multiple machines: an Intel node with a Xeon E5-2683 v4 CPU and an AMD node with an EPYC 7713 CPU.

It has also been run multiple times, and every run crashed at the same __memset_avx2_erms+0x11 instruction.

Other package versions that might be relevant: java/17.0.2, glibc-common-2.28-225

If any more information is needed from me or the user, please let me know :)

emwjacobson, Sep 30 '24 17:09