mutect2 4.2.6.1 fails in gvcf mode

Open jkobject opened this issue 3 years ago • 6 comments

Hello GATK team!

Bug Report

Affected tool(s) or class(es)

mutect2

Affected version(s)

  • Latest public release version: 4.2.6.1

Description

Every scatter task fails when Mutect2 is run with the argument "--emit-ref-confidence GVCF"; none fail without it.

error:

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetSampleName -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://cclebams/hg38_wes/CDS-ce3y1s.hg38.bam -O tumor_name.txt -encode --gcs-project-for-requester-pays broad-firecloud-ccle
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.b3fd1830
14:13:40.205 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:13:40.275 INFO Mutect2 - ------------------------------------------------------------
14:13:40.276 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.6.1
14:13:40.277 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
14:13:40.277 INFO Mutect2 - Executing as root@0b46ce3a6ba5 on Linux v5.10.107+ amd64
14:13:40.277 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
14:13:40.278 INFO Mutect2 - Start Date/Time: May 13, 2022 2:13:40 PM GMT
14:13:40.278 INFO Mutect2 - ------------------------------------------------------------
14:13:40.278 INFO Mutect2 - ------------------------------------------------------------
14:13:40.279 INFO Mutect2 - HTSJDK Version: 2.24.1
14:13:40.280 INFO Mutect2 - Picard Version: 2.27.1
14:13:40.284 INFO Mutect2 - Built for Spark Version: 2.4.5
14:13:40.284 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:13:40.284 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:13:40.285 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:13:40.285 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:13:40.285 INFO Mutect2 - Deflater: IntelDeflater
14:13:40.285 INFO Mutect2 - Inflater: IntelInflater
14:13:40.286 INFO Mutect2 - GCS max retries/reopens: 20
14:13:40.286 INFO Mutect2 - Requester pays: enabled. Billed to: broad-firecloud-ccle
14:13:40.286 INFO Mutect2 - Initializing engine
14:13:46.660 INFO FeatureManager - Using codec VCFCodec to read file gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz
14:13:48.823 INFO FeatureManager - Using codec VCFCodec to read file gs://gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz
14:13:54.570 INFO FeatureManager - Using codec IntervalListCodec to read file gs://fc-secure-76d1542e-1c49-4411-8268-e41e92f9f311/729d209c-0ef4-409f-b3af-2e84ff45ee36/omics_mutect2/16911ef5-efb2-4e12-86f2-f3d5a54b28c0/call-mutect2/Mutect2/4e4a27e2-6c57-40e9-8ddc-1024bdcc50c1/call-SplitIntervals/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list
14:13:55.076 INFO IntervalArgumentCollection - Processing 308828640 bp from intervals
14:13:55.233 INFO Mutect2 - Done initializing engine
14:13:56.023 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:13:56.039 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:13:56.116 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:13:56.122 INFO IntelPairHmm - Available threads: 1
14:13:56.123 INFO IntelPairHmm - Requested threads: 4
14:13:56.123 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
14:13:56.127 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:13:56.302 WARN Mutect2 - Note that the Mutect2 reference confidence mode is in BETA -- the likelihoods model and output format are subject to change in subsequent versions.
14:13:56.492 INFO ProgressMeter - Starting traversal
14:13:56.493 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
14:14:08.796 INFO ProgressMeter - chr1:16085 0.2 60 292.6
14:14:09.377 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.008674977
14:14:09.378 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.28976746200000003
14:14:09.378 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 1.41 sec
14:14:09.384 INFO Mutect2 - Shutting down engine
[May 13, 2022 2:14:09 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.49 minutes.
Runtime.totalMemory()=850644992
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:422)
at java.util.ArrayList.get(ArrayList.java:435)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.lambda$getGermlineAltAlleleFrequencies$27(SomaticGenotypingEngine.java:376)
at java.util.stream.ReferencePipeline$6$1.accept(ReferencePipeline.java:244)
at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.DoublePipeline.toArray(DoublePipeline.java:530)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getGermlineAltAlleleFrequencies(SomaticGenotypingEngine.java:377)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getNegativeLogPopulationAFAnnotation(SomaticGenotypingEngine.java:354)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:161)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:283)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:300)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar Mutect2 -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://cclebams/hg38_wes/CDS-ce3y1s.hg38.bam -tumor HAP1_1 --germline-resource gs://gatk-best-practices/somatic-hg38/af-only-gnomad.hg38.vcf.gz -pon gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz -L gs://fc-secure-76d1542e-1c49-4411-8268-e41e92f9f311/729d209c-0ef4-409f-b3af-2e84ff45ee36/omics_mutect2/16911ef5-efb2-4e12-86f2-f3d5a54b28c0/call-mutect2/Mutect2/4e4a27e2-6c57-40e9-8ddc-1024bdcc50c1/call-SplitIntervals/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list -O output.vcf.gz --f1r2-tar-gz f1r2.tar.gz --genotype-germline-sites true --genotype-pon-sites true --emit-ref-confidence GVCF --gcs-project-for-requester-pays broad-firecloud-ccle

Steps to reproduce

Run the same pipeline as described in a previous issue: #7492

The only change is that I added "--genotype-germline-sites true --genotype-pon-sites true --emit-ref-confidence GVCF" as additional arguments. The rest of the arguments are the defaults from the mutect2.wdl pipeline.

jkobject avatar May 13 '22 14:05 jkobject

This happens while running the following command:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx15500m \
-jar /root/gatk.jar Mutect2 -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta \
-I gs://cclebams/hg38_wes/CDS-00rz9N.hg38.bam -tumor BC1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE --germline-resource gs://gcp-public-data--gnomad/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.vcf.bgz \
-pon gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz \
-L gs://fc-secure-d2a2d895-a7af-4117-bdc7-652d7d268324/7a157f4a-7d93-4a3e-aaf4-c41833463f5a/Mutect2/3be8ce8e-1075-4063-bc43-6f61e386c3f5/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list \
-O output.vcf.gz --f1r2-tar-gz f1r2.tar.gz --gcs-project-for-requester-pays broad-firecloud-ccle --genotype-germline-sites true --genotype-pon-sites true --emit-ref-confidence GVCF

jkobject avatar May 17 '22 14:05 jkobject

@davidbenjamin / @ldgauthier, thoughts on this one?

droazen avatar May 23 '22 19:05 droazen

Hello, any updates on this?

This seems like an important bug to fix, because most people who want to look at germline variants would likely also want GVCF output.

jkobject avatar Jun 06 '22 14:06 jkobject

@jkobject Do you see the error without --genotype-germline-sites and --genotype-pon-sites?

davidbenjamin avatar Jun 13 '22 20:06 davidbenjamin

I am getting the same error without those arguments...

So it might be caused by something else?

jkobject avatar Jun 14 '22 19:06 jkobject

Hello @davidbenjamin,

Any news on this?

Can you reproduce on your end?

jkobject avatar Jun 19 '22 21:06 jkobject

Hi @davidbenjamin and @ldgauthier,

I'm running into this same problem with version 4.4.0.0 in tumor-only mode (actually creating a panel of normals). As far as I understand, "--emit-ref-confidence GVCF" is required for compatibility with GenomicsDBImport? Is there some workaround?

Using GATK jar /scicore/soft/apps/GATK/4.4.0.0-GCCcore-10.3.0-Java-17/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.4.0.0-GCCcore-10.3.0-Java-17/gatk-package-4.4.0.0-local.jar Mutect2 -R /ref_nobackup/Homo_sapiens_assembly38.fasta -I /BQSR/BSSE_QGF_229563_bqsr.bam --emit-ref-confidence GVCF --germline-resource /ref_nobackup/af-only-gnomad.hg38.vcf.gz -max-mnp-distance 0 -O /output/BSSE_QGF_229563_pon.g.vcf.gz --tmp-dir /scratch
15:07:52.399 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.4.0.0-GCCcore-10.3.0-Java-17/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:07:52.435 INFO Mutect2 - ------------------------------------------------------------
15:07:52.437 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.4.0.0
15:07:52.438 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:07:52.438 INFO Mutect2 - Executing as X on Linux v3.10.0-1160.66.1.el7.x86_64 amd64
15:07:52.438 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10
15:07:52.438 INFO Mutect2 - Start Date/Time: June 19, 2023 at 3:07:52 PM CEST
15:07:52.438 INFO Mutect2 - ------------------------------------------------------------
15:07:52.438 INFO Mutect2 - ------------------------------------------------------------
15:07:52.439 INFO Mutect2 - HTSJDK Version: 3.0.5
15:07:52.440 INFO Mutect2 - Picard Version: 3.0.0
15:07:52.440 INFO Mutect2 - Built for Spark Version: 3.3.1
15:07:52.440 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:07:52.441 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:07:52.441 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:07:52.442 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:07:52.442 INFO Mutect2 - Deflater: IntelDeflater
15:07:52.442 INFO Mutect2 - Inflater: IntelInflater
15:07:52.442 INFO Mutect2 - GCS max retries/reopens: 20
15:07:52.443 INFO Mutect2 - Requester pays: disabled
15:07:52.443 INFO Mutect2 - Initializing engine
15:07:52.848 INFO FeatureManager - Using codec VCFCodec to read file file://ref_nobackup/af-only-gnomad.hg38.vcf.gz
15:07:53.126 INFO Mutect2 - Done initializing engine
15:07:53.196 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/scicore/soft/apps/GATK/4.4.0.0-GCCcore-10.3.0-Java-17/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:07:53.201 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/scicore/soft/apps/GATK/4.4.0.0-GCCcore-10.3.0-Java-17/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
15:07:53.223 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:07:53.223 INFO IntelPairHmm - Available threads: 2
15:07:53.224 INFO IntelPairHmm - Requested threads: 4
15:07:53.224 WARN IntelPairHmm - Using 2 available threads, but 4 were requested
15:07:53.224 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
15:07:53.231 WARN Mutect2 - Note that the Mutect2 reference confidence mode is in BETA -- the likelihoods model and output format are subject to change in subsequent versions.
15:07:53.314 INFO ProgressMeter - Starting traversal
15:07:53.314 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:07:54.410 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 1.8392900000000002E-4
15:07:54.412 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.03020143
15:07:54.412 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.05 sec
15:07:54.413 INFO Mutect2 - Shutting down engine
[June 19, 2023 at 3:07:54 PM CEST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=285212672
java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 1
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
at java.base/java.util.Objects.checkIndex(Objects.java:359)
at java.base/java.util.ArrayList.get(ArrayList.java:427)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.lambda$getGermlineAltAlleleFrequencies$27(SomaticGenotypingEngine.java:389)
at java.base/java.util.stream.ReferencePipeline$6$1.accept(ReferencePipeline.java:248)
at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:200)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.base/java.util.stream.DoublePipeline.toArray(DoublePipeline.java:571)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getGermlineAltAlleleFrequencies(SomaticGenotypingEngine.java:390)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.getNegativeLogPopulationAFAnnotation(SomaticGenotypingEngine.java:363)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:166)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:336)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:304)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

Trulsson avatar Jun 19 '23 13:06 Trulsson

@jkobject Just to be clear, we hardly ever recommend making a custom panel of normals. If you do wish to make one, we recommend running the mutect2_pon.wdl script in the GATK GitHub repository. In that script Mutect2 is run in regular VCF mode, not GVCF mode, which most Mutect2 users should never need. Finally, GenomicsDBImport can't handle MNPs, so you will need to set --max-mnp-distance 0. But if possible it's far easier to run our WDL.

davidbenjamin avatar Jun 29 '23 15:06 davidbenjamin
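
For anyone landing here for the panel-of-normals case, the advice in the last reply (regular VCF mode plus --max-mnp-distance 0, then GenomicsDBImport) can be sketched as the shell script below. This is only a rough outline of the steps, not the mutect2_pon.wdl script itself, and running the WDL remains the recommended route. The file names, the pon_db workspace name, and the DRY_RUN switch are hypothetical placeholders chosen for illustration. By default the script only prints the commands, so it can be inspected without GATK installed.

```shell
#!/bin/sh
# Sketch only: panel-of-normals steps with Mutect2 in regular VCF mode.
# All paths and names below are hypothetical placeholders.
set -eu

REF="reference.fasta"               # hypothetical reference
INTERVALS="targets.interval_list"   # hypothetical interval list
NORMALS="normal1.bam normal2.bam"   # hypothetical normal BAMs (no spaces in names)
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only, 0 = also execute

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

pon_commands() {
  vcf_args=""
  for bam in $NORMALS; do
    vcf="${bam%.bam}.vcf.gz"
    # Regular VCF mode: no --emit-ref-confidence GVCF.
    # MNPs are disabled because GenomicsDBImport cannot handle them.
    run gatk Mutect2 -R "$REF" -I "$bam" --max-mnp-distance 0 -O "$vcf"
    vcf_args="$vcf_args -V $vcf"
  done
  # Word splitting of $vcf_args is intended here.
  run gatk GenomicsDBImport -R "$REF" -L "$INTERVALS" \
    --genomicsdb-workspace-path pon_db $vcf_args
  run gatk CreateSomaticPanelOfNormals -R "$REF" -V gendb://pon_db -O pon.vcf.gz
}

pon_commands
```

Setting DRY_RUN=0 in the environment makes the same script execute the printed commands, assuming gatk is on the PATH and the placeholder files exist.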