IllegalStateException in GenotypeGVCFs after GenomicsDBImport - GATK 4.2.6.1
Similar issues appear to be occurring in #7639 and #7933. This is a follow-up report from the GATK Forum.
GATK Forum Post: (https://gatk.broadinstitute.org/hc/en-us/community/posts/6972994559643-java-lang-IllegalStateException-in-GenotypeGVCFs-after-GenomicsDBImport-GATK-4-2-6-1)
Bug Report
Tools/Methods
GenomicsDBImport --> GenotypeGVCFs
Affected version(s)
- GenomicsDBImport: GATK 4.2.4.0
- GenotypeGVCFs: GATK 4.2.6.1
Description
GenotypeGVCFs throws an IllegalStateException after GenomicsDBImport. The exception message states "Genotype has no likelihoods". The user divided the reference into 50 intervals.
Stacktrace:
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),74.14547183399837,Cpu time(s),67.38693261000097
[July 1, 2022 1:36:56 AM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 104.22 minutes.
Runtime.totalMemory()=13973323776
java.lang.IllegalStateException: Genotype has no likelihoods: [COLI1040 TGAGC*/T GQ 39 DP 2 AD 1,1 {SB=[1, 0, 1, 0]}]
at org.broadinstitute.hellbender.utils.GenotypeUtils.computeDiploidGenotypeCounts(GenotypeUtils.java:89)
at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.calculateEH(ExcessHet.java:96)
at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.annotate(ExcessHet.java:84)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:355)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:306)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.regenotypeVC(GenotypeGVCFsEngine.java:185)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:135)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$null$1(VariantLocusWalker.java:161)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$2(VariantLocusWalker.java:151)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:148)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
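The trace shows the check failing in GenotypeUtils.computeDiploidGenotypeCounts while ExcessHet is computed: the genotype for COLI1040 carries GT, GQ, DP, AD, and SB but no PL. If genotypes without likelihoods are carried through to a VCF (for example, the output of a run that uses the -AX workaround described later in this thread), a small diagnostic like the one below can list the affected sites. It is a minimal sketch, not a GATK tool; `samples_missing_pl` and `open_vcf` are hypothetical names, and it skips no-call (./.) genotypes because those commonly lack PL anyway.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

NO_CALLS = {".", "./.", ".|."}


def samples_missing_pl(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, List[str]]]:
    """Yield (CHROM, POS, samples) for records where called samples have no PL value."""
    sample_names: List[str] = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = cols[9:]  # sample columns follow FORMAT
            continue
        if len(cols) < 10:
            continue  # sites-only record: no genotypes to check
        keys = cols[8].split(":")
        pl_idx = keys.index("PL") if "PL" in keys else None
        missing = []
        for name, sample in zip(sample_names, cols[9:]):
            values = sample.split(":")
            # GT, when present, is always the first FORMAT key.
            if keys[0] == "GT" and values[0] in NO_CALLS:
                continue
            # PL is absent if the key is missing, the field was truncated, or it is ".".
            if pl_idx is None or pl_idx >= len(values) or values[pl_idx] in ("", "."):
                missing.append(name)
        if missing:
            yield cols[0], int(cols[1]), missing


def open_vcf(path: str):
    """Open a plain or bgzipped VCF as text (BGZF is readable as multi-member gzip)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage: `for chrom, pos, names in samples_missing_pl(open_vcf("interval_9.vcf.gz")): print(chrom, pos, names)`.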
Exact Commands Used:
GenomicsDBImport:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2G -Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenomicsDBImport --genomicsdb-workspace-path 007_Database_DBImport_VCFref/database_interval_9 --sample-name-map sample_name_map --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --reader-threads 5 --batch-size 60 --tmp-dir TMPDIR --max-num-intervals-to-import-in-parallel 3 --merge-input-intervals
GenotypeGVCFs:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4G -Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz -V gendb://007_Database_DBImport_VCFref/database_interval_9 -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz --tmp-dir TMPDIR --allow-old-rms-mapping-quality-annotation-data --only-output-calls-starting-in-intervals --verbosity ERROR
User Description of the Issue:
"I'm running GenotypeGVCFs on a GenomicsDBImport database. I've divided the reference into 50 intervals. Some intervals seem OK, but some report the following error.
I used a VCF file in "--force-output-intervals" for downstream analysis. I've never seen this error without "--force-output-intervals". I searched for the error message and changed my GATK version to 4.2.6.1, since a similar error was fixed as a bug in a recent update, but it still does not work on my dataset..."
@droazen and @samuelklee, any insight on this?
Thank you,
Anthony
Just reiterating here what @lbergelson noted in office hours: it looks like the offending check was added in https://github.com/broadinstitute/gatk/pull/7738, which ultimately affects both the ExcessHet and InbreedingCoeff annotations. @droazen reviewed that PR and might have more insight into the desired behavior for these annotations when PLs are missing because GenomicsDB dropped them upstream. Should we just not emit these annotations?
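To make the "just don't emit them" option concrete, here is a minimal sketch of that policy for InbreedingCoeff. It is not GATK's implementation: it assumes a biallelic site, weights genotype counts by each sample's normalized PL-derived probabilities, and applies the textbook F = 1 - observed/expected heterozygosity. The names `inbreeding_coeff_or_none` and `info_annotations` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def inbreeding_coeff_or_none(sample_pls: Sequence[Optional[List[int]]]) -> Optional[float]:
    """Textbook inbreeding coefficient for a biallelic site, or None if any sample lacks PLs."""
    if any(pl is None for pl in sample_pls):
        return None  # the "do not emit the annotation" option
    hom_ref = het = hom_var = 0.0
    for pl in sample_pls:
        # Convert phred-scaled likelihoods (AA, AB, BB) to normalized probabilities.
        likes = [10.0 ** (-p / 10.0) for p in pl]
        total = sum(likes)
        hom_ref += likes[0] / total
        het += likes[1] / total
        hom_var += likes[2] / total
    n = hom_ref + het + hom_var
    if n == 0.0:
        return None  # no samples at all
    p = (2.0 * hom_ref + het) / (2.0 * n)  # reference allele frequency
    expected_het = 2.0 * p * (1.0 - p) * n
    if expected_het == 0.0:
        return None  # monomorphic site: F is undefined
    return 1.0 - het / expected_het


def info_annotations(sample_pls: Sequence[Optional[List[int]]]) -> Dict[str, float]:
    """Include InbreedingCoeff only when it could be computed."""
    f = inbreeding_coeff_or_none(sample_pls)
    return {} if f is None else {"InbreedingCoeff": f}
```

The alternative policy (skipping only the samples without PLs) would keep the annotation but compute it on a silently smaller cohort, which is arguably harder for users to notice than a missing annotation.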
@AJDCiarla The user should try re-running GenotypeGVCFs with --max-genotype-count set to a value greater than 1024. This should prevent the PLs from getting dropped and avoid the downstream error. The user may also need to increase --max-alternate-alleles.
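For context on the 1024 threshold: at a site with A alleles, a diploid sample has A(A+1)/2 possible unordered genotypes, and the PL array has one entry per genotype (in general C(A+P-1, P) for ploidy P). The sketch below finds the smallest allele count whose genotype count exceeds a limit. It assumes the limit is compared against this count; whether the <NON_REF> allele counts toward the total is not verified here, and the function names are hypothetical.

```python
from math import comb


def genotype_count(num_alleles: int, ploidy: int = 2) -> int:
    """Number of unordered genotypes (= PL array length) for the given alleles and ploidy."""
    return comb(num_alleles + ploidy - 1, ploidy)


def smallest_allele_count_over(limit: int, ploidy: int = 2) -> int:
    """Smallest number of alleles (REF included) whose genotype count exceeds `limit`."""
    n = 1
    while genotype_count(n, ploidy) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_allele_count_over(1024)
    print(n, genotype_count(n))  # prints "45 1035"
```

So under these assumptions, a diploid site with 45 or more alleles (REF plus 44 ALT) would exceed 1024 genotypes, which is why raising --max-genotype-count may go hand in hand with raising --max-alternate-alleles.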
@AJDCiarla It would also be useful to know whether the error occurs when the user runs GenotypeGVCFs without the --force-output-intervals argument.
@droazen, as Karina posted in #7933, with our inputs this issue only occurs when using --force-output-intervals. I tried increasing --max-alternate-alleles to 2048 with no change.
I just got back from vacation, but this week I will try to debug this more closely to see what is causing the issue.
Have you had any further discussions beyond what @samuelklee suggested above?
@droazen, I'm running a job using a JAR based on #7962 and it progressed beyond the previous failures.
We can close this; I have created a new ticket, #7966, for the --force-output-intervals bug. @droazen
Hello, did you solve this problem? I am encountering it too.
The error comes from two annotations: InbreedingCoeff and ExcessHet. One workaround is to add "-AX ExcessHet -AX InbreedingCoeff". It doesn't solve the underlying problem, but it avoids the code path that fails.
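Since the report runs GenotypeGVCFs once per interval, here is a sketch of a small driver that rebuilds the reporter's command with the two exclusions appended. The paths are copied from the command in the report; the assumptions are that every interval follows the same interval_{i} naming and that intervals are numbered 1 to 50. The -D samjdk JVM options are omitted for brevity, and `genotype_gvcfs_argv` is a hypothetical helper.

```python
import shlex
from typing import List

# Paths copied from the GenotypeGVCFs command in the report; adjust to your layout.
JAR = "MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar"
REFERENCE = "PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta"
FORCE_OUTPUT = "PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz"


def genotype_gvcfs_argv(i: int) -> List[str]:
    """Hypothetical helper: GenotypeGVCFs argv for interval i with the -AX workaround."""
    return [
        "java", "-Xms4G", "-Xmx16G", "-jar", JAR, "GenotypeGVCFs",
        "-R", REFERENCE,
        "--intervals", f"006_IntervalsSplit_DBImport_VCFref/interval_{i}.list",
        "--force-output-intervals", FORCE_OUTPUT,
        "-V", f"gendb://007_Database_DBImport_VCFref/database_interval_{i}",
        "-O", f"008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_{i}.vcf.gz",
        "--tmp-dir", "TMPDIR",
        "--allow-old-rms-mapping-quality-annotation-data",
        "--only-output-calls-starting-in-intervals",
        # Skip the two annotations that require per-genotype likelihoods (PLs).
        "-AX", "ExcessHet",
        "-AX", "InbreedingCoeff",
    ]


if __name__ == "__main__":
    # Assumes intervals are numbered 1..50; prints the commands rather than running them.
    for i in range(1, 51):
        print(shlex.join(genotype_gvcfs_argv(i)))
```

To execute a command instead of printing it, pass the list to `subprocess.run(argv, check=True)`. Keeping the arguments as a list avoids shell quoting issues with the paths.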
Awesome! It is useful. Thank you very much!