IllegalStateException in GenotypeGVCFs after GenomicsDBImport - GATK 4.2.6.1

Open AJDCiarla opened this issue 2 years ago • 9 comments

This looks similar to the issues reported in #7639 and #7933. It is a follow-up report from the GATK Forum.

GATK Forum Post: (https://gatk.broadinstitute.org/hc/en-us/community/posts/6972994559643-java-lang-IllegalStateException-in-GenotypeGVCFs-after-GenomicsDBImport-GATK-4-2-6-1)


Bug Report

Tools/Methods

GenomicsDBImport --> GenotypeGVCFs

Affected version(s)

- GenomicsDBImport: GATK 4.2.4.0
- GenotypeGVCFs: GATK 4.2.6.1

Description

An IllegalStateException is thrown in GenotypeGVCFs when it runs on a GenomicsDBImport workspace. The exception message reads "Genotype has no likelihoods". The user has divided the reference into 50 intervals.

Stacktrace:

GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),74.14547183399837,Cpu time(s),67.38693261000097
[July 1, 2022 1:36:56 AM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 104.22 minutes.
Runtime.totalMemory()=13973323776
java.lang.IllegalStateException: Genotype has no likelihoods: [COLI1040 TGAGC*/T GQ 39 DP 2 AD 1,1 {SB=[1, 0, 1, 0]}]
    at org.broadinstitute.hellbender.utils.GenotypeUtils.computeDiploidGenotypeCounts(GenotypeUtils.java:89)
    at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.calculateEH(ExcessHet.java:96)
    at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.annotate(ExcessHet.java:84)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:355)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:306)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.regenotypeVC(GenotypeGVCFsEngine.java:185)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:135)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$null$1(VariantLocusWalker.java:161)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$2(VariantLocusWalker.java:151)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:148)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

Exact Commands Used:

GenomicsDBImport: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2G -Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenomicsDBImport --genomicsdb-workspace-path 007_Database_DBImport_VCFref/database_interval_9 --sample-name-map sample_name_map --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --reader-threads 5 --batch-size 60 --tmp-dir TMPDIR --max-num-intervals-to-import-in-parallel 3 --merge-input-intervals

GenotypeGVCFs: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4G -Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz -V gendb://007_Database_DBImport_VCFref/database_interval_9 -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz --tmp-dir TMPDIR --allow-old-rms-mapping-quality-annotation-data --only-output-calls-starting-in-intervals --verbosity ERROR

User Description of the Issue:

"I'm using the GenotypeGVCFs function based on GenomicsDBImport database. I've divided the reference into 50 intervals. Some intervals seems ok, but some reports error as following.

I used a VCF file in "--force-output-intervals" for down stream analysis. I've never seen this error without "--force-output-intervals". I've searched for the error message and changed my GATK version to 4.2.6.1 since similar error has been solved as a bug in recent update, but it still not works on my dataset..."

@droazen and @samuelklee , any insight on this?

Thank you,

Anthony

AJDCiarla avatar Jul 12 '22 19:07 AJDCiarla

Just reiterating here what @lbergelson noted in office hours: looks like the offending check was added in https://github.com/broadinstitute/gatk/pull/7738, which ultimately affects both the ExcessHet and InbreedingCoeff annotations. @droazen reviewed that PR and might have more insight as to the desired behavior for these annotations when we are missing PLs due to GenomicsDB dropping them upstream---should we just not emit these annotations?

samuelklee avatar Jul 14 '22 14:07 samuelklee

@AJDCiarla The user should try re-running GenotypeGVCFs with --max-genotype-count set to a value greater than 1024. This should prevent the PLs from being dropped and avoid the downstream error. The user may also need to increase --max-alternate-alleles.
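
A minimal sketch of what that re-run might look like, reusing the paths from the GenotypeGVCFs command in the report. The values 2048 and 12 are illustrative assumptions, not values recommended in this thread, and the JVM tuning flags are trimmed for brevity. The script only prints the command unless RUN=1 is set.

```shell
# Sketch only: assemble the suggested GenotypeGVCFs re-run and print it.
# The two limits are illustrative assumptions, not values from this thread.
MAX_GENOTYPE_COUNT=2048   # any value above 1024, per the suggestion
MAX_ALT_ALLELES=12        # assumption; raise only if PLs are still dropped

# Paths and remaining options are copied from the reported command.
set -- java -Xms4G -Xmx16G \
  -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs \
  -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta \
  --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list \
  --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz \
  -V gendb://007_Database_DBImport_VCFref/database_interval_9 \
  -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz \
  --tmp-dir TMPDIR \
  --allow-old-rms-mapping-quality-annotation-data \
  --only-output-calls-starting-in-intervals \
  --max-genotype-count "$MAX_GENOTYPE_COUNT" \
  --max-alternate-alleles "$MAX_ALT_ALLELES"

# Print the command for review; set RUN=1 in the environment to execute it.
printf '%s ' "$@"; echo
if [ "${RUN:-0}" = "1" ]; then "$@"; fi
```

Keeping the two limits in variables makes it easy to raise them step by step across the 50 per-interval jobs without editing each command line.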

droazen avatar Jul 18 '22 16:07 droazen

@AJDCiarla It would also be useful to know whether the error occurs when the user runs GenotypeGVCFs without the --force-output-intervals argument.

droazen avatar Jul 18 '22 19:07 droazen

@droazen, like Karina posted in #7933, with our inputs this issue only occurs when using --force-output-intervals. I tried increasing --max-alternate-alleles to 2048 with no change.

I just got back from a vacation, but this week I will try to debug this more closely to see what is causing the issue.

Have you had any further discussions beyond what @samuelklee suggested above?

bbimber avatar Jul 24 '22 00:07 bbimber

@droazen, I'm running a job using a JAR based on #7962 and it progressed beyond the previous failures.

bbimber avatar Jul 26 '22 16:07 bbimber

We can close this. I have created a new ticket, #7966, for the --force-output-intervals bug. @droazen

AJDCiarla avatar Jul 28 '22 18:07 AJDCiarla

Hello, did you resolve this problem? I am also encountering it.

jigaoxiang avatar Sep 07 '22 04:09 jigaoxiang

The error comes from two annotations: InbreedingCoeff and ExcessHet. One workaround is to add "-AX ExcessHet -AX InbreedingCoeff". It doesn't exactly solve the problem, but it avoids hitting the problem code.
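
As a rough sketch, the workaround could be applied to the GenotypeGVCFs command from the original report like this. The paths are copied from that command, the JVM tuning flags are trimmed for brevity, and the EXCLUDE variable is a hypothetical name used only for illustration. The script prints the command and runs it only when RUN=1 is set.

```shell
# Sketch only: the reported GenotypeGVCFs command with the two
# failing annotations excluded via -AX.
EXCLUDE="ExcessHet InbreedingCoeff"   # hypothetical variable name

# Paths and remaining options are copied from the reported command.
set -- java -Xms4G -Xmx16G \
  -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs \
  -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta \
  --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list \
  --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz \
  -V gendb://007_Database_DBImport_VCFref/database_interval_9 \
  -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz \
  --tmp-dir TMPDIR \
  --allow-old-rms-mapping-quality-annotation-data \
  --only-output-calls-starting-in-intervals

# Append one -AX flag per excluded annotation.
for annotation in $EXCLUDE; do
  set -- "$@" -AX "$annotation"
done

# Print the command for review; set RUN=1 in the environment to execute it.
printf '%s ' "$@"; echo
if [ "${RUN:-0}" = "1" ]; then "$@"; fi
```

Note that the output VCF will then lack the ExcessHet and InbreedingCoeff annotations, so any downstream filtering that relies on them would need to be adjusted.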

bbimber avatar Sep 07 '22 04:09 bbimber

Awesome! It is useful. Thank you very much!

jigaoxiang avatar Sep 07 '22 04:09 jigaoxiang