java.lang.IllegalStateException in JointGermlineCNVSegmentation

Open holtgrewe opened this issue 2 years ago • 4 comments

Bug Report

Affected tool(s) or class(es)

JointGermlineCNVSegmentation

Affected version(s)

  • [x] Latest public release version [v4.3.0.0]
  • [ ] Latest master branch as of [date of test?]

Description

I get the following exception when running JointGermlineCNVSegmentation on an exome trio dataset:

[January 19, 2023 at 6:59:29 AM CET] org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation done. Elapsed time: 0.82 minutes.
Runtime.totalMemory()=300941312
java.lang.IllegalStateException: Encountered genotype with ploidy 0 but 1 alleles.
        at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:814)
        at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.correctGenotypePloidy(JointGermlineCNVSegmentation.java:701)
        at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.prepareGenotype(JointGermlineCNVSegmentation.java:682)
        at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.lambda$createDepthOnlyFromGCNVWithOriginalGenotypes$4(JointGermlineCNVSegmentation.java:666)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.createDepthOnlyFromGCNVWithOriginalGenotypes(JointGermlineCNVSegmentation.java:667)
        at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.apply(JointGermlineCNVSegmentation.java:280)
        at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:133)
        at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.afterTraverse(MultiVariantWalkerGroupedOnStart.java:193)
        at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:166)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Steps to reproduce

gatk JointGermlineCNVSegmentation --reference hs37d5.fa --variant index.vcf.gz --variant father.vcf.gz --variant mother.vcf.gz --model-call-intervals gcnv_preprocess_intervals.Agilent_SureSelect_Human_All_Exon_V6.interval_list --pedigree family.ped --output out.vcf.gz

The input VCF lines look as follows:

## index
Y     2654827 CNV_Y_2654827_24461230  N       .       3076.53 .       END=24461230    GT:CN:NP:QA:QS:QSE:QSS  .:0:220:94:3077:472:1358
## father
Y     2654827 CNV_Y_2654827_24461230  N       .       3076.53 .       END=24461230    GT:CN:NP:QA:QS:QSE:QSS  0:1:220:58:3077:105:376
## mother
Y     2654827 CNV_Y_2654827_24461230  N       <DEL>   3076.53 .       END=24461230    GT:CN:NP:QA:QS:QSE:QSS  1:0:220:29:3077:357:640
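To narrow down which records could conflict with a ploidy check, one can compare the number of allele slots in each GT entry against an expected ploidy. The following is a minimal sketch, not the tool's actual logic: the helper names are hypothetical, a no-call `.` is assumed to count as one allele slot, and the expected chrY ploidy is assumed, purely for illustration, to be 0 for the two female samples and 1 for the father. How JointGermlineCNVSegmentation derives the ploidy internally is not shown in this report.

```python
FORMAT = "GT:CN:NP:QA:QS:QSE:QSS"


def gt_allele_count(sample_field: str, format_field: str = FORMAT) -> int:
    """Return the number of allele slots in the GT entry of a VCF sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    gt = values[keys.index("GT")]
    # Alleles are separated by "/" (unphased) or "|" (phased).
    # Assumption: a lone "." no-call counts as one allele slot.
    return len(gt.replace("|", "/").split("/"))


def ploidy_mismatches(samples: dict, expected: dict) -> dict:
    """Map sample name -> (expected ploidy, allele count) where the two differ."""
    mismatches = {}
    for name, field in samples.items():
        count = gt_allele_count(field)
        if count != expected[name]:
            mismatches[name] = (expected[name], count)
    return mismatches


# Sample columns copied from the three VCF lines above.
samples = {
    "index": ".:0:220:94:3077:472:1358",
    "father": "0:1:220:58:3077:105:376",
    "mother": "1:0:220:29:3077:357:640",
}

# Hypothetical expected chrY ploidy (assumption: female -> 0, male -> 1).
expected_ploidy_y = {"index": 0, "father": 1, "mother": 0}

for name, (ploidy, count) in ploidy_mismatches(samples, expected_ploidy_y).items():
    print(f"{name}: ploidy {ploidy} but {count} alleles")
```

Under these assumptions both the index and the mother are flagged, since each carries one allele slot on chrY. This does not establish which record actually triggers the exception.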

The call looks like an artifact in the BAM alignments. However, the contig ploidy call for the mother is suspicious: chrY is called with ploidy 1, at a PLOIDY_GQ of only about 0.1.

## index (sex assigned at birth: female)
CONTIG  PLOIDY  PLOIDY_GQ
X       2       123.51003746478007
Y       0       9.176757618621913
## father (sex assigned at birth: male)
CONTIG  PLOIDY  PLOIDY_GQ
X       1       123.5100374633715
Y       1       17.498503426830368
## mother (sex assigned at birth: female)
CONTIG  PLOIDY  PLOIDY_GQ
X       2       123.51003745758246
Y       1       0.09888866060944837
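The ploidy calls above can be cross-checked against the reported sex with a few lines of Python. This is only a sketch: the `EXPECTED` table and the function name are assumptions introduced for illustration (ploidy X=2/Y=0 for female, X=1/Y=1 for male), and the values are copied from the tables in this report.

```python
# Contig ploidy calls copied from the report:
# sample -> (sex assigned at birth, {contig: (ploidy, ploidy_gq)})
calls = {
    "index": ("female", {"X": (2, 123.51003746478007), "Y": (0, 9.176757618621913)}),
    "father": ("male", {"X": (1, 123.5100374633715), "Y": (1, 17.498503426830368)}),
    "mother": ("female", {"X": (2, 123.51003745758246), "Y": (1, 0.09888866060944837)}),
}

# Assumption: expected sex-chromosome ploidy given the reported sex.
EXPECTED = {"female": {"X": 2, "Y": 0}, "male": {"X": 1, "Y": 1}}


def discordant_calls(calls: dict) -> list:
    """Return (sample, contig, expected, called, gq) for calls that disagree with sex."""
    discordant = []
    for sample, (sex, contigs) in calls.items():
        for contig, (ploidy, gq) in contigs.items():
            expected = EXPECTED[sex][contig]
            if ploidy != expected:
                discordant.append((sample, contig, expected, ploidy, gq))
    return discordant


for sample, contig, expected, ploidy, gq in discordant_calls(calls):
    print(f"{sample}: {contig} ploidy {ploidy} (expected {expected}), PLOIDY_GQ={gq:.2f}")
```

With the values from this report, only the mother's chrY call is reported as discordant, and its PLOIDY_GQ is close to zero.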

The mother's sample has a slightly higher fraction of chrY reads than other female samples, but far lower than that of male samples sequenced with the same kit.

The variance of the alternate allele balance at heterozygous sites is also increased in this sample. I assume that the sample has been contaminated with male DNA.

Expected behavior

I would like to be able to deactivate the hard error on the command line and replace it with a warning in the output logs.

Actual behavior

There is a hard crash that cannot be circumvented.

holtgrewe, Jan 19 '23 07:01