VCF row validation error on gCNV results
https://github.com/broadinstitute/gatk/blob/c6daf7dd02b866907fbfebad150baeb540c35bce/src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java#L701
I'm running into a recurring issue in JointGermlineCNVSegmentation, which runs after PostprocessGermlineCNVCalls in a gCNV pipeline. A number of batches are being merged in parallel: some succeed and some fail. It's not yet clear whether the failure is deterministic. I'll re-run a few times and see if I can answer that.
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at chrX:6383391 [VC SAMPLE_ID.segments.vcf.gz @ chrX:6383391-17732942 Q3076.53 of type=NO_VARIATION alleles=[N*] attr={END=17732942} GT=GT:CN:NP:QA:QS:QSE:QSS 0:1:581:1:3077:4:20 filters=
...
Caused by: java.lang.IllegalStateException: Encountered genotype with ploidy 1 but 2 alleles.
at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:814)
at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.correctGenotypePloidy(JointGermlineCNVSegmentation.java:701)
at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.prepareGenotype(JointGermlineCNVSegmentation.java:682)
The VCF row in question is
chrX 6383391 CNV_chrX_6383391_17732942 N . 3076.53 . END=17732942 GT:CN:NP:QA:QS:QSE:QSS 0:1:581:1:3077:4:20
The characterisation of this row as type=NO_VARIATION alleles=[N*] seems... partially correct? There is no variation at this locus, but I'm not sure why alleles is N*.
In this situation, as I read it, the first clause of the condition in correctGenotypePloidy should be satisfied: 1 allele, and that allele is a no-call. Instead, processing is dying in the else branch of the condition. Could you clarify whether I'm interpreting this correctly?
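As a sanity check on the row itself, here is a minimal Python sketch that counts the allele tokens in the sample's GT field and tests whether it is a single no-call. The helper names (`parse_sample_field`, `gt_alleles`) are hypothetical, and the sketch only inspects the text of the row quoted above. It does not reproduce how htsjdk or GATK decode the genotype.

```python
def parse_sample_field(vcf_row: str) -> dict:
    """Map FORMAT keys to the first sample's values for one whitespace-delimited VCF row."""
    fields = vcf_row.split()
    format_keys = fields[8].split(":")    # column 9: FORMAT
    sample_values = fields[9].split(":")  # column 10: first sample
    return dict(zip(format_keys, sample_values))


def gt_alleles(gt: str) -> list:
    """Split a GT string such as '0', '0/1' or '.|.' into its allele tokens."""
    return gt.replace("|", "/").split("/")


# The row from this report, copied as-is.
ROW = (
    "chrX 6383391 CNV_chrX_6383391_17732942 N . 3076.53 . END=17732942 "
    "GT:CN:NP:QA:QS:QSE:QSS 0:1:581:1:3077:4:20"
)

sample = parse_sample_field(ROW)
alleles = gt_alleles(sample["GT"])
allele_count = len(alleles)
# In VCF text, '.' is a no-call and '0' is a call of the reference allele.
is_single_no_call = allele_count == 1 and alleles[0] == "."

print(allele_count, is_single_no_call, sample["CN"])
```

On the text alone, this row has a single allele token, `0`, which VCF defines as a reference call rather than a no-call (`.`). So a "1 allele and no-call" check may not match this genotype as written, and the row by itself also does not show where a second allele would come from.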
Relevant versioning:
13:18:38.320 INFO JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.321 INFO JointGermlineCNVSegmentation - The Genome Analysis Toolkit (GATK) v4.2.6.1-57-g9e03432-SNAPSHOT
13:18:38.321 INFO JointGermlineCNVSegmentation - For support and documentation go to https://software.broadinstitute.org/gatk/
13:18:38.321 INFO JointGermlineCNVSegmentation - Executing as root@hostname-2559a32a6e on Linux v5.19.0-1030-gcp amd64
13:18:38.321 INFO JointGermlineCNVSegmentation - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
13:18:38.322 INFO JointGermlineCNVSegmentation - Start Date/Time: March 19, 2024 1:18:37 PM GMT
13:18:38.322 INFO JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.322 INFO JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.343 INFO JointGermlineCNVSegmentation - HTSJDK Version: 2.24.1
13:18:38.343 INFO JointGermlineCNVSegmentation - Picard Version: 2.27.1
13:18:38.343 INFO JointGermlineCNVSegmentation - Built for Spark Version: 2.4.5