VCF row validation error on gCNV results

Open MattWellie opened this issue 1 year ago • 9 comments

https://github.com/broadinstitute/gatk/blob/c6daf7dd02b866907fbfebad150baeb540c35bce/src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java#L701

I'm running into a recurring issue with JointGermlineCNVSegmentation, which runs after PostprocessGermlineCNVCalls in a gCNV pipeline. Several batches are being merged in parallel; some of them succeed and some fail. It's not yet clear whether the failure is deterministic. I'll re-run a few times to see if I can answer that.

```
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at chrX:6383391 [VC SAMPLE_ID.segments.vcf.gz @ chrX:6383391-17732942 Q3076.53 of type=NO_VARIATION alleles=[N*] attr={END=17732942} GT=GT:CN:NP:QA:QS:QSE:QSS	0:1:581:1:3077:4:20 filters=

...

Caused by: java.lang.IllegalStateException: Encountered genotype with ploidy 1 but 2 alleles.
	at org.broadinstitute.hellbender.utils.Utils.validate(Utils.java:814)
	at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.correctGenotypePloidy(JointGermlineCNVSegmentation.java:701)
	at org.broadinstitute.hellbender.tools.walkers.sv.JointGermlineCNVSegmentation.prepareGenotype(JointGermlineCNVSegmentation.java:682)
```

The VCF row in question is:

```
chrX	6383391	CNV_chrX_6383391_17732942	N	.	3076.53	.	END=17732942	GT:CN:NP:QA:QS:QSE:QSS	0:1:581:1:3077:4:20
```
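For reference, the sample's FORMAT keys can be paired with their values as in the minimal Java sketch below. The class `VcfRowSketch` is hypothetical (not part of GATK or htsjdk), and it assumes a single sample column and tab-separated fields. Applied to the row above, it shows the genotype is `GT=0` and the copy number is `CN=1`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: pairs the FORMAT keys of a single-sample VCF data line
// with that sample's values. Assumes tab-separated columns and exactly one sample.
final class VcfRowSketch {
    static Map<String, String> sampleFields(String line) {
        String[] cols = line.split("\t");
        // Columns 0-7 are CHROM..INFO, column 8 is FORMAT, column 9 is the sample.
        String[] keys = cols[8].split(":");
        String[] values = cols[9].split(":");
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // VCF allows trailing sample fields to be dropped; treat them as missing.
            fields.put(keys[i], i < values.length ? values[i] : ".");
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = String.join("\t", "chrX", "6383391", "CNV_chrX_6383391_17732942", "N", ".",
                "3076.53", ".", "END=17732942", "GT:CN:NP:QA:QS:QSE:QSS", "0:1:581:1:3077:4:20");
        // prints {GT=0, CN=1, NP=581, QA=1, QS=3077, QSE=4, QSS=20}
        System.out.println(sampleFields(row));
    }
}
```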

The characterisation of this row as type=NO_VARIATION alleles=[N*] seems only partially correct. There is no variation at this locus, but I'm not sure why the allele list is shown as N*.
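One likely reading, assuming the exception message comes from htsjdk's `VariantContext.toString()`: htsjdk's `Allele.toString()` appends `*` to the reference allele (and prints `.` for a no-call), so `[N*]` would simply be the REF base `N`. With ALT set to `.`, no alternate allele exists, which is consistent with `type=NO_VARIATION`. The sketch below imitates that rendering with a hypothetical `SketchAllele` class; it is not the real htsjdk `Allele`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for htsjdk's Allele, keeping only what toString() needs.
final class SketchAllele {
    final String bases;
    final boolean isReference;
    final boolean isNoCall;

    SketchAllele(String bases, boolean isReference, boolean isNoCall) {
        this.bases = bases;
        this.isReference = isReference;
        this.isNoCall = isNoCall;
    }

    @Override
    public String toString() {
        // Imitates htsjdk's convention: "." for a no-call, trailing "*" on the REF allele.
        return (isNoCall ? "." : bases) + (isReference ? "*" : "");
    }
}

final class AlleleRendering {
    // Builds the allele list for a VCF row from its REF and ALT columns.
    static List<SketchAllele> allelesFor(String ref, String alt) {
        List<SketchAllele> alleles = new ArrayList<>();
        alleles.add(new SketchAllele(ref, true, false));
        if (!alt.equals(".")) {
            for (String a : alt.split(",")) {
                alleles.add(new SketchAllele(a, false, false));
            }
        }
        return alleles;
    }

    public static void main(String[] args) {
        // REF=N and ALT="." as in the failing row: only the reference allele remains.
        System.out.println("alleles=" + allelesFor("N", ".")); // prints alleles=[N*]
    }
}
```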

As I read the linked code in correctGenotypePloidy, the first clause of the condition should be satisfied here: there is one allele, and that allele is a no-call. Instead, processing fails in the else branch of the condition. Could you confirm whether I'm interpreting this correctly?
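To make the branch question concrete, here is a hypothetical reconstruction of the check as described above: a single no-call allele takes the first clause, otherwise more alleles than the ploidy throws. It is inferred only from the error message, not copied from the GATK source, and `PloidyCheckSketch` is an illustrative name. Note that in VCF a GT of `0` indexes the REF allele while `.` denotes a no-call, so under this reading the row's `0` would reach the else branch. Also, the error counts 2 alleles while the row has a single GT value, which suggests the genotype may be altered before the check (possibly in prepareGenotype, its caller in the stack trace).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of the ploidy check described in this issue.
// Inferred from the error message; NOT the actual GATK implementation.
final class PloidyCheckSketch {
    // Splits a VCF GT value such as "0", "0/1" or "1|0" into its allele tokens.
    static List<String> gtAlleles(String gt) {
        return Arrays.asList(gt.split("[/|]"));
    }

    static String check(List<String> alleles, int ploidy) {
        if (alleles.size() == 1 && alleles.get(0).equals(".")) {
            // First clause: a single no-call allele.
            return "no-call branch";
        }
        // Else branch: reject genotypes with more alleles than the ploidy.
        if (alleles.size() > ploidy) {
            throw new IllegalStateException("Encountered genotype with ploidy " + ploidy
                    + " but " + alleles.size() + " alleles.");
        }
        return "else branch";
    }

    public static void main(String[] args) {
        System.out.println(gtAlleles("0"));           // [0]: the REF allele, not a no-call
        System.out.println(check(gtAlleles("0"), 1)); // else branch
        try {
            check(gtAlleles("0/0"), 1);
        } catch (IllegalStateException e) {
            // Encountered genotype with ploidy 1 but 2 alleles.
            System.out.println(e.getMessage());
        }
    }
}
```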

Relevant version information:

```
13:18:38.320 INFO  JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.321 INFO  JointGermlineCNVSegmentation - The Genome Analysis Toolkit (GATK) v4.2.6.1-57-g9e03432-SNAPSHOT
13:18:38.321 INFO  JointGermlineCNVSegmentation - For support and documentation go to https://software.broadinstitute.org/gatk/
13:18:38.321 INFO  JointGermlineCNVSegmentation - Executing as root@hostname-2559a32a6e on Linux v5.19.0-1030-gcp amd64
13:18:38.321 INFO  JointGermlineCNVSegmentation - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
13:18:38.322 INFO  JointGermlineCNVSegmentation - Start Date/Time: March 19, 2024 1:18:37 PM GMT
13:18:38.322 INFO  JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.322 INFO  JointGermlineCNVSegmentation - ------------------------------------------------------------
13:18:38.343 INFO  JointGermlineCNVSegmentation - HTSJDK Version: 2.24.1
13:18:38.343 INFO  JointGermlineCNVSegmentation - Picard Version: 2.27.1
13:18:38.343 INFO  JointGermlineCNVSegmentation - Built for Spark Version: 2.4.5
```

MattWellie · May 15 '24 06:05