Possible bug in GenotypeGVCFs 4.6.0.0, affecting all genotyping applications

Open gevro opened this issue 4 months ago • 6 comments

Hi, it seems that for samples in a cohort in which a variant was NOT detected, GenotypeGVCFs fills the AD and DP FORMAT fields of its output from those samples' gVCF MIN_DP field, rather than from their gVCF AD and DP fields.

Example - after CombineGVCFs, here are two samples, one with (Sample1) and one without (Sample2) the variant detected:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chr18 46641978 . C T,A,G,<NON_REF> . . BaseQRankSum=-0.507;DP=51737;ExcessHet=0;MQRankSum=0;RAW_MQandDP=184603201,51279;ReadPosRankSum=0.338;AC=0,0,0,0;AN=0 GT:AD:DP:GQ:MIN_DP:PL:SB ./.:516,917,0,0,0:1433:99:.:18707,0,8863,20253,11609,31862,20253,11609,31862,31862,20253,11609,31862,31862,31862:252,264,458,459 ./.:.:198:99:48:0,99,1307,99,1307,1307,99,1307,1307,1307,99,1307,1307,1307,1307:.

--> You can see that DP for Sample1 and Sample2 is 1433 and 198 respectively, and MIN_DP is '.' and 48 respectively. (Note: I'm not sure what MIN_DP = '.' means.)
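To make the per-sample values easier to read off, the FORMAT column can be paired key by key with each sample column. This is only a minimal Python sketch: the helper name `parse_sample` is hypothetical (not part of GATK), and the strings are copied from the combined record above.

```python
def parse_sample(format_col: str, sample_col: str) -> dict:
    """Pair each FORMAT key with the matching value from one sample column."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # VCF permits trailing fields to be omitted; pad them as missing (".").
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# FORMAT and sample columns copied from the combined gVCF record.
FORMAT = "GT:AD:DP:GQ:MIN_DP:PL:SB"
SAMPLE1 = ("./.:516,917,0,0,0:1433:99:.:"
           "18707,0,8863,20253,11609,31862,20253,11609,31862,31862,"
           "20253,11609,31862,31862,31862:252,264,458,459")
SAMPLE2 = ("./.:.:198:99:48:"
           "0,99,1307,99,1307,1307,99,1307,1307,1307,"
           "99,1307,1307,1307,1307:.")

for name, column in (("Sample1", SAMPLE1), ("Sample2", SAMPLE2)):
    fields = parse_sample(FORMAT, column)
    print(name, "DP=" + fields["DP"], "MIN_DP=" + fields["MIN_DP"])
# Sample1 DP=1433 MIN_DP=.
# Sample2 DP=198 MIN_DP=48
```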

After GenotypeGVCFs, here are the results:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BTM-1900-PTA-1-C11_DNA BTM-1900-PTA-1-D11_DNA
chr18 46641978 . C T 484450 . AC=1;AF=0.436;AN=4;BaseQRankSum=-0.507;DP=51737;ExcessHet=112.96;FS=0;InbreedingCoeff=-0.7722;MLEAC=61;MLEAF=0.436;MQ=60;MQRankSum=0;QD=9.73;ReadPosRankSum=0.338;SOR=0.669 GT:AD:DP:GQ:PL 0/1:516,917:1433:99:18707,0,8863 0/0:48,0:48:99:0,99,1307

--> DP here is 1433 for Sample1 (correct) and 48 for Sample2 (INCORRECT). DP for Sample2 should be 198, and its AD values (48,0) are also wrong. It seems that GenotypeGVCFs is pulling from the MIN_DP field, which doesn't make sense.
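The comparison can be written as a small check of which gVCF field the genotyped DP matches. This is a rough Python sketch, not GATK code: `dp_origin` and the dict layout are hypothetical, and the values are copied by hand from the two records in this report.

```python
def dp_origin(gvcf: dict, genotyped: dict) -> str:
    """Report which gVCF FORMAT field the genotyped DP value matches."""
    out_dp = genotyped["DP"]
    if out_dp == gvcf.get("DP"):
        return "DP"
    if out_dp == gvcf.get("MIN_DP"):
        return "MIN_DP"
    return "neither"


# Per-sample values before genotyping (combined gVCF record).
gvcf_sample1 = {"DP": "1433", "MIN_DP": "."}
gvcf_sample2 = {"DP": "198", "MIN_DP": "48"}

# Per-sample values after GenotypeGVCFs.
out_sample1 = {"AD": "516,917", "DP": "1433"}
out_sample2 = {"AD": "48,0", "DP": "48"}

print("Sample1 output DP matches gVCF", dp_origin(gvcf_sample1, out_sample1))
print("Sample2 output DP matches gVCF", dp_origin(gvcf_sample2, out_sample2))
# Sample1 output DP matches gVCF DP
# Sample2 output DP matches gVCF MIN_DP
```

Run over every hom-ref sample at a site, a check like this would show whether the pattern is specific to this record or holds generally.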

This looks like a (quite serious) bug, unless I'm misunderstanding something fundamental about how GenotypeGVCFs works.

gevro, Oct 17 '24 21:10