Problem in gVCF interplay between Clair3 and GATK
I noticed an issue when feeding Clair3 gVCF files into GATK's GenotypeGVCFs for cohort calling. It became obvious at a SNP that is likely located on one copy of a large duplication (indicated by increased coverage of the region and inconsistent haplotypes) and therefore has a biased allele ratio. Still, the T allele is clearly present in more than 8% of the reads, which would be the cutoff:
image
The gVCF record of clair3 for this SNP looks as follows:
10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385
Clair3 assigns a GQ of zero to that SNP, as I would have done too. However, the likelihood of the heterozygous genotype is still higher than that of either homozygous state. Because the FORMAT field is written as for a site without possible alternative alleles (no AD or AF), information is lost for GATK. A PASS variant for comparison:
10 23761040 . T A,<NON_REF> 19.22 PASS F GT:GQ:DP:AD:AF:PL 0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990
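To make the PL argument concrete, here is a minimal Python sketch (my own illustration, not code from Clair3 or GATK) that picks the genotype with the smallest PL. It assumes diploid calls and the genotype ordering from the VCF specification; the function name `most_likely_genotype` is hypothetical.

```python
def most_likely_genotype(pl_string, n_alleles):
    """Return ((allele_a, allele_b), PL) for the smallest PL value.

    Assumes a diploid sample. n_alleles counts REF plus every ALT,
    including <NON_REF>.
    """
    pls = [int(x) for x in pl_string.split(",")]
    # VCF ordering for diploid genotypes: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
    genotypes = [(j, k) for k in range(n_alleles) for j in range(k + 1)]
    if len(pls) != len(genotypes):
        raise ValueError("PL length does not match the number of alleles")
    best = min(range(len(pls)), key=pls.__getitem__)
    return genotypes[best], pls[best]


# GQ=0 record at 10:23761033 (alleles: G, <NON_REF>)
print(most_likely_genotype("245,0,1385", 2))
# PASS record at 10:23761040 (alleles: T, A, <NON_REF>)
print(most_likely_genotype("28,0,45,990,990,990", 3))
```

For both records the smallest PL belongs to the 0/1 genotype, although only the PASS record carries AD and AF to back it up.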
GATK then does something strange. It does not respect the GQ=0 from Clair3 but assigns 99, probably because the variant was also discovered in other samples. In addition, it does not have complete AD information, which results in a 0/1 call with AD=68,0, i.e. without any coverage for the alternative allele:
10 23761033 . G T 1093.24 . [...] GT:AD:AF:DP:GQ:PL 0/1:68,0:.:68:99:245,0,1385 [...]
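Such calls can be spotted after joint genotyping by comparing GT with AD. The following is a rough sketch under my own assumptions (plain string parsing of the FORMAT and sample columns, hypothetical function name `unsupported_alleles`), not a feature of GATK or Clair3:

```python
def unsupported_alleles(format_field, sample_field):
    """Return allele indices that appear in GT but have an AD of 0."""
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    if "GT" not in fields or "AD" not in fields:
        return []
    called = fields["GT"].replace("|", "/").split("/")
    depths = fields["AD"].split(",")
    flagged = set()
    for allele in called:
        if allele == ".":
            continue  # no-call, nothing to check
        idx = int(allele)
        # Flag only alleles with an explicit depth of zero.
        if idx < len(depths) and depths[idx] != "." and int(depths[idx]) == 0:
            flagged.add(idx)
    return sorted(flagged)


# Joint-genotyped record at 10:23761033
print(unsupported_alleles("GT:AD:AF:DP:GQ:PL", "0/1:68,0:.:68:99:245,0,1385"))
# Clair3 PASS record at 10:23761040
print(unsupported_alleles("GT:GQ:DP:AD:AF:PL", "0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990"))
```

The first call flags allele 1, the second flags nothing.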
I guess solving the problem needs input from both tools, Clair3 and GATK. If I understood the issue correctly, Clair3 should output AD information for more sites, and GATK should respect the GQ information from Clair3 in this case.
I posted the issue at the clair3 page as well: https://github.com/HKU-BAL/Clair3/issues/354.
Thanks, Johannes