Request created from: Monomorphic sites after GenotypeGVCFs --include-non-variant-sites
This request was created from a contribution made by Diana Robledo on June 23, 2022 10:56 UTC.
--
Hello,
I am using GATK v4.2.6.1 and following the GATK Best Practices. I performed joint genotyping of a multi-sample GVCF with GenotypeGVCFs. Because I am doing a population genetic analysis, I am very interested in obtaining high-confidence monomorphic sites, so I included the option --include-non-variant-sites. In the output VCF, however, I find that there are 3 types of monomorphic sites, for example:
#CHROM POS ID REF ALT QUAL FILTER
HiC_scaffold_493 961 . A . . .
HiC_scaffold_493 962 . ATCTCCCC . 7.65 LowQual
HiC_scaffold_493 963 . T . 180.56 .
I am not sure what the differences between those 3 types of monomorphic sites are. I tracked down those positions in the input GVCF and they look like this:
#CHROM POS ID REF ALT QUAL FILTER
HiC_scaffold_493 961 . A <NON_REF> . .
HiC_scaffold_493 962 . ATCTCCCC A,<NON_REF> . .
HiC_scaffold_493 963 . T *,<NON_REF> . .
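In case it helps anyone compare the two files, here is a minimal sketch of how the GVCF ALT column can be reduced to the alleles that were actually proposed, by dropping the `<NON_REF>` placeholder. The function name `candidate_alts` is hypothetical, not part of GATK, and the three inputs are just the ALT values shown above.

```python
def candidate_alts(alt_field):
    """Return the ALT alleles of a GVCF record, excluding the <NON_REF> placeholder.

    `alt_field` is the raw ALT column, e.g. "A,<NON_REF>".
    A "." (no ALT) or a lone "<NON_REF>" yields an empty list.
    """
    return [a for a in alt_field.split(",") if a not in ("<NON_REF>", ".")]


# ALT values for positions 961, 962 and 963 in the GVCF excerpt above
gvcf_alts = {961: "<NON_REF>", 962: "A,<NON_REF>", 963: "*,<NON_REF>"}

for pos, alt in gvcf_alts.items():
    print(pos, candidate_alts(alt))
```

With this, position 961 has no proposed alternative allele at all, while 962 and 963 each have one (`A` and `*`, respectively).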
I assume that in the GVCF, position 962 had some evidence for an alternative allele (A), but the evidence was so weak (QUAL < 30) that the allele was discarded and the position was called monomorphic in the VCF (LowQual). But what about position 963? The GVCF showed some evidence of a deletion (*) as an alternative allele, yet it was discarded in the VCF despite QUAL = 180.56?
Also, why does position 961 have no QUAL score at all? In fact, these are results from a small scaffold of 1,000 bp, of which 789 monomorphic sites have no QUAL score at all (like position 961).
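For reference, a tally like this can be reproduced with a few lines of Python. This is only a sketch under some assumptions: it reads uncompressed VCF text, treats ALT = "." as a monomorphic site, and splits on whitespace (real VCF is tab-delimited, but the first seven columns contain no spaces). The names `classify_site` and `tally` and the category labels are made up for illustration.

```python
from collections import Counter


def classify_site(line):
    """Label one VCF data line; return None if the site has an ALT allele."""
    chrom, pos, _id, ref, alt, qual, flt = line.split()[:7]
    if alt != ".":
        return None  # variant site, not of interest here
    if qual == ".":
        return "no_qual"  # like position 961
    if "LowQual" in flt.split(";"):
        return "low_qual"  # like position 962
    return "qual_unfiltered"  # like position 963


def tally(lines):
    """Count monomorphic sites per category, skipping headers and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        label = classify_site(line)
        if label is not None:
            counts[label] += 1
    return counts


# The three output records from the VCF excerpt above
example = [
    "#CHROM POS ID REF ALT QUAL FILTER",
    "HiC_scaffold_493 961 . A . . .",
    "HiC_scaffold_493 962 . ATCTCCCC . 7.65 LowQual",
    "HiC_scaffold_493 963 . T . 180.56 .",
]
print(tally(example))
```

For a real file, the same `tally` function can be fed an open file handle instead of the `example` list.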
This might be a rookie question but any help would be much appreciated!
Diana
(created from Zendesk ticket #289449)