
Request created from: Monomorphic sites after GenotypeGVCFs --include-non-variant-sites

Open GATKSupportTeam opened this issue 2 years ago • 0 comments

This request was created from a contribution made by Diana Robledo on June 23, 2022 10:56 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/6808932798363-Monomorphic-sites-after-GenotypeGVCFs-include-non-variant-sites

--

Hello,

I am using GATK v4.2.6.1 and following the GATK Best Practices. I performed joint genotyping of a multi-sample GVCF with GenotypeGVCFs. Because I am doing a population genetic analysis, I am very interested in obtaining high-confidence monomorphic sites, so I included the option --include-non-variant-sites. In the output VCF, however, I find three types of monomorphic sites, for example:

#CHROM            POS  ID  REF       ALT  QUAL    FILTER
HiC_scaffold_493  961  .   A         .    .       .
HiC_scaffold_493  962  .   ATCTCCCC  .    7.65    LowQual
HiC_scaffold_493  963  .   T         .    180.56  .
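To tally the three kinds of monomorphic records across a whole file, each record can be sorted by its QUAL and FILTER columns. The Python sketch below is illustrative only: the function names and category labels (`no_qual`, `lowqual`, `qual_unfiltered`) are hypothetical, it assumes uncompressed, tab-separated VCF text, and the sample records reproduce the three rows above, which show only the columns up to FILTER.

```python
from collections import Counter
from typing import Iterable, Optional

# Output VCF records reproduced from the table above (CHROM..FILTER only;
# real records carry INFO/FORMAT/sample columns after FILTER).
RECORDS = [
    "HiC_scaffold_493\t961\t.\tA\t.\t.\t.",
    "HiC_scaffold_493\t962\t.\tATCTCCCC\t.\t7.65\tLowQual",
    "HiC_scaffold_493\t963\t.\tT\t.\t180.56\t.",
]


def classify_monomorphic(vcf_line: str) -> Optional[str]:
    """Label a monomorphic record (ALT == '.') by its QUAL and FILTER columns.

    Returns None for header lines and for records that carry an ALT allele.
    The labels are hypothetical names chosen for this sketch.
    """
    if vcf_line.startswith("#"):
        return None
    fields = vcf_line.rstrip("\n").split("\t")
    alt, qual, filt = fields[4], fields[5], fields[6]
    if alt != ".":
        return None  # a variant record, not a monomorphic site
    if qual == ".":
        return "no_qual"  # like position 961
    if filt == "LowQual":
        return "lowqual"  # like position 962
    return "qual_unfiltered"  # like position 963


def tally_monomorphic(lines: Iterable[str]) -> Counter:
    """Count monomorphic records per category, ignoring all other lines."""
    return Counter(c for c in map(classify_monomorphic, lines) if c is not None)


if __name__ == "__main__":
    for line in RECORDS:
        print(line.split("\t")[1], classify_monomorphic(line))
    print(dict(tally_monomorphic(RECORDS)))
```

Run over a full scaffold, the same tally would give the number of records in each category directly.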

I am not sure what the differences between those 3 types of monomorphic sites are. I tracked down those positions in the input GVCF and they look like this:

#CHROM            POS  ID  REF       ALT          QUAL  FILTER
HiC_scaffold_493  961  .   A         <NON_REF>    .     .
HiC_scaffold_493  962  .   ATCTCCCC  A,<NON_REF>  .     .
HiC_scaffold_493  963  .   T         *,<NON_REF>  .     .
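The GVCF ALT column can be split the same way to list which alternate alleles, besides `<NON_REF>`, were recorded at each position. In the VCF specification (v4.2 and later), `*` denotes an allele that is missing because an upstream deletion overlaps the position, so the sketch also checks whether position 963 falls inside the span of the 962 deletion (ATCTCCCC to A). The helpers are hypothetical and not part of GATK; they assume left-anchored deletions, where ALT is a prefix of REF.

```python
from typing import List, NamedTuple

NON_REF = "<NON_REF>"
# VCF v4.2+: '*' is an allele missing because an upstream deletion overlaps
# this position.
SPANNING_DELETION = "*"

# GVCF records reproduced from the table above (CHROM..FILTER only).
GVCF_RECORDS = [
    "HiC_scaffold_493\t961\t.\tA\t<NON_REF>\t.\t.",
    "HiC_scaffold_493\t962\t.\tATCTCCCC\tA,<NON_REF>\t.\t.",
    "HiC_scaffold_493\t963\t.\tT\t*,<NON_REF>\t.\t.",
]


class GvcfAlleles(NamedTuple):
    pos: int
    ref: str
    alts: List[str]          # alternates seen at the site, <NON_REF> excluded
    spanning_deletion: bool  # True if '*' is among those alternates


def gvcf_alleles(gvcf_line: str) -> GvcfAlleles:
    """Parse POS, REF and ALT from one tab-separated GVCF record."""
    fields = gvcf_line.rstrip("\n").split("\t")
    alts = [a for a in fields[4].split(",") if a != NON_REF]
    return GvcfAlleles(int(fields[1]), fields[3], alts, SPANNING_DELETION in alts)


def deleted_span(pos: int, ref: str, alt: str) -> range:
    """Reference positions removed by a left-anchored deletion REF -> ALT.

    Assumes ALT is a prefix of REF (the usual VCF padding-base convention).
    """
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError(f"{ref}->{alt} is not a left-anchored deletion")
    return range(pos + len(alt), pos + len(ref))


if __name__ == "__main__":
    for rec in map(gvcf_alleles, GVCF_RECORDS):
        print(rec.pos, rec.alts, "spanning deletion" if rec.spanning_deletion else "")
    span = deleted_span(962, "ATCTCCCC", "A")
    print(f"962 deletion removes {span.start}-{span.stop - 1}; covers 963: {963 in span}")
```

A seven-base deletion anchored at 962 removes positions 963 through 969, which is consistent with the `*` allele appearing at 963; the sketch does not say how GenotypeGVCFs then handles that allele.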

I assume that in the GVCF, position 962 had some evidence for an alternative allele (A), but it was so weak (QUAL < 30) that it was discarded and the position was reported as monomorphic in the VCF (LowQual). But what about position 963? There was some evidence of a deletion (*) as an alternative allele in the GVCF, so why was it discarded in the VCF despite QUAL = 180.56?
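QUAL values are Phred-scaled, QUAL = -10·log10(p) for an error probability p, so the two QUAL values above can be converted back to probabilities and compared with the cutoff of 30 that the post assumes for LowQual. This Python sketch only does that arithmetic: the threshold constant comes from the post's assumption rather than from GATK, and the sketch says nothing about how GenotypeGVCFs assigns QUAL to sites whose ALT allele was dropped.

```python
import math

# Cutoff the post assumes for the LowQual filter (not read from GATK).
LOWQUAL_THRESHOLD = 30.0


def phred_to_prob(qual: float) -> float:
    """Error probability encoded by a Phred-scaled quality: p = 10^(-Q/10)."""
    return 10 ** (-qual / 10)


def prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def below_threshold(qual: float, threshold: float = LOWQUAL_THRESHOLD) -> bool:
    """True when QUAL falls under the assumed LowQual cutoff."""
    return qual < threshold


if __name__ == "__main__":
    for pos, qual in [(962, 7.65), (963, 180.56)]:
        status = "below" if below_threshold(qual) else "at or above"
        print(pos, qual, f"p={phred_to_prob(qual):.3g}", f"{status} {LOWQUAL_THRESHOLD}")
```

A QUAL of 30 corresponds to p = 0.001, so position 962 (QUAL 7.65) falls well short of the assumed cutoff, while 963 (QUAL 180.56) exceeds it by a wide margin.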

Also, why does position 961 have no QUAL score at all? These results come from a small 1,000 bp scaffold, in which 789 monomorphic sites have no QUAL score (like position 961).

This might be a rookie question, but any help would be much appreciated!

Diana

(created from Zendesk ticket #289449)

GATKSupportTeam, Sep 22 '22 16:09