
Getting QUAL and MQ values for invariant sites

Open elcortegano opened this issue 3 years ago • 2 comments

I am running gatk GenotypeGVCFs and want to get output for all genomic sites. I was expecting the output to include parameters like QUAL and mapping quality (MQ) for invariant sites. This is based on a previous study that used an earlier release of GATK, v2.8-1 (reference), and relied on QUAL and MQ values from those sites (although I am aware that these values are computed differently than for variant sites).

However, the version I am using, v4.2.1.0, does not seem to produce this output, and I cannot find a relevant option to include it. Am I missing something? Is there any way to get this information?

GATK is run as:

gatk HaplotypeCaller -I sample1.bam -O sample1.vcf -R reference.fa -ploidy 1 -ERC BP_RESOLUTION -stand-call-conf 10.0
...
gatk CombineGVCFs -R reference.fa -O all_samples.g.vcf --variant sample1.vcf --variant sample2.vcf ...
gatk GenotypeGVCFs -R reference.fa -V all_samples.g.vcf -O all_samples.vcf -ploidy 1 -all-sites

Thank you

elcortegano avatar Sep 07 '21 19:09 elcortegano

Hi @elcortegano,

A lot of things changed in GATK3 with the advent of GVCFs. We don't store annotation data for reference sites in GVCFs, partly because in most cases sites will be combined into large blocks where the values may not be very accurate anymore, and partly because we'd just rather save space. For base-pair resolution we could output MQ values; we just don't have that capability at the moment (strand bias and rank sum annotations wouldn't be very interesting without alt reads). We can leave this issue open for that purpose.

You should be able to get QUAL scores for reference sites in GenotypeGVCFs -all-sites mode. @cmnbroad is that not part of our expected output?

ldgauthier avatar Sep 09 '21 16:09 ldgauthier

Old thread, but I'm still finding problems with QUAL values at invariant sites. I'm now working with GATK 4.4.0, and after running GenotypeGVCFs -all-sites, I see that only a few sites store a QUAL value. Ideally, I would have it for all sites.

Is there a way to calculate the QUAL value for all invariant sites? Could a VCF be used as input for a tool to annotate missing QUAL values? Thanks again.
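
To put a number on "only a few sites", a rough Python sketch like the one below could tally how many invariant records carry a QUAL value. It does not compute or fill in QUAL; it only counts. It assumes that invariant sites in the all-sites output have `.` in the ALT column and that a missing QUAL is written as `.`, as in the VCF specification. The function names are made up for illustration and are not part of GATK.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def tally_invariant_qual(lines):
    """Count invariant records with and without a QUAL value.

    Assumption: an invariant site has '.' in the ALT column (column 5),
    and a missing QUAL is '.' in column 6.
    """
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        alt, qual = fields[4], fields[5]
        if alt != ".":
            counts["variant"] += 1
        elif qual == ".":
            counts["invariant_missing_qual"] += 1
        else:
            counts["invariant_with_qual"] += 1
    return counts
```

For the file from the commands above, this would be called as `tally_invariant_qual(open_vcf("all_samples.vcf"))`, ideally inside a `with` block so the file handle is closed afterwards.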

elcortegano avatar Feb 06 '24 17:02 elcortegano