gnomad-browser
Display AS VQSR culprit information in site quality metrics table
An external user reached out to Mark for guidance on how to interpret variants that are flagged as failing our variant QC model (AS_VQSR). After some discussion, we concluded that it would be helpful to indicate which annotation was marked as the AS_culprit for variants that are flagged with AS_VQSR.
The public exomes and genomes release HTs include this annotation in the vqsr_results struct:
'vqsr_results': struct {
    AS_VQSLOD: float64,
    AS_culprit: str,
    positive_train_site: bool,
    negative_train_site: bool
}
For example, this variant fails AS_VQSR in the v4 genomes, and checking the v4 genomes release HT shows that the AS_culprit for this variant is AS_ReadPosRankSum:
>>> ht = hl.read_table('gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht')
>>> ht = hl.filter_intervals(ht, [hl.parse_locus_interval('chr11:747571-747572', reference_genome='GRCh38')])
>>> ht.vqsr_results.show()
+---------------+------------+------------------------+-------------------------+----------------------------------+
| locus | alleles | vqsr_results.AS_VQSLOD | vqsr_results.AS_culprit | vqsr_results.positive_train_site |
+---------------+------------+------------------------+-------------------------+----------------------------------+
| locus<GRCh38> | array<str> | float64 | str | bool |
+---------------+------------+------------------------+-------------------------+----------------------------------+
| chr11:747571 | ["C","T"] | -7.48e+00 | "AS_ReadPosRankSum" | False |
+---------------+------------+------------------------+-------------------------+----------------------------------+
+----------------------------------+
| vqsr_results.negative_train_site |
+----------------------------------+
| bool |
+----------------------------------+
| True |
+----------------------------------+
Would it be possible to visually designate which metric is the AS_culprit in the Site Quality Metrics table (e.g., bolding it, adding an asterisk, or changing the text color)?
Also, for additional context, the descriptions of these fields are:
vqsr_results: VQSR related variant annotations.
- AS_VQSLOD: Allele-specific log-odds ratio of being a true variant versus being a false positive under the trained VQSR Gaussian mixture model.
- AS_culprit: Allele-specific worst-performing annotation in the VQSR Gaussian mixture model.
- positive_train_site: Variant was used to build the positive training set of high-quality variants for VQSR.
- negative_train_site: Variant was used to build the negative training set of low-quality variants for VQSR.
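To make the request concrete, here is a minimal sketch of the marking logic in plain Python (not the browser's actual code). It assumes the table rows are available as (metric name, value) pairs and that the metric names match the AS_culprit string exactly. The function name mark_culprit, the row layout, and the asterisk marker are all hypothetical. Values not shown in the HT output above are left as None rather than filled in.

```python
from typing import List, Optional, Tuple

MetricRow = Tuple[str, Optional[float]]


def mark_culprit(
    metrics: List[MetricRow],
    as_culprit: Optional[str],
    failed_as_vqsr: bool,
) -> List[MetricRow]:
    """Return the rows with an asterisk appended to the culprit metric's label.

    Rows are only marked when the variant is flagged with AS_VQSR, since the
    culprit is most relevant for explaining a failing variant (an assumption
    of this sketch, not a decision made in the issue).
    """
    rows = []
    for name, value in metrics:
        is_culprit = failed_as_vqsr and as_culprit is not None and name == as_culprit
        label = f"{name} *" if is_culprit else name
        rows.append((label, value))
    return rows


# Hypothetical rows for the chr11:747571 C>T example above. Only AS_VQSLOD
# comes from the HT output; the other values are unknown here, hence None.
example_metrics: List[MetricRow] = [
    ("AS_ReadPosRankSum", None),
    ("AS_MQRankSum", None),
    ("AS_VQSLOD", -7.48),
]

marked = mark_culprit(example_metrics, "AS_ReadPosRankSum", failed_as_vqsr=True)
for label, value in marked:
    print(f"{label}: {value}")
```

In the browser, the same check would presumably decide which row gets the bold, asterisk, or color treatment. If the displayed metric names differ from the AS_culprit strings, a name mapping would be needed first.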