
Display AS VQSR culprit information in site quality metrics table

Status: Open. ch-kr opened this issue 7 months ago • 0 comments

An external user reached out to Mark for guidance on how to interpret variants that are flagged as failing our variant QC model (AS_VQSR). After some discussion, we concluded that it could help to indicate which annotation is the AS_culprit for variants flagged with AS_VQSR.

The public exomes and genomes release HTs include this annotation in the vqsr_results struct:

    'vqsr_results': struct {
        AS_VQSLOD: float64,
        AS_culprit: str,
        positive_train_site: bool,
        negative_train_site: bool
    }
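
A minimal sketch of how a client might mirror this struct in Python, assuming the values arrive as a plain dict (for example, decoded from JSON). The `VqsrResults` class and its `from_dict` helper are hypothetical; they are not part of Hail or the gnomAD browser. The sample values are the ones displayed for chr11:747571 C>T in the example below.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VqsrResults:
    """Hypothetical Python mirror of the Hail `vqsr_results` struct."""

    AS_VQSLOD: Optional[float]
    AS_culprit: Optional[str]
    positive_train_site: Optional[bool]
    negative_train_site: Optional[bool]

    @classmethod
    def from_dict(cls, record: Mapping[str, Any]) -> "VqsrResults":
        # Missing keys become None, mirroring Hail's missing values.
        return cls(
            AS_VQSLOD=record.get("AS_VQSLOD"),
            AS_culprit=record.get("AS_culprit"),
            positive_train_site=record.get("positive_train_site"),
            negative_train_site=record.get("negative_train_site"),
        )


# chr11:747571 C>T, with AS_VQSLOD as displayed (rounded) by show().
example = VqsrResults.from_dict(
    {
        "AS_VQSLOD": -7.48,
        "AS_culprit": "AS_ReadPosRankSum",
        "positive_train_site": False,
        "negative_train_site": True,
    }
)
print(example.AS_culprit)
```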

For example, chr11:747571 C>T fails AS_VQSR in the v4 genomes, and the v4.1 genomes release HT shows that the AS_culprit for this variant is AS_ReadPosRankSum:

    >>> import hail as hl
    >>> ht = hl.read_table('gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht')
    >>> ht = hl.filter_intervals(ht, [hl.parse_locus_interval('chr11:747571-747572', reference_genome='GRCh38')])
    >>> ht.vqsr_results.show()

    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | locus         | alleles    | vqsr_results.AS_VQSLOD | vqsr_results.AS_culprit | vqsr_results.positive_train_site |
    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | locus<GRCh38> | array<str> |                float64 | str                     |                             bool |
    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | chr11:747571  | ["C","T"]  |              -7.48e+00 | "AS_ReadPosRankSum"     |                            False |
    +---------------+------------+------------------------+-------------------------+----------------------------------+

    +----------------------------------+
    | vqsr_results.negative_train_site |
    +----------------------------------+
    |                             bool |
    +----------------------------------+
    |                             True |
    +----------------------------------+

Would it be possible to visually designate which metric is the AS_culprit in the Site Quality Metrics table (e.g., by bolding it, adding an asterisk, or changing its text color)?
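
One possible approach, sketched in Python under stated assumptions: compare each metric name in the table with the variant's AS_culprit and append an asterisk to the matching row. The row shape, the `mark_culprit` function, the metric values, and the rule that table labels may omit the `AS_` prefix are all hypothetical; the browser's actual metric labels and rendering may differ.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float]  # (metric name, value); hypothetical row shape


def _strip_as_prefix(name: str) -> str:
    # Assumption: the table may label "AS_ReadPosRankSum" as "ReadPosRankSum".
    return name[3:] if name.startswith("AS_") else name


def mark_culprit(rows: List[Row], culprit: Optional[str]) -> List[Row]:
    """Return rows with an asterisk appended to the AS_culprit metric's name."""
    if not culprit:
        # Variants without an AS_culprit are left unmarked.
        return list(rows)
    target = _strip_as_prefix(culprit)
    return [
        (f"{name} *" if _strip_as_prefix(name) == target else name, value)
        for name, value in rows
    ]


# Hypothetical metric values, for illustration only.
rows = [("QD", 2.1), ("ReadPosRankSum", -3.2), ("MQ", 60.0)]
for name, value in mark_culprit(rows, "AS_ReadPosRankSum"):
    print(f"{name:<18} {value}")
```

Matching on the name with the `AS_` prefix removed keeps the marking logic independent of whether the table shows allele-specific or site-level labels.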

For additional context, these fields are described as follows:

vqsr_results: VQSR-related variant annotations.

    - AS_VQSLOD: Allele-specific log-odds ratio of being a true variant versus being a false positive under the trained VQSR Gaussian mixture model.
    - AS_culprit: Allele-specific worst-performing annotation in the VQSR Gaussian mixture model.
    - positive_train_site: Variant was used to build the positive training set of high-quality variants for VQSR.
    - negative_train_site: Variant was used to build the negative training set of low-quality variants for VQSR.
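
If the browser also wanted to explain the flag in words (for example, in a tooltip), these descriptions could be turned into a short note. The sketch below is hypothetical: `describe_vqsr` and its wording are illustrative only, and the function merely restates the fields; it does not re-derive the AS_VQSR filter decision.

```python
from typing import List, Optional


def describe_vqsr(
    as_vqslod: Optional[float],
    as_culprit: Optional[str],
    positive_train_site: Optional[bool],
    negative_train_site: Optional[bool],
) -> str:
    """Build a short, human-readable note from vqsr_results fields."""
    parts: List[str] = []
    if as_vqslod is not None:
        # Log-odds of being a true variant vs. a false positive under the model.
        parts.append(f"AS_VQSLOD = {as_vqslod:.2f}")
    if as_culprit:
        parts.append(f"worst-performing annotation: {as_culprit}")
    if positive_train_site:
        parts.append("used in the positive (high-quality) training set")
    if negative_train_site:
        parts.append("used in the negative (low-quality) training set")
    return "; ".join(parts) if parts else "no VQSR annotations available"


print(describe_vqsr(-7.48, "AS_ReadPosRankSum", False, True))
```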

ch-kr, Jul 24 '24 16:07