
2 rare variants table on gene page

Open · anneodonnell opened this issue

What you did:

Building on Laurent's statistical phasing work, Sarah Stenton, Julia, and Kaitlin have been working on the two rare variant calculations (across different allele frequencies and variant types). I'm envisioning this as a table that would sit in the place of the constraint table, on a tab where you can toggle between the constraint data and the two rare variant data.

I've asked Sarah to start thinking about what data would be the most helpful to show (as the download version will likely have more subcategories than we will show on the browser). The data isn't quite ready yet so I'm just adding a ticket to have this on the development road map.

This is lower priority than gnomAD v4, but there is a chance it will be ready before v4, so if there is extra bandwidth, it's something to potentially start thinking about while waiting for the v4 data.

What happened:

anneodonnell, Jul 28 '22

We'll add this to the roadmap -- thanks Anne

mattsolo1, Jul 28 '22

The two rare variant data is ready now, and I have a mock-up of the browser display. The browser will show a subset of the full download table as a "main display table" and an "expanded display table", with a button to expand. As Anne wrote above, the table would go in the place of the constraint table, on a tab where you can toggle between the constraint data and the two rare variant data. I've attached screenshots below showing how we envision it.

The data is here: gs://gnomad-sarah/compound_hets/chet_per_gene.tsv.gz

From the table, we'd need to extract the "n_het_het" number for "individuals with two rare variants" and the "n_chet" number for "number in trans", by allele frequency and csq, to be displayed in the browser table. Each consequence in the screenshot should correspond to the csq in the full table with "_or_worse" on each side; e.g., strong missense + strong missense should take the value in the row "strong_revel_missense_or_worse_strong_revel_missense_or_worse" (the only exception is lof_lof).
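A minimal Python sketch of the lookup described above, assuming the TSV has a header row. Only `n_het_het`, `n_chet` and `csq` come from this thread; the `gene_id` and `af_cutoff` column names, and the short consequence names passed in (such as `lof`), are hypothetical placeholders for whatever the real file uses.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Optional, Tuple


def csq_row_label(csq_a: str, csq_b: str) -> str:
    """Build the full-table csq label for a displayed consequence pair.

    Each side gets an "_or_worse" suffix, so strong missense + strong missense
    becomes "strong_revel_missense_or_worse_strong_revel_missense_or_worse".
    LoF + LoF is the one exception and stays "lof_lof".
    """
    if csq_a == "lof" and csq_b == "lof":
        return "lof_lof"
    return f"{csq_a}_or_worse_{csq_b}_or_worse"


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read the gzipped, tab-separated per-gene table into a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def cell_counts(rows: Iterable[Dict[str, str]], gene_id: str, af_cutoff: str,
                csq_a: str, csq_b: str) -> Optional[Tuple[int, int]]:
    """Return (n_het_het, n_chet) for one browser table cell, or None if no row matches."""
    label = csq_row_label(csq_a, csq_b)
    for row in rows:
        # "gene_id" and "af_cutoff" are hypothetical column names.
        if (row["gene_id"] == gene_id and row["af_cutoff"] == af_cutoff
                and row["csq"] == label):
            return int(row["n_het_het"]), int(row["n_chet"])
    return None
```

`load_rows` reads the whole file into memory, which is fine for a sketch; a full pipeline run over every gene would more likely group rows by gene once rather than scanning the list per cell.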

We've also added it to the agenda of the next gnomAD browser roadmapping meeting.

two_rare_variants_gnomad_browser.pdf

slstenton, Sep 08 '22

  • [ ] Active table toggle
  • [ ] Data into dual-rare table
  • [ ] Dual-rare table collapse/expand toggle
  • [ ] Tooltips

phildarnowsky-broad, Oct 19 '22

We're making some updates to the table to also count homozygous variant carriers -- this won't change the table structure but will change the counts in the cells. I hope we will have it ready in the coming week or two. We'd also like to move "missense + missense" from the expanded table to the main table view.

slstenton, Oct 28 '22

For transparency, I am stalled on this and have been for roughly two days now because, as periodically happens, an important part of my dev environment has stopped working for no obvious reason.

phildarnowsky-broad, Oct 31 '22

@slstenton I'll soon be ready to work with the data but I don't seem to have access to the storage bucket in question.

phildarnowsky-broad, Nov 09 '22

@phildarnowsky-broad the data are in two places:

gs://gnomad-sarah/compound_hets/chet_per_gene.tsv.gz
gs://gnomad/projects/compound_hets/exomes_chet_per_gene_0.05_3_prime_UTR_variant_vp.tsv.gz

Do you have access to either of these?

As an update from our side: We're currently re-thinking which numbers we'd like to display in the browser, and I'm also still working on adding in counts of homozygous variant carriers.

slstenton, Nov 09 '22

@slstenton no luck with either of those URLs. I think @sjahl might be the person to ask about this (sorry if I got that wrong, Steve).

For now, then, I'll populate the table with dummy data, and I'll make sure to leave us a clear path to change the figures we show if we want.

phildarnowsky-broad, Nov 09 '22

@phildarnowsky-broad Let me check that we don't have to do any paperwork for gnomad data access first, and then I can get you read access.

sjahl, Nov 09 '22

@phildarnowsky-broad I've added the browser team members to the gnomad read access group. Try again?

sjahl, Nov 09 '22

@sjahl /cc @slstenton thanks, I have access now.

phildarnowsky-broad, Nov 09 '22

@sjahl @slstenton that is, I was able to download chet_per_gene.tsv.gz, but exomes_chet_per_gene_0.05_3_prime_UTR_variant_vp.tsv.gz is in a requester-pays bucket. Do these files contain the same data, or will I need to take both and combine them?

phildarnowsky-broad, Nov 09 '22

@phildarnowsky-broad the data is exactly the same.

slstenton, Nov 09 '22

@slstenton excellent, we're good on that then.

After talking with the team a little, I've changed my mind: I think it'd be best to build out the backend portions of this now as well, on the assumption that, although the data you want to display or how you calculate it may change, the final choices will look more like the mockup than not. Do you think that's a safe assumption?

phildarnowsky-broad, Nov 09 '22

The changes we're discussing will likely involve:

  1. the consequence categories displayed in the "main" display table
  2. the number of different values displayed in the cells (we may add additional numbers e.g., homozygous counts in addition to the "in trans" counts, but we're still discussing what is best as we don't want the table to become too busy)

slstenton, Nov 09 '22

@slstenton OK sounds good. I have two more questions at the moment about the data:

  1. How do the columns n_chet, n_same_hap, n_unphased, and n_het_het map to the two counts in each table cell?
  2. Spot-checking, it looks like population is always "all". Is this so? If not, how does population enter into the calculation?

phildarnowsky-broad, Nov 09 '22

The first number in front of the brackets should be "n_het_het" and within the brackets should be "n_chet". All others can be ignored. Population is always "all" in this analysis, so that can also be ignored.
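A small Python sketch of the cell text this answer implies: "n_het_het (n_chet)", with the other count columns ignored. The `population` column name is an assumption based on the question above; the check simply guards the stated invariant that it is always "all".

```python
from typing import Dict


def format_cell(row: Dict[str, str]) -> str:
    """Render one table cell as "n_het_het (n_chet)".

    n_same_hap and n_unphased are not used for the cell text. The
    (assumed) population column should always be "all" in this analysis,
    so anything else is treated as a data error.
    """
    population = row.get("population", "all")
    if population != "all":
        raise ValueError(f"unexpected population: {population!r}")
    return f"{int(row['n_het_het'])} ({int(row['n_chet'])})"
```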

slstenton, Nov 09 '22

@slstenton One more question, does the data here cover all genes in gnomAD or only a subset?

phildarnowsky-broad, Nov 10 '22

It runs on all genes. The only reason a gene would be missing is if no individuals with two rare variants were counted in that gene, in which case the table should be populated with zeros.
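Because a missing gene means zero carriers rather than missing data, a lookup can default every cell to zeros. A Python sketch, assuming the counts have already been grouped into a dict keyed by gene and cell; the `(af_cutoff, csq)` cell key is a hypothetical choice, not the file's actual layout.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str]    # hypothetical (af_cutoff, csq) key
Counts = Tuple[int, int]  # (n_het_het, n_chet)


def gene_table(counts: Dict[Tuple[str, Cell], Counts], gene_id: str,
               cells: Iterable[Cell]) -> Dict[Cell, Counts]:
    """Return every displayed cell for a gene, filling absent entries with zeros.

    A gene (or cell) absent from the per-gene TSV had no individuals with
    two rare variants, so it is shown as (0, 0) rather than "no data".
    """
    return {cell: counts.get((gene_id, cell), (0, 0)) for cell in cells}
```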

slstenton, Nov 10 '22

Brief update before I go on winter break: this is ready for code review apart from the tooltips and some assorted frontend styling fixes. Also, the data pipeline in its current shape should work end-to-end, but I haven't yet run it in full because of how long that takes.

phildarnowsky-broad, Dec 22 '22

@slstenton This is now in code review. I'm working on getting a demo instance set up that you can QA on, but in the meantime, here are some screenshots from my development machine.

(You can ignore the "Unknown error" message; it is due to an unrelated issue.)

[Screenshots from the development machine]

phildarnowsky-broad, Jan 10 '23

@phildarnowsky-broad thank you for the update. The code generating the tables was also reviewed this week, and I should have the final tables for the browser in the next few days.

slstenton, Jan 10 '23

@slstenton sounds good, I will keep an eye out for those tables.

phildarnowsky-broad, Jan 10 '23

@slstenton one small point regarding the tooltips: I'm told that if we leave the "v1" off the end of the bioRxiv URL, it will automatically link to the newest version of the given paper, in case there are revised versions in the future. I'm guessing that's desirable, but if it's important that these link to a specific version of the paper rather than whatever the latest might be, please let me know.
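A Python sketch of dropping the version suffix before rendering a link. The DOI in the usage check is a made-up placeholder, and the function only handles URLs that end in the version (for example, it does not touch a trailing ".full" page suffix).

```python
import re

# A trailing version suffix such as "v1" or "v12" at the very end of the URL.
_VERSION_SUFFIX = re.compile(r"v\d+$")


def unversioned_biorxiv_url(url: str) -> str:
    """Drop a trailing version suffix from a bioRxiv URL so the link
    resolves to the latest version of the preprint."""
    return _VERSION_SUFFIX.sub("", url.rstrip("/"))
```

URLs that already lack a version are returned unchanged, so the function is safe to apply to every preprint link.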

phildarnowsky-broad, Jan 12 '23

@phildarnowsky-broad that's great to know about bioRxiv. I think we should always link to the most recent version.

Here are the paths to the updated results tables:

For the two heterozygous rare variant counts: gs://gnomad-sarah/compound_hets/chet_unphased_same_hap_per_gene.tsv

For the homozygous rare variant counts: gs://gnomad-sarah/compound_hets/het_hom_per_gene.tsv

We also introduced the "priority counting" discussed in the last meeting, so that the hover-over box can also display values for unphased and in cis (in trans + unphased + in cis always equals the count for two het variants):

  • two het variants: 'n_any_het_het'
  • in trans: 'n_chet'
  • unphased: 'n_unphased_without_chet'
  • in cis: 'n_same_hap_without_chet_or_unphased'
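A Python sketch of how the four tooltip counts relate. Reading the column suffixes ("_without_chet", "_without_chet_or_unphased"), priority counting appears to place each individual in exactly one category, checked in the order in trans, unphased, in cis; `priority_category` is a hypothetical illustration of that reading, not the actual pipeline code. `tooltip_counts` checks that the three categories add up to the two-het total.

```python
from typing import Dict


def priority_category(has_trans: bool, has_unphased: bool, has_cis: bool) -> str:
    """Assign an individual with two rare het variants to exactly one category,
    preferring in trans, then unphased, then in cis (assumed priority order)."""
    if has_trans:
        return "in trans"
    if has_unphased:
        return "unphased"
    if has_cis:
        return "in cis"
    raise ValueError("individual has no qualifying variant pair")


def tooltip_counts(row: Dict[str, str]) -> Dict[str, int]:
    """Collect the four hover-over counts for one cell and verify that the
    priority-counted categories sum to the two-het total."""
    counts = {
        "two het variants": int(row["n_any_het_het"]),
        "in trans": int(row["n_chet"]),
        "unphased": int(row["n_unphased_without_chet"]),
        "in cis": int(row["n_same_hap_without_chet_or_unphased"]),
    }
    parts = counts["in trans"] + counts["unphased"] + counts["in cis"]
    if parts != counts["two het variants"]:
        raise ValueError(
            f"in trans + unphased + in cis = {parts}, "
            f"expected {counts['two het variants']}"
        )
    return counts
```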

I've also attached the updated powerpoint from our last meeting.

two_rare_variants_gnomad_browser.pptx

slstenton, Jan 12 '23

@slstenton I'm looking at the new data and new PPT and here are the differences I observe, please let me know if you see anything important I've overlooked:

  • The set of consequences in the un-expanded version of the table is generally different
  • The set of consequences in the expanded version are the same except that we add supporting/supporting at the point shown
  • Each cell of the table shows the two-het count, with the trans count in parens
  • Each cell of the table gets a tooltip now, showing all four counts
  • bioRxiv links have the versions removed so they always point to the latest version
  • New caption over the existing tables
  • New homozygous table
  • Disclaimers somewhere TBD (see below)

And here are a few questions that I have:

  • I don't see an expanded version of the homozygous table. I'm thinking that if the user expands the heterozygous table, the homozygous table should remain as-is. Alternatively, the homozygous table could disappear when the heterozygous table is expanded, to free up room for the expanded heterozygous table. Please let me know if you have a preference, or if there is in fact supposed to be an expanded homozygous table as well.
  • For these disclaimers: I'm thinking the best approach might be to join them together into a single text, and have that be the tooltip for the caption for the heterozygous table. Does that sound good to you?

phildarnowsky-broad, Jan 18 '23

@phildarnowsky-broad it all looks correct, thank you.

There doesn't need to be an expanded homozygous table. We think it's best for the homozygous table to disappear when the heterozygous table is expanded.
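A Python sketch that models only the display rules agreed in this thread: a tab toggling between the constraint table and the two rare variant tables, an expand button, and hiding the homozygous table while the heterozygous table is expanded. It is not browser code; the class, attribute, and table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePagePanel:
    """Hypothetical state for the tab that sits where the constraint table is."""
    active: str = "constraint"  # "constraint" or "two_rare_variants"
    expanded: bool = False      # is the heterozygous table expanded?

    def toggle_active(self) -> None:
        self.active = "two_rare_variants" if self.active == "constraint" else "constraint"

    def toggle_expanded(self) -> None:
        self.expanded = not self.expanded

    def visible_tables(self) -> List[str]:
        if self.active == "constraint":
            return ["constraint"]
        if self.expanded:
            # No expanded homozygous table: it is hidden to make room.
            return ["heterozygous_expanded"]
        return ["heterozygous", "homozygous"]
```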

For the disclaimers, it sounds good to combine them and add them as a tooltip. In that case, could we use:

Variant phase for two rare heterozygous variants is based on prediction (add link to preprint). In very rare cases an individual will be counted in the homozygous table and in the two heterozygous rare variant table if they carry a rare homozygous variant in addition to a rare pair of heterozygous variants in the same gene.

slstenton, Jan 18 '23

@phildarnowsky-broad could we also add links to the variant co-occurrence blog post (https://gnomad.broadinstitute.org/news/2021-07-variant-co-occurrence-phasing-information-in-gnomad/) and the upcoming two rare variant feature blog post?

slstenton, Jan 18 '23

@slstenton sure we can add those links as well.

One problem I've just noticed, however, is that with these links in a tooltip, it's not really practical to click them or even copy and paste them. What if, instead, we do what we do for the background information on the constraint table: add a "?" icon that opens a help modal containing all the needed text? I've attached screenshots to show the elements I mean.

[Screenshots: the "?" icon on the constraint table and the help modal it opens]

phildarnowsky-broad, Jan 18 '23

@phildarnowsky-broad yes, that's true! I agree.

slstenton, Jan 18 '23