
ENH: Beefed up lineage display, similar to mutation table with jaccard

Open corneliusroemer opened this issue 1 year ago • 0 comments

When I investigate a mutation, say S:185D, I like that it shows me which lineages are contained in the query (screenshot of the lineage display).

This is useful: it gives me ideas for lineages to query, and it helps me understand whether the mutation has popped up homoplastically or not.

Now I feel like this lineage analysis could be extended to make it even more useful.

One question I often ask myself is: "Which lineage has a high share of this mutation?" This is not easy to derive from the current lineage display. It shouldn't be too hard to calculate: query the total number of sequences in each lineage for the same time period, divide the number of sequences of that lineage in the mutation query by that total, and display the result. This is similar to Jaccard but not the same: the share only normalizes by the lineage's size, whereas Jaccard also penalizes the mutation occurring outside the lineage.
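A rough sketch of that calculation, to make the difference from Jaccard concrete. Everything here is illustrative: the function name `lineage_stats`, the input dictionaries, and the counts are made up, and the sketch assumes the per-lineage counts from the mutation query cover all sequences carrying the mutation in the chosen time period. It is not how cov-spectrum computes anything today.

```python
from typing import Dict, List, NamedTuple


class LineageStat(NamedTuple):
    lineage: str
    with_mutation: int  # sequences of this lineage that carry the mutation
    total: int          # all sequences of this lineage in the time period
    share: float        # with_mutation / total
    jaccard: float      # with_mutation / |lineage OR mutation|


def lineage_stats(
    with_mutation: Dict[str, int], totals: Dict[str, int]
) -> List[LineageStat]:
    # Assumption: with_mutation lists every lineage that carries the mutation,
    # so the sum is the total number of sequences with the mutation.
    mutation_total = sum(with_mutation.values())
    stats = []
    for lineage, k in with_mutation.items():
        n = totals.get(lineage, 0)
        if n == 0:
            continue  # no denominator available for this lineage
        union = n + mutation_total - k
        stats.append(LineageStat(lineage, k, n, k / n, k / union))
    # Highest share first: these are the lineages "mostly made of" the mutation.
    return sorted(stats, key=lambda s: s.share, reverse=True)


# Hypothetical counts, not real data.
example_with_mutation = {"BA.5.2": 10, "BQ.1.1": 90}
example_totals = {"BA.5.2": 1000, "BQ.1.1": 100}
example = lineage_stats(example_with_mutation, example_totals)
```

Sorting by share puts lineages that almost always carry the mutation at the top, even when they are small. Sorting by Jaccard would additionally push down lineages when the mutation is common elsewhere.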

An extension of the idea would be to answer the question: which lineage is defined by this mutation? To make that useful, one would want to look at wildcards: if BQ.1 is defined by S:460K, then I wouldn't want to see every sublineage such as BQ.1.1.22 listed separately, but instead see BQ.1*. That's not as relevant as the point above, though.
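The wildcard roll-up could look something like the sketch below. It walks up a lineage's name and returns the broadest named ancestor whose whole clade still carries the mutation at a high share. The helper names, the 0.9 threshold, and the counts are all assumptions for illustration. It also only works on name prefixes, so Pango aliases are not resolved (BQ.1 descending from BA.5 would be missed); a real implementation would need the alias table.

```python
from typing import Dict, List, Optional


def in_clade(lineage: str, root: str) -> bool:
    # "BQ.1*" matches BQ.1 itself and anything below it, but not BQ.10.
    return lineage == root or lineage.startswith(root + ".")


def clade_share(
    root: str, with_mutation: Dict[str, int], totals: Dict[str, int]
) -> float:
    k = sum(c for name, c in with_mutation.items() if in_clade(name, root))
    n = sum(c for name, c in totals.items() if in_clade(name, root))
    return k / n if n else 0.0


def ancestors(lineage: str) -> List[str]:
    # "BQ.1.1.22" -> ["BQ", "BQ.1", "BQ.1.1", "BQ.1.1.22"], broadest first.
    parts = lineage.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def broadest_clade(
    lineage: str,
    with_mutation: Dict[str, int],
    totals: Dict[str, int],
    threshold: float = 0.9,  # arbitrary cut-off for "defined by"
) -> Optional[str]:
    for root in ancestors(lineage):
        if root not in totals:
            continue  # only consider names that exist as lineages in the data
        if clade_share(root, with_mutation, totals) >= threshold:
            return root + "*"
    return None


# Hypothetical counts, not real data.
demo_totals = {"BQ.1": 50, "BQ.1.1": 100, "BQ.1.1.22": 20, "BA.5.2": 1000}
demo_with_mutation = {"BQ.1": 48, "BQ.1.1": 99, "BQ.1.1.22": 20, "BA.5.2": 5}
demo_label = broadest_clade("BQ.1.1.22", demo_with_mutation, demo_totals)
```

With these made-up numbers, BQ.1, BQ.1.1 and BQ.1.1.22 would collapse into a single BQ.1* row instead of three separate ones.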

I think this could be related to GenSpectrum/dashboards#65: basically, we have nice views and tools for mutations that could be leveraged for lineages in a similar way.

corneliusroemer, Nov 29 '22 16:11