sourmashconsumr

experimental LIN taxonomy integration

Open bluegenes opened this issue 1 year ago • 0 comments

[note: this is an experimental/draft PR and should not be merged as-is]

ref #72

In sourmash taxonomy, we're adding utils to use the LIN taxonomic framework, which allows for greater flexibility and specificity compared with standard taxonomic ranks. For example, if only certain strains of a microbe are pathogenic, the LIN framework may be useful for identifying/grouping pathogenic vs non-pathogenic strains. Is this something you're interested in allowing for viz?

We had a question about whether sourmashconsumr would work with LIN lineages for e.g. sankey plots, so I decided to experiment a little to see how easy/hard it would be to allow LIN functionality.

This PR has LINs semi-working for:

  • tax_glom_taxonomy_annotate
  • plot_taxonomy_annotate_sankey

Plot using sourmash test data from the LINs PR (tests/test-data/tax/test1.gather.csv annotated with tests/test-data/tax/test.LIN-taxonomy.csv): [image]

Challenges and thoughts

  • LINs do not always have a set number of positions, though I believe 20 positions is currently the LINbase standard. Selecting a default position to summarize at when the user doesn't provide one is currently hacky and would need some thought.
  • LIN position "names" (numbers) aren't terribly helpful for visualizations.
    • "LINgroups" are defined LIN prefixes that have some useful meaning (e.g. we may have named groups for the 0;0;1 prefix vs the 0;1;1 prefix). These groups are given names, which would be a bit nicer for plotting. However, using LINgroup names would currently require reading in a separate lingroup file or using the lingroup report from tax metagenome.
    • Thinking about it more, the sankey doesn't actually make sense as it is. Once a lineage diverges, it should never come back together! Might need to work with the full prefix at each rank/position (e.g. 0;1;1) rather than the individual value at that position.
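
The full-prefix idea in the last bullet can be sketched in a few lines. This is a minimal illustration in plain Python, not code from this PR, sourmash, or sourmashconsumr: the function names (lin_prefixes, sankey_links, node_label), the ";" separator handling, the choice to default to the shortest LIN in the input, and the lingroups dict are all assumptions made for the example.

```python
def lin_prefixes(lin, sep=";"):
    """Return the cumulative prefix at every position of a LIN string."""
    positions = lin.split(sep)
    return [sep.join(positions[: i + 1]) for i in range(len(positions))]


def sankey_links(lins, depth=None, sep=";"):
    """Build sorted (source, target) prefix pairs for a set of LIN strings.

    depth limits how many positions are used. When it is None, this sketch
    falls back to the shortest LIN in the input (an assumed default).
    """
    if depth is None:
        depth = min(len(lin.split(sep)) for lin in lins)
    links = set()
    for lin in lins:
        prefixes = lin_prefixes(lin, sep)[:depth]
        # Link each prefix to the next longer prefix of the same LIN.
        links.update(zip(prefixes, prefixes[1:]))
    return sorted(links)


def node_label(prefix, lingroups):
    """Use a LINgroup name when one is defined for the prefix (hypothetical mapping)."""
    return lingroups.get(prefix, prefix)


links = sankey_links(["0;0;1", "0;1;1"])
```

Because each node is keyed by its whole prefix, the two lineages above end in distinct nodes (0;0;1 and 0;1;1) even though the value at the third position is 1 in both, so flows that have diverged cannot merge again.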

I am happy to work on this further or drop it, if this isn't something you want to allow!

bluegenes, Mar 15 '23 18:03