sourmash
document max containment somewhere.
per @ccbaumler https://github.com/sourmash-bio/sourmash/pull/2222#discussion_r949418693 we don't actually document max containment anywhere 😱
I can't think of a particularly good place to put it, either. We may need a new section; could be part of https://github.com/sourmash-bio/sourmash/pull/2184
and/or it may be time to add @bluegenes' beautiful pictures into the sourmash documentation somewhere 🤔
thinking -
- doesn't quite belong in command-line docs;
- it could fit in classifying signatures but I think that would be focused on why it's good for metagenomes, and some ANI stuff; we probably also need a theory section;
- it could go into the new sourmash-internals document in https://github.com/sourmash-bio/sourmash/pull/2184, but the best fit there is in the ANI section IMO;
- maybe we just need a new theory/math document? @ccbaumler keeps on dropping equations into PRs and issues, maybe it's time to add those to docs 😁
more generally, we should describe the math for all of the various similarity calculations used:
- jaccard similarity or jaccard index
- jaccard containment or containment index
- average containment
- max containment
- angular similarity
as well as point out which ones are distance metrics (jaccard similarity, angular similarity, and max containment; not sure about average containment).
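
As a starting point for that write-up, here is a minimal sketch of the five calculations on plain Python sets of hashes. It is not the sourmash implementation. The function names are hypothetical helpers, the angular similarity formula (1 - 2·acos(cosine)/π on abundance vectors) is my assumption about the intended definition, and scaled/num sketch details are ignored entirely.

```python
import math

def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (symmetric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def containment(a, b):
    """Containment of A in B: |A ∩ B| / |A| (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0

def max_containment(a, b):
    """Max containment: |A ∩ B| / min(|A|, |B|) (symmetric)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def avg_containment(a, b):
    """Average containment: mean of the two directional containments."""
    return (containment(a, b) + containment(b, a)) / 2

def angular_similarity(x, y):
    """Angular similarity on {hash: abundance} dicts.

    Assumed definition: 1 - 2 * acos(cosine similarity) / pi.
    """
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    norm = (math.sqrt(sum(v * v for v in x.values()))
            * math.sqrt(sum(v * v for v in y.values())))
    if norm == 0:
        return 0.0
    cosine = min(1.0, dot / norm)  # guard against floating-point overshoot
    return 1 - 2 * math.acos(cosine) / math.pi

# Toy hash sets: 2 shared hashes, 8 hashes in the union.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7, 8}

print(jaccard(A, B))            # 2/8
print(containment(A, B))        # 2/4
print(containment(B, A))        # 2/6
print(max_containment(A, B))    # 2/min(4, 6)
print(avg_containment(A, B))    # (2/4 + 2/6) / 2
```

Note that when one set is a subset of the other, max containment is 1.0 regardless of how much larger the other set is, while Jaccard and average containment both drop as the size difference grows.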
Hello Titus @ctb, I'm interested in this documentation! Especially in the difference between the containment metrics. Cheers!