
document max containment somewhere.

Open ctb opened this issue 2 years ago • 2 comments

per @ccbaumler https://github.com/sourmash-bio/sourmash/pull/2222#discussion_r949418693 we don't actually document max containment anywhere 😱

I can't think of a particularly good place to put it, either. We may need a new section; could be part of https://github.com/sourmash-bio/sourmash/pull/2184

and/or it may be time to add @bluegenes' beautiful pictures to the sourmash documentation somewhere 🤔

ctb, Aug 19 '22 13:08

thinking -

  • doesn't quite belong in command-line docs;
  • it could fit in classifying signatures but I think that would be focused on why it's good for metagenomes, and some ANI stuff; we probably also need a theory section;
  • it could go into the new sourmash-internals document in https://github.com/sourmash-bio/sourmash/pull/2184, but the best fit there is in the ANI section IMO;
  • maybe we just need a new theory/math document? @ccbaumler keeps on dropping equations into PRs and issues, maybe it's time to add those to docs 😁

ctb, Aug 19 '22 13:08

more generally, we should describe the math for all of the various similarity calculations used:

  • jaccard similarity or jaccard index
  • jaccard containment or containment index
  • average containment
  • max containment
  • angular similarity

as well as point out which ones are distance metrics (jaccard similarity, angular similarity, and max containment; not sure about average containment).
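As a starting point for that math, here is a minimal sketch of the five measures on plain Python sets (and dicts of abundances for the angular one). This is not sourmash's implementation: the function names are hypothetical, the inputs are exact sets rather than sketches, and the angular formula (1 minus the normalized angle between abundance vectors) is one common definition that I am assuming here.

```python
import math


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Containment of A in B: |A ∩ B| / |A| (asymmetric)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


def max_containment(a, b):
    """Larger of the two containments, i.e. |A ∩ B| / min(|A|, |B|)."""
    return max(containment(a, b), containment(b, a))


def avg_containment(a, b):
    """Mean of the two containments."""
    return (containment(a, b) + containment(b, a)) / 2


def angular_similarity(a, b):
    """1 - 2*theta/pi, where theta is the angle between abundance vectors.

    a and b map hash -> abundance (assumed non-negative).
    """
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # clamp to guard against floating-point values slightly above 1
    cos = min(1.0, dot / (norm_a * norm_b))
    return 1.0 - 2.0 * math.acos(cos) / math.pi


A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7, 8}
print(jaccard(A, B), containment(A, B), containment(B, A))
print(max_containment(A, B), avg_containment(A, B))
```

With A = {1, 2, 3, 4} and B = {3, 4, 5, 6, 7, 8}, the intersection has 2 elements and the union 8, so Jaccard is 0.25, containment of A in B is 0.5, containment of B in A is 1/3, max containment is 0.5, and average containment is 5/12. One thing worth keeping in mind for the distance-metric question: whenever one set is a strict subset of the other, max containment is 1.0 even though the sets differ, so 1 minus max containment is zero for distinct sets.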

ctb, Aug 20 '22 14:08

Hello Titus @ctb, I'm interested in this documentation! Especially in the difference between the containment metrics. Cheers!

jorondo1, May 11 '23 08:05