
How to expose the B-Tree MinHash impl to Python?

Open: luizirber opened this issue 4 years ago • 1 comment

#1045 makes compute default to the B-Tree impl. Should we also add a CLI flag to choose the Vec one? The Vec one is only better in very limited cases (very small datasets), so I don't think we need the CLI flag.
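For context on the Vec trade-off, here is a minimal sketch of a bottom-k sketch backed by a sorted list, assuming the Vec impl behaves roughly like this. `SortedVecSketch`, `add_hash`, and `remove_hash` are hypothetical names for illustration, not sourmash's API. Every insert or removal in the middle of the list shifts the elements after it, which is cheap for tiny sketches and costly for large ones.

```python
import bisect


class SortedVecSketch:
    """Hypothetical stand-in for a Vec-backed bottom-k MinHash (not sourmash's API)."""

    def __init__(self, num):
        self.num = num   # how many of the smallest hashes to keep
        self.mins = []   # kept sorted at all times, like a sorted vector

    def add_hash(self, h):
        pos = bisect.bisect_left(self.mins, h)  # O(log n) search
        if pos < len(self.mins) and self.mins[pos] == h:
            return  # already present
        if pos >= self.num:
            return  # larger than every hash we are keeping
        self.mins.insert(pos, h)  # O(n): shifts everything after pos
        if len(self.mins) > self.num:
            self.mins.pop()  # drop the largest to stay at num hashes

    def remove_hash(self, h):
        pos = bisect.bisect_left(self.mins, h)
        if pos < len(self.mins) and self.mins[pos] == h:
            del self.mins[pos]  # O(n) again: shifts the tail left


sketch = SortedVecSketch(num=3)
for h in [50, 10, 40, 10, 20, 30]:
    sketch.add_hash(h)
```

A tree-backed structure avoids the linear shift on each insert and removal, at the price of more overhead per element, which is consistent with the Vec one only winning on very small datasets.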

In the Python API, there are some places where we modify the MinHash that (maybe?) could also benefit from the B-Tree impl. Check gather (which modifies the query), for example.
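To illustrate why gather is a candidate, here is a rough sketch of a gather-style greedy loop over plain Python sets. `gather_sketch` and the toy `database` dict are hypothetical and only stand in for the real objects; the point is the repeated removal of matched hashes from the query, which is the kind of mutation where the backing data structure matters.

```python
def gather_sketch(query_hashes, database):
    """Greedy gather-style loop on plain sets (hypothetical, not sourmash's gather)."""
    remaining = set(query_hashes)  # copy, so the caller's set is left untouched
    results = []
    while remaining:
        # Pick the database entry sharing the most hashes with what is left.
        name, hashes = max(
            database.items(), key=lambda item: len(remaining & item[1])
        )
        overlap = remaining & hashes
        if not overlap:
            break  # nothing in the database matches the leftover hashes
        results.append((name, len(overlap)))
        remaining -= overlap  # the query is modified on every round
    return results, remaining


query = {1, 2, 3, 4, 5, 6, 7, 8}
database = {"A": {1, 2, 3, 4, 5}, "B": {4, 5, 6, 7}, "C": {100}}
results, leftover = gather_sketch(query, database)
```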

(punted from https://github.com/dib-lab/sourmash/pull/1045#issuecomment-649719301)

luizirber, Jun 27 '20 21:06

Started digging into this a little bit, just for fun:

  • KmerMinHashBTree is missing a few necessary methods, in particular as_hll, update, and remove_from;
  • SourmashNodegraph.matches(mh) expects a KmerMinHash, not a KmerMinHashBTree;
  • the revindex implementation doesn't like KmerMinHashBTree either.

ctb, Aug 04 '22 10:08