sourmash
How to expose the B-Tree MinHash impl to Python?
#1045 makes `compute` default to the B-Tree impl. Should we also add a CLI flag to choose the Vec one? The Vec one is only better in very limited cases (very small datasets), so I don't think we need the CLI flag.
On the Python API, there are some places where we change the MinHash that (maybe?) could also benefit from the B-Tree impl. Check `gather` (which modifies the query), for example.
(punted from https://github.com/dib-lab/sourmash/pull/1045#issuecomment-649719301)
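To illustrate why `gather` is a candidate, here is a toy sketch of a gather-style loop: each round removes the matched hashes from the query, which is the kind of repeated mutation mentioned above. It uses plain Python sets, not sourmash's MinHash classes, and `toy_gather` and its arguments are hypothetical names made up for this example.

```python
def toy_gather(query_hashes, references):
    """Greedy, gather-style loop over plain sets of hash values.

    `references` maps a name to a set of hashes. This is an illustration
    only and does not use (or mirror) sourmash's actual API.
    """
    remaining = set(query_hashes)  # copy, so the caller's set is untouched
    results = []
    while remaining:
        # Pick the reference sharing the most hashes with what is left.
        best_name, best_overlap = None, set()
        for name, hashes in references.items():
            overlap = remaining & hashes
            if len(overlap) > len(best_overlap):
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # no reference matches the remaining hashes
        results.append((best_name, len(best_overlap)))
        # The step that "modifies the query": drop the matched hashes.
        remaining -= best_overlap
    return results, remaining


refs = {"a": {1, 2, 3, 10}, "b": {3, 4, 11}, "c": {20}}
results, remaining = toy_gather({1, 2, 3, 4, 5, 6}, refs)
print(results, sorted(remaining))
```

Because the query shrinks on every round, the cost of removing hashes from the underlying container is paid many times per search.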
Started digging into this a little bit, just for fun:
- `KmerMinHashBTree` doesn't have a few necessary methods, in particular `as_hll`, `update`, and `remove_from`;
- `SourmashNodegraph.matches(mh)` expects a `KmerMinHash`, not a `KmerMinHashBTree`;
- the revindex implementation doesn't like `KmerMinHashBTree` either;