explore extending branchwater plugin to load zip files for prefetch and gather, entirely on rust side?

Open ctb opened this issue 1 year ago • 1 comments

this could be a good & substantive next step towards higher-level oxidation of sourmash internals - what if we added a load_from plugin entry point to branchwater that loaded zip files and supported the Index API (i.e. passed the Index API tests)?

this could potentially then support multithreaded prefetch, and hence faster gather.
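
For context on why a parallel prefetch would speed up gather, here is a toy sketch. It is not sourmash or branchwater code: sketches are modeled as plain Python sets of hash values, and the names `prefetch`, `gather`, `overlap`, `threshold`, and `max_workers` are hypothetical, chosen for illustration. The idea is that prefetch compares the query against every sketch independently (so the comparisons can be spread over threads), and gather then runs a greedy minimum-set-cover over the much smaller list of prefetch hits.

```python
from concurrent.futures import ThreadPoolExecutor


def overlap(query, item):
    """Count hashes shared between the query and one (name, hashes) pair."""
    name, hashes = item
    return name, len(query & hashes)


def prefetch(query, collection, threshold=1, max_workers=4):
    """Return {name: hashes} for every sketch sharing >= threshold hashes.

    Each comparison is independent of the others, which is what makes
    this step a natural candidate for multithreading.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: overlap(query, item),
                                collection.items()))
    return {name: collection[name] for name, n in results if n >= threshold}


def gather(query, collection, threshold=1):
    """Greedy set cover over the prefetch hits (toy version)."""
    remaining = set(query)
    candidates = prefetch(remaining, collection, threshold)
    picks = []
    while candidates:
        # Pick the candidate covering the most still-unexplained hashes;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(candidates),
                   key=lambda n: len(remaining & candidates[n]))
        found = remaining & candidates.pop(best)
        if len(found) < threshold:
            break
        picks.append((best, len(found)))
        remaining -= found
    return picks


if __name__ == "__main__":
    query = set(range(1, 11))
    collection = {
        "a": {1, 2, 3, 4, 5},
        "b": {4, 5, 6, 7},
        "c": {100, 200},
        "d": {9},
    }
    print(gather(query, collection))
```

In pure Python the GIL would limit any real speedup from threads here, which is part of the motivation for doing this step on the Rust side.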

it would take a while to work, but we could do it incrementally, I suspect...

related issues:

  • https://github.com/sourmash-bio/sourmash/issues/1939

ctb avatar Jun 29 '24 19:06 ctb

I think an additional/different reason to implement something like this is we could also easily support the RocksDB index here as well.

ctb avatar Jul 02 '24 13:07 ctb

I looked into this a bunch, and it is essentially impossible (or at least complicated, and very ill-advised) to share objects between pyo3 and FFI. So we would not be able to do this.

A better approach is to upgrade sourmash to natively support multithreading (and RocksDB), I think.

Full RocksDB support was added in https://github.com/sourmash-bio/sourmash/pull/3545 🎉

ctb avatar Jun 14 '25 14:06 ctb

ref https://github.com/sourmash-bio/sourmash/issues/3595 as a better idea for zip.

ctb avatar Jun 14 '25 14:06 ctb