sourmash
sourmash copied to clipboard
[MRG] Add `FrozenSourmashSignature`
Adds a FrozenSourmashSignature
class, and provides sensible to_mutable()
and to_frozen()
methods on SourmashSignature
and FrozenSourmashSignature
. Provides an update()
context manager that wraps changes so that a FrozenSourmashSignature
is left at the end.
Complement to #1508, which did this with MinHash
.
As with #1508, very few changes are required to make all the tests pass. The ease of doing this suggests to me that this is a Good Change.
Also as with #1508, I really like this updated code; it's so nice and clear!
I am hoping that these changes make it easier for @luizirber in Rust/Python interaction.
On the other hand, this could break some 3rd party code (as #1508 did :). Admittedly we are the owners of most of that code, but 🤷 not sure if we should delay this 'til 5.0. On the other hand, could fit nicely with a 4.2.0 release.
TODO:
- [x] update places that unnecessarily call
flatten
anddownsample
(b/c signatures are already flat and/or downsampled) - [x] check that
SourmashSignature
by default owns a frozenMinHash
(I think it does, but confirm) - [x] check that
SourmashSignature.to_frozen()
also freezes theMinHash
object. - [x] maybe make a new method that does the
ss = ss.to_mutable(); ss.minhash = xyz; ss = ss.to_frozen()
dance; - [x] OR, use a context manager?
with sig.mutate() as sig:
? - [x] revisit
into_frozen()
andinto_mutable()
- [x] add docstrings :)
- [x] consider making error messages clearer & providing guidance around using
to_mutable()
Codecov Report
Merging #1610 (bee45ad) into latest (b1b4ef7) will increase coverage by
7.40%
. The diff coverage is91.34%
.
@@ Coverage Diff @@
## latest #1610 +/- ##
==========================================
+ Coverage 84.70% 92.10% +7.40%
==========================================
Files 131 100 -31
Lines 15544 11343 -4201
Branches 2219 2236 +17
==========================================
- Hits 13166 10448 -2718
+ Misses 2085 601 -1484
- Partials 293 294 +1
Flag | Coverage Δ | |
---|---|---|
python | 92.10% <91.34%> (+0.04%) |
:arrow_up: |
rust | ? |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/sourmash/signature.py | 93.86% <84.09%> (+2.05%) |
:arrow_up: |
src/sourmash/minhash.py | 93.76% <90.90%> (-0.27%) |
:arrow_down: |
src/sourmash/commands.py | 90.78% <96.87%> (+0.06%) |
:arrow_up: |
src/sourmash/index/__init__.py | 96.71% <100.00%> (ø) |
|
src/sourmash/lca/command_classify.py | 81.48% <100.00%> (+0.71%) |
:arrow_up: |
src/sourmash/lca/lca_db.py | 93.29% <100.00%> (+0.01%) |
:arrow_up: |
src/sourmash/search.py | 97.94% <100.00%> (+<0.01%) |
:arrow_up: |
src/sourmash/sig/__main__.py | 94.02% <100.00%> (+0.02%) |
:arrow_up: |
src/core/src/cmd.rs | ||
... and 31 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
@luizirber opinions?
Ready for review & merge @sourmash-bio/devs !
hi @luizirber I think this PR is best reviewed by you, if you have time. It's a quick read, I promise ;). And you might have a suggestion for how to improve, too - I'm not thrilled with the context manager syntax I use.
I have a bunch of commits in #2230 that flip the MinHash to default to large ones (b-tree based) and expose the Frozen ones (vec-based), should I add the commits here or make a new PR merging into this one?
(I can also just merge them together in #2230, since that will be a breaking change version bump on the Rust side and I would like to avoid doing two in a row :grimacing: )
I have a bunch of commits in #2230 that flip the MinHash to default to large ones (b-tree based) and expose the Frozen ones (vec-based), should I add the commits here or make a new PR merging into this one?
(I can also just merge them together in #2230, since that will be a breaking change version bump on the Rust side and I would like to avoid doing two in a row 😬 )
your call - if last option is easiest, go for it!
I tested out with the pile of changes coming in #2230 and it works. In the future might want to avoid all the copying, but as you pointed in the comment for the context manager it is hard to track bugs otherwise...
Another future optimization is being more strict on the Mutable/Frozen on the Rust side, right now they map to
- Mutable:
KmerMinHashBTree
- Frozen:
KmerMinHash
but they are both mutable (it's just that the Frozen one is much slower to mutate with large sketches).
awesome, thanks!! should I merge, then?
awesome, thanks!! should I merge, then?
Yup, go for it!
🎉
In the future might want to avoid all the copying, but as you pointed in the comment for the context manager it is hard to track bugs otherwise...
I would have put more effort in to this, but note that SourmashSignature
really only contains name
and filename
. The .minhash
property is a FrozenMinHash
that is copied around by reference. So all the copying doesn't really matter for efficiency, or at least that's what I decided after reading through the code a bunch of times :)