returning more & richer information from Rust `SeqToHashes` struct/iterator
Over in oxli-bio/oxli, @adamtaranto and I have been implementing k-mer functionality (no sketching!) on top of `SeqToHashes` in Rust, and then exposing it to Python via pyo3.
Most recently, we added a `kmers_and_hashes` method that returns canonical k-mers and their hashes; see https://github.com/oxli-bio/oxli/pull/40. We're discussing there and in https://github.com/oxli-bio/oxli/issues/66 how to pull relevant functionality back into sourmash, too.
We are not yet using protein k-mers, but we hope to get there soon: https://github.com/oxli-bio/oxli/issues/38
Some fun related issues popped up in this repo:
- @adamtaranto has been exploring this for longer than I realized 😆 - https://github.com/sourmash-bio/sourmash/issues/2455
- https://github.com/sourmash-bio/sourmash/issues/2985 suggests supporting an iterator that yields 4-tuples: `(hashval, sequence, strand, position)`
- also see https://github.com/sourmash-bio/sourmash/issues/2073 where this functionality is requested as part of the sketches
- we could slay https://github.com/sourmash-bio/sourmash/pull/2856 as part of this
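
To make the 4-tuple idea concrete, here is a rough pure-Python sketch of what such an iterator might yield for DNA. It is not the oxli or sourmash implementation: the names `kmer_records`, `revcomp`, and `standin_hash` are hypothetical, the hash is a `hashlib` stand-in rather than the hash sourmash actually uses, and I'm assuming that "sequence" means the canonical k-mer, that strand is `1` when the forward k-mer is canonical and `-1` otherwise, and that position is the 0-based start of the k-mer. Input is assumed to contain only A/C/G/T.

```python
import hashlib
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (assumes A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def standin_hash(kmer: str) -> int:
    """Stand-in 64-bit hash; NOT the hash function sourmash uses."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def kmer_records(sequence: str, ksize: int) -> Iterator[Tuple[int, str, int, int]]:
    """Yield (hashval, canonical_kmer, strand, position) for each k-mer.

    Hypothetical sketch of the 4-tuple iterator discussed above.
    """
    sequence = sequence.upper()
    for pos in range(len(sequence) - ksize + 1):
        fwd = sequence[pos:pos + ksize]
        rev = revcomp(fwd)
        # The canonical k-mer is the lexicographically smaller of the two strands.
        if fwd <= rev:
            canonical, strand = fwd, 1
        else:
            canonical, strand = rev, -1
        yield standin_hash(canonical), canonical, strand, pos


if __name__ == "__main__":
    for hashval, kmer, strand, pos in kmer_records("ATGGA", 3):
        print(hashval, kmer, strand, pos)
```

Because the hash is computed on the canonical k-mer, a k-mer and its reverse complement get the same hash value, and the strand field records which orientation was actually seen in the input.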