
returning more & richer information from rust `SeqToHash` struct/iterator

Open ctb opened this issue 4 months ago • 0 comments

Over in oxli-bio/oxli, @adamtaranto and I have been implementing k-mer functionality (no sketching!) on top of SeqToHashes in Rust, and then exposing it to Python via pyo3.

Most recently, we added a `kmers_and_hashes` method that returns canonical k-mers and their hashes; see https://github.com/oxli-bio/oxli/pull/40. We're discussing there, and in https://github.com/oxli-bio/oxli/issues/66, how to pull the relevant functionality back into sourmash, too.
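A minimal, self-contained sketch of what a method like `kmers_and_hashes` does: slide a window of size k along the sequence, take each k-mer's canonical form (taken here as the lexicographically smaller of the k-mer and its reverse complement), and hash it. This is not oxli's or sourmash's code. The function name `canonical_kmers_and_hashes` is hypothetical, skipping windows with non-ACGT bases is an assumption, and `DefaultHasher` stands in for sourmash's MurmurHash3-based hash, so the hash values will not match sourmash's.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Reverse complement of an uppercase DNA k-mer.
/// Returns None if any base is not A/C/G/T.
fn revcomp(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect() // Option<Vec<u8>>: None if any element is None
}

/// Stand-in hash; NOT sourmash's MurmurHash3, so values will differ.
fn stand_in_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

/// Hypothetical helper: every canonical k-mer of `seq`, in input order,
/// paired with its hash. Windows containing non-ACGT bases are skipped.
fn canonical_kmers_and_hashes(seq: &str, k: usize) -> Vec<(String, u64)> {
    let seq = seq.to_ascii_uppercase();
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    seq.as_bytes()
        .windows(k)
        .filter_map(|fwd| {
            let rc = revcomp(fwd)?;
            // Canonical form: the lexicographically smaller orientation.
            let canonical = if fwd <= rc.as_slice() { fwd.to_vec() } else { rc };
            let hash = stand_in_hash(&canonical);
            // Pure ACGT bytes are always valid UTF-8.
            Some((String::from_utf8(canonical).ok()?, hash))
        })
        .collect()
}

fn main() {
    for (kmer, hash) in canonical_kmers_and_hashes("ACGTTGCA", 3) {
        println!("{kmer}\t{hash}");
    }
}
```

Because a k-mer and its reverse complement share one canonical form, both orientations map to the same hash; in the example input, `ACG` and `CGT` both canonicalize to `ACG`.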

We are not yet using protein k-mers, but we hope to get there soon: https://github.com/oxli-bio/oxli/issues/38

Some fun related issues have popped up in this repo:

  • @adamtaranto has been exploring this for longer than I realized 😆: https://github.com/sourmash-bio/sourmash/issues/2455
  • https://github.com/sourmash-bio/sourmash/issues/2985 suggests supporting an iterator that yields 4-tuples: (hashval, sequence, strand, position)
  • also see https://github.com/sourmash-bio/sourmash/issues/2073, where this functionality is requested as part of the sketches
  • we could slay https://github.com/sourmash-bio/sourmash/pull/2856 as part of this
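The 4-tuple iterator suggested in sourmash issue #2985 could look roughly like the sketch below. Everything here is an assumption for illustration: `KmerTuples` and `Strand` are hypothetical names, `position` is taken to be the 0-based start of the k-mer in the input, `sequence` is the k-mer as it appears in the input, and `strand` records whether that k-mer or its reverse complement was the canonical (hashed) form. `DefaultHasher` stands in for sourmash's MurmurHash3-based hash, and this is not how `SeqToHashes` is implemented.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Which orientation was hashed (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strand {
    Forward, // the k-mer as it appears in the input was canonical
    Reverse, // its reverse complement was canonical
}

/// Hypothetical iterator yielding (hashval, sequence, strand, position).
struct KmerTuples {
    seq: Vec<u8>,
    k: usize,
    pos: usize,
}

impl KmerTuples {
    fn new(seq: &str, k: usize) -> Self {
        KmerTuples { seq: seq.to_ascii_uppercase().into_bytes(), k, pos: 0 }
    }
}

/// Complement of an uppercase base; None for anything but A/C/G/T.
fn complement(b: u8) -> Option<u8> {
    match b {
        b'A' => Some(b'T'),
        b'C' => Some(b'G'),
        b'G' => Some(b'C'),
        b'T' => Some(b'A'),
        _ => None,
    }
}

/// Stand-in hash; NOT sourmash's MurmurHash3, so values will differ.
fn stand_in_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

impl Iterator for KmerTuples {
    type Item = (u64, String, Strand, usize);

    fn next(&mut self) -> Option<Self::Item> {
        // k == 0 is treated as "no k-mers".
        while self.k > 0 && self.pos + self.k <= self.seq.len() {
            let start = self.pos;
            self.pos += 1;
            let fwd = &self.seq[start..start + self.k];
            // Reverse complement; skip windows containing non-ACGT bases.
            let rc: Option<Vec<u8>> = fwd.iter().rev().map(|&b| complement(b)).collect();
            let rc = match rc {
                Some(v) => v,
                None => continue,
            };
            let strand = if fwd <= rc.as_slice() { Strand::Forward } else { Strand::Reverse };
            let canonical: &[u8] = if strand == Strand::Forward { fwd } else { &rc };
            let hashval = stand_in_hash(canonical);
            // Report the k-mer as it appears in the input (pure ACGT, so valid UTF-8).
            let kmer = String::from_utf8(fwd.to_vec()).expect("ACGT is valid UTF-8");
            return Some((hashval, kmer, strand, start));
        }
        None
    }
}

fn main() {
    for (hashval, kmer, strand, pos) in KmerTuples::new("ACGTTGCA", 3) {
        println!("{pos}\t{kmer}\t{strand:?}\t{hashval}");
    }
}
```

Carrying the position and strand alongside each hash lets a caller map a hash back to a location and orientation in the input sequence, which hash values alone cannot do.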

ctb, Sep 26 '24 14:09