sourmash
Quickly search, compare, and analyze genomic and metagenomic data sets.
From https://github.com/sourmash-bio/sourmash/pull/2178, @bluegenes: > More complicated use case that would be _really_ neat to enable: run prefetch against, e.g. genus-level representative database. Then run gather and use the prefetch output...
I don't think we have explicitly written rules for identifiers for folks building custom databases? What characters can and can't be included? For signature names, we consider everything before the...
When we use `sourmash tax annotate` on gather results, we produce a column with semicolon-separated lineages in it. We don't have many (any?) sourmash subcommands that natively ingest that format,...
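To illustrate what natively ingesting that column might involve, here is a minimal Python sketch that splits each semicolon-separated lineage into per-rank names. It assumes the column is named `lineage`, that names appear in the order given by the hypothetical `RANKS` constant, and that entries look like GTDB-style `d__Bacteria`; none of this is taken from sourmash itself.

```python
import csv
import io

# Assumed rank order for the semicolon-separated lineage strings (hypothetical).
RANKS = ["superkingdom", "phylum", "class", "order",
         "family", "genus", "species", "strain"]


def split_lineage(lineage):
    """Map each ;-separated name to its rank, in RANKS order.

    Empty names (e.g. from a trailing ';;') are skipped, but positions are
    preserved, so later names still line up with the right rank.
    """
    names = [name.strip() for name in lineage.split(";")]
    return {rank: name for rank, name in zip(RANKS, names) if name}


def read_lineages(handle, column="lineage"):
    """Yield (row, {rank: name}) pairs from CSV rows with a lineage column."""
    for row in csv.DictReader(handle):
        # Missing or empty cells are treated as an empty lineage.
        yield row, split_lineage(row.get(column) or "")


# Usage with an in-memory CSV standing in for annotated gather output.
example = io.StringIO(
    "name,lineage\n"
    "GCF_1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria\n"
)
for row, ranks in read_lineages(example):
    print(row["name"], ranks.get("phylum"))  # prints: GCF_1 p__Proteobacteria
```

Keeping the parsed ranks in a plain dict keyed by rank name makes it easy to filter or group rows at any level without re-splitting the string.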
Right now, the `fromfile` format doesn't support a simple way to produce translated sequences; presumably we'd need to add a CDS column or something, or else build workflows (elsewhere)...
Adds a `FrozenSourmashSignature` class, and provides sensible `to_mutable()` and `to_frozen()` methods on `SourmashSignature` and `FrozenSourmashSignature`. Provides an `update()` context manager that wraps changes so that a `FrozenSourmashSignature` is left at...
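The frozen/mutable pattern described above can be sketched in plain Python as follows. This is a toy illustration, not the code from the PR: the `Signature`/`FrozenSignature` classes, `add_hash()`, and the `_freeze_in_place()` helper are all hypothetical. It assumes `update()` yields a mutable copy and freezes that copy when the block exits, leaving the original untouched.

```python
from contextlib import contextmanager


class Signature:
    """Toy mutable signature: a name plus a set of hash values (hypothetical)."""

    def __init__(self, name, hashes=()):
        self.name = name
        self.hashes = set(hashes)

    def add_hash(self, value):
        self.hashes.add(value)

    def to_mutable(self):
        # Already mutable, so no copy is needed.
        return self

    def to_frozen(self):
        # Independent frozen copy; later edits to self do not affect it.
        return FrozenSignature(self.name, self.hashes)

    def _freeze_in_place(self):
        # Hypothetical helper: turn this object into a FrozenSignature without copying.
        self.hashes = frozenset(self.hashes)
        self.__class__ = FrozenSignature


class FrozenSignature(Signature):
    """Toy immutable signature: attribute assignment and add_hash() raise."""

    def __init__(self, name, hashes=()):
        # Bypass our own __setattr__ during construction.
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "hashes", frozenset(hashes))

    def __setattr__(self, attr, value):
        raise AttributeError(f"cannot set {attr!r} on a FrozenSignature")

    def add_hash(self, value):
        raise AttributeError("cannot add hashes to a FrozenSignature")

    def to_mutable(self):
        # Independent mutable copy; edits to it never touch self.
        return Signature(self.name, self.hashes)

    def to_frozen(self):
        # Already frozen, so no copy is needed.
        return self

    @contextmanager
    def update(self):
        """Yield a mutable copy for editing, then freeze that copy on exit."""
        new_sig = self.to_mutable()
        yield new_sig
        new_sig._freeze_in_place()
```

Usage: `with original.update() as sig: sig.add_hash(4)` leaves `original` unchanged and `sig` frozen afterwards. In this sketch, if the block raises, the copy is simply left mutable and discarded along with the exception.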
`python -m sourmash.sig` works; `python -m sourmash.tax` doesn't work; `python -m sourmash.lca` doesn't work, for different reasons. Do we want these to work? If so, we should fix and test....
#1045 defaults compute to use the B-Tree impl. Should we also add a CLI flag to choose the Vec one? The Vec one is better in very limited cases (very...
After https://github.com/sourmash-bio/sourmash/pull/1610, it seems like an obvious next set of simplifications is to remove all of the `with x.update(): ...` code blocks and replace them with `flatten` and/or `downsample` calls....
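The difference between in-place `update()` blocks and calls that return new objects can be sketched with a toy FracMinHash-like class. This is not sourmash's `MinHash`; it only borrows the method names `flatten()` and `downsample()`, and the `2**64 // scaled` cutoff is an assumed approximation of how a FracMinHash sketch decides which hashes to keep.

```python
MAX_HASH_SPACE = 2**64


class ToyFracMinHash:
    """Toy sketch keeping hashes below 2**64 // scaled (hypothetical, not sourmash)."""

    def __init__(self, scaled, abunds, track_abundance=True):
        self.scaled = scaled
        self.track_abundance = track_abundance
        max_hash = MAX_HASH_SPACE // scaled
        # Keep only hashes in the retained fraction of hash space; without
        # abundance tracking every retained hash gets a count of 1.
        self.abunds = {
            h: (a if track_abundance else 1)
            for h, a in abunds.items()
            if h < max_hash
        }

    @property
    def hashes(self):
        return set(self.abunds)

    def flatten(self):
        """Return a new sketch without abundances; self is untouched."""
        return ToyFracMinHash(self.scaled, self.abunds, track_abundance=False)

    def downsample(self, scaled):
        """Return a new sketch at a coarser scaled value; self is untouched."""
        if scaled < self.scaled:
            raise ValueError("can only downsample to a larger scaled value")
        return ToyFracMinHash(scaled, self.abunds, self.track_abundance)
```

Because each call returns a fresh object, callers can chain `mh.downsample(scaled=...).flatten()` without any shared-state bookkeeping, which is the simplification the issue is after.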
See https://octo-repo-visualization.vercel.app/?repo=sourmash-bio%2Fsourmash; https://next.github.com/projects/repo-visualization?utm_source=programmingdigest&utm_medium=email&utm_campaign=432 explains how to add this to GitHub Actions.
I'd like to compute and index MinHash sketches on GTDB r202 representative genomes. The sketching step (v4.2.1) is parallelized with 16 or 40 threads on a 160-core machine. But some...