C. Titus Brown
Trying this out with the CLI plugin infrastructure https://github.com/sourmash-bio/sourmash/pull/2438 - see PR https://github.com/ctb/2022-sourmash-filter-min-samples/pull/1. Kinda neat - when all the machinery works, you get the ability to run: ``` % sourmash...
The code has now been moved from https://github.com/ctb/2022-sourmash-filter-min-samples to https://github.com/ctb/sourmash_plugin_commonhash. Leaving this issue open because it has a lot of good discussion that we should put in advanced documentation or...
> First, do you think sourmash would be a suitable and fast option to determine this? Yes, I think so. Using sourmash you could find genomes that were 99.9% identical (or...
oh, yes! then k=51, and/or lower scaled values (scaled=100, for example), would ensure perfect identity. If only exact matches are needed, you can compare the md5sum of the signatures directly...
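A minimal sketch of the exact-match idea, in plain Python rather than the sourmash API: if two sketches hold the same hashes at the same k-mer size, a digest of their contents is identical, so comparing digests is enough to spot exact duplicates. The `sketch_digest` helper and the toy hash values are hypothetical, and this does not reproduce the md5sum value that sourmash itself reports for a signature.

```python
import hashlib


def sketch_digest(ksize, hashes):
    """Digest of a sketch's content: k-mer size plus the sorted hash values.

    Hypothetical helper for illustration only; not sourmash's own md5sum.
    """
    md5 = hashlib.md5()
    md5.update(str(ksize).encode("ascii"))
    for h in sorted(set(hashes)):
        md5.update(b",")  # separator so adjacent values cannot run together
        md5.update(str(h).encode("ascii"))
    return md5.hexdigest()


# Toy sketches: a and b hold the same hashes, c differs by one hash.
sketch_a = {11, 42, 97, 1003}
sketch_b = {1003, 97, 42, 11}
sketch_c = {11, 42, 97, 2001}

same = sketch_digest(51, sketch_a) == sketch_digest(51, sketch_b)
different = sketch_digest(51, sketch_a) != sketch_digest(51, sketch_c)
print(same, different)
```

A digest comparison only answers "identical or not"; anything short of identity still needs a similarity or containment calculation.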
hi @krastegar, the final part of the gather algorithm itself is not directly parallelizable, or at least not easily so. But there are things you can do. Read on... As...
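To illustrate why the last stage is hard to parallelize, here is a toy greedy decomposition over plain Python sets. It is a sketch of the general idea only, not sourmash's implementation; `best_match`, `toy_gather`, and the genome names are hypothetical. Each round removes the winning match's hashes from the query, so round N+1 cannot start until round N has finished. The search for the best match within a round is the part where comparisons are independent of each other.

```python
def best_match(remaining, database):
    """Return (name, overlap) for the sketch sharing the most hashes with `remaining`.

    Every comparison here is independent of the others, so this search is
    the piece that could be split across threads or processes.
    """
    best_name, best_overlap = None, set()
    for name, hashes in database.items():
        overlap = remaining & hashes
        if len(overlap) > len(best_overlap):
            best_name, best_overlap = name, overlap
    return best_name, best_overlap


def toy_gather(query, database, threshold=1):
    """Greedy decomposition: repeatedly take the best match and subtract it."""
    remaining = set(query)
    results = []
    while remaining:
        name, overlap = best_match(remaining, database)
        if name is None or len(overlap) < threshold:
            break
        results.append((name, len(overlap)))
        remaining -= overlap  # the next round depends on this subtraction
    return results


query = set(range(10))
database = {
    "genomeA": {0, 1, 2, 3, 4, 5},
    "genomeB": {4, 5, 6, 7},
    "genomeC": {8, 100, 101},
}
print(toy_gather(query, database))
# [('genomeA', 6), ('genomeB', 2), ('genomeC', 1)]
```

Note that genomeB shares four hashes with the original query but is credited with only two, because genomeA already claimed the other two in the first round.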
excellent, glad to hear it!
https://github.com/sourmash-bio/pyo3_branchwater is a plugin with a fast (multithreaded) implementation of multigather.
as of [sourmash_plugin_branchwater v0.9.5](https://github.com/sourmash-bio/sourmash_plugin_branchwater/releases/tag/v0.9.5), `sourmash scripts fastmultigather` is a feature-complete multithreaded multi-query gather, and `sourmash scripts fastgather` is a feature-complete multithreaded single-query gather 🎉 I'll close this once I update...
> Could you expand on what you mean by this statement with regards to the f_unique_to_query column?: > 'This column should be used in any analysis that needs to avoid...
> Such a clear explanation of `f_unique_to_query`! thank you! > > Is the abundance metric calculated such that matches aren't double-counted (i.e. abundance relates to `f_unique_to_query` as opposed...
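To make the double-counting distinction concrete, here is a toy calculation over plain Python sets (not sourmash code). It assumes the matches are already in gather rank order; `toy_fractions` and the genome names are hypothetical. The `f_orig_query` value is computed against the full original query, so hashes shared between matches are counted once per match, while the `f_unique_to_query` value only counts hashes not already claimed by a higher-ranked match. Abundance weighting is left out of this sketch.

```python
def toy_fractions(query, ranked_matches):
    """Compute overlap fractions for matches given in rank order.

    f_orig_query:      overlap with the original query (can double count).
    f_unique_to_query: overlap with what is left after higher-ranked matches.
    """
    query = set(query)
    remaining = set(query)
    rows = []
    for name, hashes in ranked_matches:
        rows.append({
            "name": name,
            "f_orig_query": len(query & hashes) / len(query),
            "f_unique_to_query": len(remaining & hashes) / len(query),
        })
        remaining -= hashes  # these hashes are now assigned to this match
    return rows


query = set(range(10))
ranked = [
    ("genomeA", {0, 1, 2, 3, 4, 5}),
    ("genomeB", {4, 5, 6, 7}),  # shares hashes 4 and 5 with genomeA
]
rows = toy_fractions(query, ranked)
for row in rows:
    print(row)
```

In this toy case the `f_orig_query` values add up to the whole query even though only eight of the ten hashes are covered, whereas the `f_unique_to_query` values add up to the covered fraction.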