C. Titus Brown

Results: 979 comments by C. Titus Brown

Trying this out with the CLI plugin infrastructure https://github.com/sourmash-bio/sourmash/pull/2438 - see PR https://github.com/ctb/2022-sourmash-filter-min-samples/pull/1. Kinda neat - when all the machinery works, you get the ability to run: ``` % sourmash...

The code has now been moved from https://github.com/ctb/2022-sourmash-filter-min-samples to https://github.com/ctb/sourmash_plugin_commonhash. Leaving this issue open because it has a lot of good discussion that we should put in advanced documentation or...

> First, do you think sourmash would be a suitable and fast option to determine this? Yes, I think so. Using sourmash you could find genomes that were 99.9% identical (or...

oh, yes! then k=51, and/or lower scaled values (scaled=100, for example), would ensure perfect identity. If only exact matches are needed, you can compare the md5sum of the signatures directly...
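To make the "compare the md5sum" idea concrete, here is a minimal stdlib-only sketch. It is not sourmash's implementation: `toy_sketch` and `sketch_digest` are hypothetical helpers, the k-mer hash here is a truncated MD5 rather than the hash function sourmash uses, and the digest is computed over a simple sorted list of hashes. It only illustrates the principle that identical sketches give identical digests, and that lower `scaled` values retain more k-mers.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_sketch(seq, k=51, scaled=100):
    """Toy scaled sketch: keep k-mer hashes below 2**64 // scaled.

    Assumption: a truncated MD5 stands in for the real hash function.
    With scaled=1 every k-mer hash is kept.
    """
    max_hash = 2**64 // scaled
    hashes = set()
    for kmer in kmers(seq, k):
        h = int.from_bytes(hashlib.md5(kmer.encode()).digest()[:8], "big")
        if h < max_hash:
            hashes.add(h)
    return hashes


def sketch_digest(hashes):
    """Digest of a sketch: MD5 over the sorted hash values."""
    joined = ",".join(str(h) for h in sorted(hashes))
    return hashlib.md5(joined.encode()).hexdigest()
```

Two sequences with equal digests have equal sketches. Note that with `scaled` greater than 1 only a fraction of the k-mers is retained, so equal sketches do not by themselves prove the underlying sequences are byte-for-byte identical.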

hi @krastegar, the final part of the gather algorithm itself is not directly parallelizable, or at least not easily so. But there are things you can do. Read on... As...
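For readers wondering why that last step resists parallelization, here is a rough sketch of a greedy gather-style loop over plain Python sets. It is an illustration under stated assumptions, not the sourmash code: `greedy_gather` is a hypothetical function, sketches are modelled as sets of integers, and the reported fraction is only meant to resemble the idea of a "fraction unique to the query". The point is that each round works on the hashes left over from the previous round, so the rounds have to run in order.

```python
def greedy_gather(query, references):
    """Greedy decomposition of a query hash set (toy model).

    query: set of ints; references: dict mapping name -> set of ints.
    Returns a list of (name, fraction) pairs, where fraction is the share
    of the original query newly covered by that match.
    """
    if not query or not references:
        return []

    remaining = set(query)
    results = []
    while remaining:
        # Pick the reference with the largest overlap with what is left.
        name, overlap = max(
            ((n, remaining & hashes) for n, hashes in references.items()),
            key=lambda item: len(item[1]),
        )
        if not overlap:
            break  # nothing left that any reference can explain
        results.append((name, len(overlap) / len(query)))
        # The next round depends on this subtraction, hence the ordering.
        remaining -= overlap
    return results
```

The overlap computation inside a single round could be spread across references, but the subtraction at the end of each round is what ties one round to the next.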

https://github.com/sourmash-bio/pyo3_branchwater is a plugin with a fast (multithreaded) implementation of multigather.

as of [sourmash_plugin_branchwater v0.9.5](https://github.com/sourmash-bio/sourmash_plugin_branchwater/releases/tag/v0.9.5), `sourmash scripts fastmultigather` is a feature-complete multithreaded multi-query gather, and `sourmash scripts fastgather` is a feature-complete multithreaded single-query gather 🎉 I'll close this once I update...

> Could you expand on what you mean by this statement with regards to the f_unique_to_query column?: > 'This column should be used in any analysis that needs to avoid...

> Such a clear explanation of `f_unique_query`! Thank you! > > Is the abundance metric calculated such that matches aren't double counted (i.e. abundance relates to `f_unique_to_query` as opposed...