
using BinDash-rs for MinHash

Open · jianshu93 opened this issue 1 year ago · 2 comments

Hi @wwood,

I have implemented BinDash 1 and 2 in Rust (https://github.com/jianshu93/bindash-rs). BinDash comes with theoretical guarantees and is about 1000 times faster than Mash and 10-100 times faster than Dashing. The original paper for BinDash 2 is here: https://www.biorxiv.org/content/10.1101/2024.03.13.584875v1.abstract. I have not made it modular yet, but it should work if a list of genomes is provided. Let me know if it could be added to galah for pre-clustering with MinHash-like methods.

Best, Jianshu

jianshu93 · Jan 10 '25 18:01
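To make the MinHash sketching proposed above concrete, here is a minimal Rust sketch of one-permutation MinHash: each k-mer is hashed once, routed to one of `m` bins, and each bin keeps its minimum hash; the Jaccard index is then estimated from the fraction of matching bins. This is an illustration under stated assumptions, not bindash-rs code: it omits BinDash's densification and b-bit compression, hashes forward-strand k-mers only with a SplitMix64 mixer, and the names `kmers`, `sketch`, `jaccard` and `mash_distance` are hypothetical. The distance formula is the Mash distance from Ondov et al. (2016).

```rust
/// SplitMix64 finalizer used as a deterministic 64-bit hash.
/// Assumption for illustration only; bindash-rs uses its own hashing.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Forward-strand k-mers (1 <= k <= 32) packed 2 bits per base.
/// Any non-ACGT byte resets the window, so no k-mer spans it.
fn kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!((1..=32).contains(&k), "k must be in 1..=32");
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let (mut code, mut filled, mut out) = (0u64, 0usize, Vec::new());
    for &b in seq {
        let v = match b.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => {
                filled = 0;
                continue;
            }
        };
        code = ((code << 2) | v) & mask;
        filled += 1;
        if filled >= k {
            out.push(code);
        }
    }
    out
}

/// One-permutation MinHash: every k-mer hash goes to exactly one of `m` bins,
/// and each bin keeps the smallest hash it has seen (None = empty bin).
fn sketch(seq: &[u8], k: usize, m: usize) -> Vec<Option<u64>> {
    assert!(m > 0, "need at least one bin");
    let mut bins: Vec<Option<u64>> = vec![None; m];
    for km in kmers(seq, k) {
        let h = mix64(km);
        let slot = &mut bins[(h % m as u64) as usize];
        if slot.map_or(true, |cur| h < cur) {
            *slot = Some(h);
        }
    }
    bins
}

/// Jaccard estimate: matching bins / bins that are non-empty in at least one sketch.
fn jaccard(a: &[Option<u64>], b: &[Option<u64>]) -> f64 {
    assert_eq!(a.len(), b.len(), "sketches must use the same number of bins");
    let (mut matches, mut informative) = (0usize, 0usize);
    for (x, y) in a.iter().zip(b) {
        match (x, y) {
            (None, None) => {} // jointly empty bins carry no information
            _ => {
                informative += 1;
                if x == y {
                    matches += 1;
                }
            }
        }
    }
    if informative == 0 { 0.0 } else { matches as f64 / informative as f64 }
}

/// Mash distance (Ondov et al. 2016): D = -ln(2J / (1 + J)) / k; ANI is roughly 1 - D.
fn mash_distance(j: f64, k: usize) -> f64 {
    -((2.0 * j) / (1.0 + j)).ln() / k as f64
}

fn main() {
    let (k, m) = (11, 64); // toy parameters chosen for this short example
    let a: &[u8] = b"ACGTACGTTGCAACGTAGCTAGCTAGGCTAACGTTAGC";
    let mut b = a.to_vec();
    b[20] = b'T'; // introduce a single substitution
    let (sa, sb) = (sketch(a, k, m), sketch(&b, k, m));
    let j = jaccard(&sa, &sb);
    println!("estimated J = {j:.3}, Mash distance = {:.4}", mash_distance(j, k));
}
```

Because each k-mer is hashed once rather than once per sketch slot, the sketching cost does not grow with the number of bins, which is the main appeal of one-permutation schemes. This simplified estimator just ignores bins that are empty in both sketches; BinDash instead fills empty bins through densification, which matters for small genomes or many bins.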

Sounds nice. Is it available on conda? Would it be easiest just to use the command-line interface?

How does it compare to skani?

wwood · Jan 13 '25 02:01

Hi Ben, I think this is for fast initial clustering, which skani cannot do. It is much faster than any ANI calculator because it relies on MinHash alone. Combined with FastANI, we can get genome clustering that is both fast and accurate. Jianshu

jianshu93 · Jan 13 '25 03:01
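The two-stage workflow described in the last reply (fast MinHash pre-clustering, then an accurate ANI tool such as FastANI) can be sketched as greedy clustering run twice: first over all genomes with a cheap estimated ANI, then only within each pre-cluster with an accurate ANI. This is a hypothetical illustration, not galah's implementation: `greedy_cluster` and `refine` are made-up names, both ANI functions are passed in as closures (in practice the fast one might be 1 minus a Mash distance and the accurate one a call out to FastANI), and the ANI values and thresholds in `main` are invented toy numbers.

```rust
/// Greedy clustering: genomes are visited in index order (assumed to be a priority
/// order, e.g. best quality first). Each genome joins the first cluster whose
/// representative (its first member) reaches `threshold` ANI, else founds a new cluster.
fn greedy_cluster<F: Fn(usize, usize) -> f64>(n: usize, threshold: f64, ani: F) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for g in 0..n {
        let found = clusters.iter().position(|c| ani(c[0], g) >= threshold);
        match found {
            Some(i) => clusters[i].push(g),
            None => clusters.push(vec![g]),
        }
    }
    clusters
}

/// Second stage: re-cluster each pre-cluster with the accurate ANI, so the
/// expensive comparison only runs on pairs that share a pre-cluster.
fn refine<G: Fn(usize, usize) -> f64>(pre: &[Vec<usize>], threshold: f64, accurate: G) -> Vec<Vec<usize>> {
    let mut out: Vec<Vec<usize>> = Vec::new();
    for members in pre {
        // Indices seen by `greedy_cluster` here are positions within `members`.
        let sub = greedy_cluster(members.len(), threshold, |i, j| accurate(members[i], members[j]));
        for c in sub {
            out.push(c.into_iter().map(|i| members[i]).collect());
        }
    }
    out
}

fn main() {
    // Invented toy ANI values for 4 genomes (not real data).
    let fast = |a: usize, b: usize| if a == b { 1.0 } else if a < 3 && b < 3 { 0.93 } else { 0.75 };
    let accurate = |a: usize, b: usize| {
        if a == b {
            1.0
        } else if a.min(b) == 0 && a.max(b) == 1 {
            0.99
        } else if a < 3 && b < 3 {
            0.94
        } else {
            0.75
        }
    };
    let pre = greedy_cluster(4, 0.90, fast);
    let fine = refine(&pre, 0.95, accurate);
    println!("pre-clusters: {pre:?}");
    println!("final clusters: {fine:?}");
}
```

With these toy values, genomes 0, 1 and 2 share one pre-cluster and the accurate pass splits it into [0, 1] and [2], while genome 3 is never compared accurately against the others. That is where the speedup comes from, and also the trade-off: the pre-clustering threshold has to be looser than the final one, or genomes that belong together may never be compared by the accurate tool.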