sourmash
Searching a database for any duplicate genomes
Hi there
I have received genomes from numerous sources. Some were previously public and some were not, but I don't know which. I have a set of genomes and want to see whether any of them are identical to genomes in the larger, complete public set.
First, do you think sourmash would be a suitable and fast option to determine this? Second, are there appropriately stringent k-mer size and scale options for building the signature databases?
Thanks a lot