sourmash
How does Sourmash cope with genome redundancy in GenBank reference genomes?
Hi again!
I have been thinking about the RefSeq versus GenBank question for containment analysis of a metagenome. I am aware that Sourmash may struggle to tell very similar genomes apart (i.e. strains), because they have few unique k-mers. Would it therefore make more sense to use RefSeq for the bacterial references, given the redundancy in GenBank?
Can the redundancy among the GenBank bacterial reference genomes reduce the sensitivity of Sourmash?
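To make the concern concrete, here is a toy sketch in plain Python of what I mean by "few unique k-mers". It does not use the sourmash API and is not meant to reflect how sourmash works internally: the sequences, the value of `K`, and the helper names `kmers` and `containment` are all made up for illustration, and real analyses would use much larger k and sketched (subsampled) k-mer sets.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (no sketching or subsampling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, sample):
    """Fraction of the query's k-mers that are also present in the sample."""
    return len(query & sample) / len(query)


K = 4  # toy value, far smaller than anything used in practice

# Two hypothetical "strains" that differ only at the final base.
strain_a = "ACGTTGCAAGGCTTAC"
strain_b = "ACGTTGCAAGGCTTAG"

kmers_a = kmers(strain_a, K)
kmers_b = kmers(strain_b, K)

# Pretend the metagenome contains strain A only.
sample = kmers_a

# k-mers that separate the two strains from each other.
unique_a = kmers_a - kmers_b
unique_b = kmers_b - kmers_a

print("containment of strain A in sample:", containment(kmers_a, sample))
print("containment of strain B in sample:", containment(kmers_b, sample))
print("k-mers unique to A:", len(unique_a), "of", len(kmers_a))
```

In this toy case strain B is absent from the sample but still has almost all of its k-mers contained in it, because only a single k-mer separates it from strain A. My question is whether having many such near-identical references in GenBank makes this kind of ambiguity worse in practice.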
Many thanks, Amanda