
How does Sourmash cope with genome redundancy in GenBank reference genomes?

Open Amanda-Biocortex opened this issue 8 months ago • 4 comments

Hi again!

Just thinking about the RefSeq versus GenBank question for containment analysis of a metagenome. I'm aware that Sourmash may struggle to distinguish similar genomes (i.e. strains) because they have fewer unique k-mers. Would it therefore make more sense to use RefSeq for bacterial references, given the redundancy in GenBank?

Can the redundancy in the GenBank reference bacterial genomes reduce the sensitivity of Sourmash?
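
To make the concern concrete, here is a toy sketch in plain Python of what I mean by "fewer unique k-mers". It is not how Sourmash works internally (there is no sketching or subsampling here), and the sequences, the helper names (`kmers`, `containment`, `mutate`), and the choices of k=21 and 10 SNPs are all made up for illustration.

```python
import random


def kmers(seq, k=21):
    """Return the set of all k-mers in a sequence (no sketching/subsampling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference):
    """Fraction of the query's k-mers that are also present in the reference."""
    return len(query & reference) / len(query) if query else 0.0


def mutate(seq, n_snps, rng):
    """Return a copy of seq with n_snps single-base substitutions."""
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_snps):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


rng = random.Random(42)

# Hypothetical "strains": a random 5 kb sequence and a copy with 10 SNPs.
strain_a = "".join(rng.choice("ACGT") for _ in range(5000))
strain_b = mutate(strain_a, 10, rng)

ka, kb = kmers(strain_a), kmers(strain_b)
unique_a = ka - kb  # k-mers that could distinguish strain A from strain B
unique_b = kb - ka  # k-mers that could distinguish strain B from strain A
frac_unique_a = len(unique_a) / len(ka)

print(f"k-mers in A: {len(ka)}, unique to A: {len(unique_a)}")
print(f"k-mers in B: {len(kb)}, unique to B: {len(unique_b)}")
print(f"containment of A in B: {containment(ka, kb):.3f}")
print(f"fraction of A's k-mers unique to A: {frac_unique_a:.3f}")
```

Each SNP can only change the k-mers that overlap it (at most k of them), so two near-identical strains end up sharing nearly all of their k-mers, and only a small fraction is left to tell them apart.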

Many thanks,
Amanda

Amanda-Biocortex · Jun 04 '24 09:06