sketch size?

Open jianshu93 opened this issue 1 year ago • 1 comments

Hi kmer-db team,

I could not find a parameter for the MinHash sketch size, only the filter fraction. How does the fraction relate to sketch size? The sketch size is stated explicitly in other MinHash implementations such as Mash and BinDash, and it is the key parameter determining accuracy and speed.

Thanks,

Jianshu

jianshu93, Aug 28 '24 14:08

Hello,

Instead of using fixed-size sketches, Kmer-db selects a given fraction of k-mers, as this allows more accurate distance estimation for genomes of different sizes. But if you know the approximate size of your genomes, the math is easy: sketch_size = genome_size * fraction. In the paper you can find an error comparison of Kmer-db fractions and Mash sketches on bacterial genomes.

[Figure: m_bioinformatics_35_1_133_f3]
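
To make the relation sketch_size = genome_size * fraction concrete, here is a minimal Python sketch. It is not Kmer-db's code: the function names are hypothetical, the genome size is used as a rough stand-in for the number of distinct k-mers, and the hash-threshold sampling in sample_kmers is a generic way of keeping a fraction of k-mers that is assumed here purely for illustration.

```python
import hashlib


def expected_sketch_size(genome_size: int, fraction: float) -> int:
    """Approximate number of k-mers kept: genome_size * fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return round(genome_size * fraction)


def fraction_for_sketch_size(genome_size: int, sketch_size: int) -> float:
    """Inverse relation: the fraction that yields roughly sketch_size k-mers."""
    if genome_size <= 0 or sketch_size <= 0:
        raise ValueError("genome_size and sketch_size must be positive")
    return min(1.0, sketch_size / genome_size)


def sample_kmers(seq: str, k: int, fraction: float) -> set:
    """Keep the k-mers whose 64-bit hash falls below fraction * 2**64.

    Assumption: generic hash-threshold sampling, not Kmer-db's actual filter.
    """
    threshold = int(fraction * 2**64)
    kept = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        if int.from_bytes(digest, "big") < threshold:
            kept.add(kmer)
    return kept


if __name__ == "__main__":
    # A 5 Mbp genome sampled at fraction 0.02
    print(expected_sketch_size(5_000_000, 0.02))
    # Fraction roughly equivalent to a 1000-element sketch on that genome
    print(fraction_for_sketch_size(5_000_000, 1000))
```

With a fixed fraction, the number of retained k-mers grows with the genome, so a 2 Mbp and a 10 Mbp genome are sampled at the same density. A fixed-size sketch would instead sample the larger genome more sparsely.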

Note that Kmer-db is significantly faster than Mash, and in many cases you won't need sketching at all (running without sketching is actually the default mode).

Hope it helps. Adam

agudys, Sep 18 '24 08:09