Inconsistent number of matching hashes

Open durrantmm opened this issue 2 years ago • 1 comment

I wanted to try calculating Mash distances with my own code, so I exported the hashes of two .msh files as integers using the mash info -d command. When I run mash dist on the two files, it reports that 3/1000 min-hashes match. When I do the calculation manually on the exported hash integers, I find that 9/1000 match. Any idea what might be going on?

durrantmm avatar Sep 21 '21 04:09 durrantmm

Oh I see, you calculate the Jaccard similarity using a merge-sort approach. Couldn't you also just take the Jaccard similarity of the two hash sets?

durrantmm avatar Sep 23 '21 17:09 durrantmm
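
For anyone hitting the same mismatch, here is a minimal Python sketch of why the two counts can differ. It assumes the merge-style estimate only looks at the s smallest hashes of the union of the two sketches, while a plain set intersection counts every hash the sketches have in common. The function names `merge_shared` and `naive_shared` and the toy hash values are made up for illustration; this is not Mash's actual source code.

```python
def merge_shared(a, b, sketch_size):
    """Walk two sorted hash lists in merge order and count shared hashes
    among the sketch_size smallest hashes of their union.

    Returns (shared, examined), where examined is the denominator."""
    a, b = sorted(set(a)), sorted(set(b))
    i = j = shared = examined = 0
    while examined < sketch_size and i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # hash only in a
        elif a[i] > b[j]:
            j += 1          # hash only in b
        else:
            shared += 1     # hash present in both sketches
            i += 1
            j += 1
        examined += 1
    # If one list ran out early, the leftover hashes of the other list
    # are unshared but still count toward the denominator.
    if examined < sketch_size:
        leftover = (len(a) - i) + (len(b) - j)
        examined = min(sketch_size, examined + leftover)
    return shared, examined


def naive_shared(a, b):
    """Count every hash that appears in both sketches."""
    return len(set(a) & set(b))


# Toy sketches of size 5 (hypothetical values).
sketch_a = [1, 2, 3, 10, 11]
sketch_b = [1, 4, 5, 10, 11]

print(merge_shared(sketch_a, sketch_b, 5))  # (1, 5)
print(naive_shared(sketch_a, sketch_b))     # 3
```

In this toy case the five smallest hashes of the union are 1, 2, 3, 4, 5, so only the hash 1 counts as shared in the merge approach, while the hashes 10 and 11 match in the plain intersection but fall outside the bottom five of the union. A gap like 3/1000 versus 9/1000 would be consistent with that kind of difference.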