Mash
Inconsistent number of matching hashes
I wanted to try calculating Mash distances using my own code. I exported the hashes as integers for two .msh files using the `mash info -d` command. When I run `mash dist` on the two files, it says that 3/1000 minhashes match. When I do the calculation manually on the hash integers, I find that 9/1000 match. Any idea what might be going on?
Oh I see, you calculate the Jaccard similarity using a merge-sort approach. Couldn't you also just take the Jaccard similarity of the two hash sets directly?
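For anyone comparing the two approaches, here is a minimal Python sketch of why they can disagree. It assumes the merge walks both sorted hash lists and stops once it has seen as many distinct hashes as the sketch size, so only matches among the smallest hashes of the union are counted; that stopping rule is my understanding of the merge-sort approach rather than something confirmed in this thread. The function names and toy hash values are made up for illustration.

```python
def merge_shared_count(a, b, sketch_size):
    """Count shared hashes among the sketch_size smallest hashes of the union.

    Assumption: the merge stops after sketch_size distinct hashes have been
    visited, so matches between larger hashes are never counted.
    Returns (shared, denominator).
    """
    a, b = sorted(set(a)), sorted(set(b))
    i = j = shared = seen = 0
    while seen < sketch_size and i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Same hash in both sketches: one distinct hash, one match.
            shared += 1
            i += 1
            j += 1
        seen += 1
    if seen < sketch_size:
        # One list ran out; its counterpart's leftovers still fill the union.
        remaining = (len(a) - i) + (len(b) - j)
        seen = min(sketch_size, seen + remaining)
    return shared, seen


def naive_shared_count(a, b):
    """Plain set intersection of the two sketches, with no size cutoff."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    # Toy sketches of size 5 (hypothetical values, not real Mash output).
    sketch_a = [1, 2, 3, 10, 20]
    sketch_b = [4, 5, 6, 10, 20]
    print(merge_shared_count(sketch_a, sketch_b, 5))
    print(naive_shared_count(sketch_a, sketch_b))
```

In the toy input, the two shared hashes (10 and 20) are not among the five smallest hashes of the union, so the merge count misses them while the plain intersection counts both. If the assumption holds, the same effect would explain a manual count (9) that is higher than the reported one (3).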