xxhash icon indicating copy to clipboard operation
xxhash copied to clipboard

Library creates conflicts for very short strings

Open Trinidus opened this issue 8 months ago • 2 comments

I just ran into an issue with some example data where I got like 64 conflicting entries in a sample of about 810.000. This seems to be way too much for what I expected so my questions:

  • Is there a minimum string size for this hash function?
  • Is there a general problem with the algorithm for very short strings?

Here one of my examples with conflicting identical hashes for different strings:

const hash1 = xxHash32('10-10020567-549308', 0).toString(16) // --> b25eede2
const hash2 = xxHash32('20-20384387-548563', 0).toString(16) // --> b25eede2

Trinidus avatar Apr 08 '25 13:04 Trinidus

@Trinidus,

That is to be expected given that the hash is only 32 bits.

Here are a few discussions:

  • https://github.com/Cyan4973/xxHash/issues/165
  • https://xxhash.com/
  • https://stackoverflow.com/questions/45788721/where-can-i-find-xxhash64-and-md5-collision-probability-statistics

Given that xxHash32 is only 32bit, the probability becomes something like:

p(k) = 1 - exp(-k(k-1) / (2 * 2^32))

Image

From the graph, anything over 100k values will have more than a 50% chance of a collision.

Jason3S avatar Apr 09 '25 15:04 Jason3S