xxHash
xxHash copied to clipboard
Clarify in documentation if zero can be returned
Hi, I'm currently using SHA1 for a hash in a program but was considering xxHash instead. However, my code relies on the fact (belief?) that SHA1 can not create a hash of all zero bits. I've gone through the xxHash documentation but can't find any mention of this. I'm specifically interested in the XXH64 and XXH3_64 algorithms. But it would be nice to state clearly for all the different algos whether they can ever return zero or not. Thanks for your time. Regards, Mike.
Hi @mikesreed , thank you for your opion. I actually 👍 for better clarification in the document / code.
In short, xxHash may return 0
(all bits are 0
s).
The shortest example is "BAD" seed with NULL (0 byte) input.
// example-xxh64-returns-0.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#define XXH_STATIC_LINKING_ONLY
#define XXH_IMPLEMENTATION /* access definitions */
#include "xxhash.h"
int main()
{
uint64_t const seed = (uint64_t) (0ULL - XXH_PRIME64_5);
XXH64_hash_t const hash = XXH64(NULL, 0, seed);
// Converts hash value to canonical representation
XXH64_canonical_t c;
XXH64_canonicalFromHash(&c, hash);
// Display the content of canonical representation
printf("hash = 0x");
for (size_t i = 0; i < sizeof(c.digest); ++i) {
printf("%02x", c.digest[i]);
}
printf("\n");
return EXIT_SUCCESS;
}
@easyaspi314 's nice sketch will help you to understand this code.
The 0
value has as many chances to appear as any other value, which is 1 / 2^64
for both XXH64()
and XXH3_64bits
(). That's extremely tiny, as in many many millions times smaller than your chance to win the powerball lottery. Still, if you were to "try your luck" many many times (as in many billion times), it's no longer insignificant.
The situation is the same for SHA1
, though in this case, the probability is reduced by the larger length of SHA1
, to 1 / 2^160
, which is virtually the same as nothing.
This is a property shared with XXH3_128bits()
, which may also output 0
but with a probability of 1 / 2^128
, which is also good enough to be considered similar to nothing.
The main difference between SHA1
and XXH3_128bits()
(beyond the very large speed difference) is that SHA1
used to be "cryptographic", which means that manufacturing on purpose a specific return value (0
in this case) is extremely hard (but no longer impossible, that's why SHA1
lost its "cryptographic" label some years ago). XXH3_128bits()
is not cryptographic, so it's easier to generate a value 0
on purpose.
That's the difference between "accidental" and "intentional" collisions.
If you use case only aims at protecting against "accidental" collision with the value 0
, then XXH3_128bits()
should be good enough for your use case (and much faster).
However, if producing a value 0
intentionally could lead to a security risk and the input could be manipulated by an external attacker, you should use a cryptographic algorithm, but a stronger one than SHA1
. Something like SHA256
or BLAKE3
.