dolma
Change bloom_filter implementation of hash
Currently, bloom_filter.rs uses ahash as its internal hasher.
This is problematic since ahash has an unstable representation:
different computers or computers on different versions of the code will observe different hash values. As such, aHash is not recommended for use other than in-memory maps. Specifically, aHash is not intended for network use or in applications which persist hashed values.
I would love to learn whether the dolma developers have found a way to serialize the filter that keeps it portable, but persisting ahash output is not a supported use case, and I think there is a benefit in moving to a stable hash.
Recommendations
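- Rust's default hasher is reasonably fast and resistant to hash-flooding attacks. Currently it is SipHash-1-3, which is a keyed hash (the key can serve as a seed) but not a general-purpose cryptographic hash. Note that the standard library does not specify the algorithm and may change it between releases, so a stable result would require pinning SipHash explicitly, for example through a separate crate.
- BLAKE3 is one of the fastest cryptographic hashes, if not the fastest. It also supports keyed hashing.
- xxHash (more specifically the XXH3 variant) is one of the fastest hashers, if not the fastest, that passes SMHasher.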
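To illustrate what a stable, seeded hash would buy the bloom filter, here is a minimal sketch using only the standard library. It is not dolma's implementation: FNV-1a is a simple stand-in for whichever stable hash is chosen (a real change would more likely use BLAKE3's keyed mode or XXH3 with a seed), and the names `stable_seeded_hash` and `bloom_indices` are hypothetical. The point is that the output depends only on the seed and the input bytes, so a filter written on one machine can be read on another.

```rust
// FNV-1a 64-bit constants (fixed by the algorithm, so the output is stable).
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Hypothetical helper: FNV-1a over the seed bytes followed by the data.
/// FNV-1a is only a stand-in here; it is not a recommendation from the issue.
fn stable_seeded_hash(seed: u64, data: &[u8]) -> u64 {
    // Serialize the seed in a fixed byte order so the result does not
    // depend on the endianness of the machine.
    let seed_bytes = seed.to_le_bytes();
    let mut h = FNV_OFFSET;
    for b in seed_bytes.iter().chain(data.iter()) {
        h ^= *b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical helper: one bit index per seed for a filter of `num_bits` bits.
/// Persisting `seeds` next to the bit array is what makes the filter reloadable.
fn bloom_indices(seeds: &[u64], item: &[u8], num_bits: u64) -> Vec<u64> {
    seeds
        .iter()
        .map(|&s| stable_seeded_hash(s, item) % num_bits)
        .collect()
}

fn main() {
    let seeds = [1u64, 2, 3];
    let idx = bloom_indices(&seeds, b"hello world", 1 << 20);
    println!("{:?}", idx);
}
```

With ahash, the equivalent indices could differ across machines or crate versions, which would silently invalidate a serialized filter.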