Change bloom_filter implementation of hash

Open chris-ha458 opened this issue 10 months ago • 11 comments

Currently, bloom_filter.rs uses ahash as its internal hasher.

This is problematic since ahash has an unstable representation:

"different computers or computers on different versions of the code will observe different hash values. As such, aHash is not recommended for use other than in-memory maps. Specifically, aHash is not intended for network use or in applications which persist hashed values."

I would love to learn if the dolma developers have found a way to serialize it in a way that maintains some kind of portability, but that is not a supported use case and I feel there is benefit in moving to a stable hash.

Recommendations

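  • Rust's default hasher is designed to be reasonably fast and resistant to HashDoS attacks, although it is not a general-purpose cryptographic hash. Currently it is SipHash 1-3, and SipHash supports keyed hashing (which can be used as a seeded hash). Note that the standard library does not guarantee the algorithm will stay the same across releases.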
  • BLAKE3 is one of the fastest, if not the fastest, cryptographic hashes. It also supports keyed hashing.
  • xxHash (more specifically its XXH3 variant) is one of the fastest, if not the fastest, hashers that pass SMHasher.
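
To illustrate what a stable, seeded hash buys a persisted Bloom filter, here is a minimal sketch. It is not dolma's code: `stable_hash` and `bit_indices` are hypothetical names, and FNV-1a is used only because it fits in a few lines of std-only Rust. It stands in for whichever hasher above is chosen. XORing the seed into the offset basis is likewise an illustrative choice, not part of the FNV specification.

```rust
// Sketch only: FNV-1a stands in for a stable hasher such as BLAKE3 or XXH3.
// The constants are the published 64-bit FNV offset basis and prime.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Seeded 64-bit FNV-1a (hypothetical helper).
/// The seed is XORed into the offset basis, so seed 0 is plain FNV-1a.
/// The output depends only on `seed` and `data`, never on the machine,
/// the process, or the library version.
pub fn stable_hash(seed: u64, data: &[u8]) -> u64 {
    let mut h = FNV_OFFSET ^ seed;
    for &byte in data {
        h ^= u64::from(byte);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Bit positions for `item` in a filter of `num_bits` bits, one per seed
/// (hypothetical helper). If the seeds are stored alongside the bit array,
/// a filter written on one machine can be queried on another.
pub fn bit_indices(item: &[u8], seeds: &[u64], num_bits: u64) -> Vec<u64> {
    assert!(num_bits > 0, "filter must have at least one bit");
    seeds
        .iter()
        .map(|&seed| stable_hash(seed, item) % num_bits)
        .collect()
}
```

With a hasher like this, the same (seed, item) pair always maps to the same bit, so a serialized filter would remain valid after a reload, a library upgrade, or a move to a different machine.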

chris-ha458 · Aug 20 '23 13:08