Zero-Allocation-Hashing
Document a way for other languages to get the same output for a given input string
The xxHash implementation doesn't produce results that match Python's for strings (as in the given example).
I believe this is due to Java's handling of character arrays: https://codeahoy.com/2016/05/08/the-char-type-in-java-is-broken/
We get matching results when we hash a byte array instead:
LongHashFunction.xx().hashBytes("test".getBytes())
gets the same output as
xxhash.xxh64('test').intdigest()
which is the result that we'd expect.
We can't figure out how to get the Python implementation to produce the same value as LongHashFunction.xx().hashChars("test")
If there's a way for other languages to get the same output for a given input string, it would be nice to have it documented.
We can't figure out how to have the python implementation find the same key for LongHashFunction.xx().hashChars("test")
You probably need to encode the string as UTF-16, because that is essentially what Java's character arrays are. You would also need to match the byte order, though.
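As a rough sketch of the suggestion above, this is how a string can be turned into UTF-16 bytes with an explicit byte order in Python. The helper name utf16_bytes is hypothetical, and the little-endian default is only an assumption (the native order of most common hardware). Whether hashing these bytes actually reproduces hashChars has not been verified here.

```python
def utf16_bytes(text: str, little_endian: bool = True) -> bytes:
    """Encode text as UTF-16 code units with an explicit byte order and no BOM.

    Hypothetical helper: the little-endian default is an assumption,
    not something the library documentation in this thread confirms.
    """
    codec = "utf-16-le" if little_endian else "utf-16-be"
    return text.encode(codec)


# Each Java char is one 16-bit code unit, so "test" becomes 8 bytes.
print(utf16_bytes("test").hex())         # 7400650073007400
print(utf16_bytes("test", False).hex())  # 0074006500730074
```

The resulting bytes could then be passed to xxhash.xxh64(...).intdigest() in place of the plain string, to be compared against the Java side.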
Has anyone been able to provide this? We would be interested in this as well.
We had the same issue, except with Node.js.
In the end, what produced the same hash was:
LongHashFunction.xx(123).hashBytes("teststring".getBytes("UTF-8"))
and in Node.js:
const hash = XXHash.hash64(Buffer.from('teststring', 'utf-8'), 123);
const hashValue = hash.readBigUInt64LE();
I hope this helps someone in the future.
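One detail that may matter when comparing the values: Java's long is signed, while the Node.js snippet above reads an unsigned 64-bit integer and Python's intdigest() also returns an unsigned value. The same 64 bits can therefore print as different numbers. A minimal sketch of converting between the two views, with hypothetical helper names:

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed (Java long style) value."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def to_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit value as an unsigned value."""
    return value & MASK64


# The all-ones bit pattern is -1 as a signed long.
print(to_signed64(0xFFFFFFFFFFFFFFFF))  # -1
print(to_unsigned64(-1))                # 18446744073709551615
```

Values below 2^63 are unchanged by either conversion, so the difference only shows up for hashes with the top bit set.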
This should not be an issue.
The default encodings in different languages are not the same. To get the same hash result, the binary layout of the input must be exactly the same, so select a well-defined encoding before hashing.
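To illustrate the point about binary layout, here is a small sketch that encodes one string with several codecs and prints the bytes each one produces. The sample string and the set of codecs are arbitrary choices for illustration only.

```python
# Sample text with one non-ASCII character so the codecs visibly diverge.
text = "héllo"

# Encode the same text with several explicitly named codecs.
layouts = {
    codec: text.encode(codec)
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "latin-1")
}

for codec, raw in layouts.items():
    print(f"{codec:10} {raw.hex()}")
```

Since every codec yields a different byte sequence, a byte-oriented hash would see four different inputs. Naming the codec explicitly on both sides, as in the UTF-8 example earlier in the thread, removes that ambiguity.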