Zero-Allocation-Hashing Document a way for other languages to get the same output for a given input string

Open sean-abbott opened this issue 5 years ago • 4 comments

The xxHash implementation doesn't produce the same results as the Python implementation for strings (as in the given example).

I believe this is due to Java's handling of character arrays: https://codeahoy.com/2016/05/08/the-char-type-in-java-is-broken/

We get predictable results when we dump to a byte array instead: LongHashFunction.xx().hashBytes("test".getBytes()) gets the same output as xxhash.xxh64('test').intdigest() which is the result that we'd expect.

We can't figure out how to have the Python implementation find the same key for LongHashFunction.xx().hashChars("test")

If there's a way for other languages to get the same output for a given input string, it'd be nice to have it documented.

Jun 21 '19 14:06 sean-abbott

We can't figure out how to have the Python implementation find the same key for LongHashFunction.xx().hashChars("test")

You probably need to encode the string in UTF-16, because that is essentially what Java's character arrays are. You would also need to match the byte order, though.
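To make the UTF-16 suggestion concrete, here is a minimal sketch that prints the byte layouts a Python caller could try to reproduce. It uses only the Java standard library; the class name Utf16Layout and its toHex helper are made up for illustration. Which byte order hashChars effectively sees is not confirmed in this thread, so treat the native-order line as a hint to verify on your own platform, not as a guarantee.

```java
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Zero-Allocation-Hashing.
class Utf16Layout {

    // Render a byte array as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "test";
        // UTF-8: one byte per ASCII character.
        System.out.println("UTF-8    " + toHex(input.getBytes(StandardCharsets.UTF_8)));
        // UTF-16 without a byte-order mark: two bytes per char.
        System.out.println("UTF-16LE " + toHex(input.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE " + toHex(input.getBytes(StandardCharsets.UTF_16BE)));
        // Byte order of the machine running this JVM (an assumption to check,
        // not a statement about what hashChars does internally).
        System.out.println("native   " + ByteOrder.nativeOrder());
    }
}
```

On the Python side, 'test'.encode('utf-16-le') and 'test'.encode('utf-16-be') produce the same bytes as the two UTF-16 rows. The plain 'utf-16' codec prepends a byte-order mark, which would change the hash.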

Jul 19 '19 16:07 leventov

Has anyone been able to provide this? We would be interested in this as well.

Dec 02 '20 15:12 drummerwolli

We had the same issue, except with Node.js.

In the end what produced the same hash was:

LongHashFunction.xx(123).hashBytes("teststring".getBytes("UTF-8"))

and in Node.js:

const hash = XXHash.hash64(Buffer.from('teststring', 'utf-8'), 123);
const hashValue = hash.readBigUInt64LE();
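One more thing to watch when comparing the values themselves: the Java methods return a signed long, while readBigUInt64LE (and, as far as I know, Python's intdigest) yield an unsigned number, so the same 64 bits can print differently. A small sketch of the conversion using only the Java standard library; the class name UnsignedHash and the sample value are made up for illustration and are not real hash outputs.

```java
import java.math.BigInteger;

// Hypothetical helper class, not part of Zero-Allocation-Hashing.
class UnsignedHash {

    // Decimal string of the 64 bits read as an unsigned integer.
    static String toUnsignedDecimal(long hash) {
        return Long.toUnsignedString(hash);
    }

    // Same value as a BigInteger, for numeric comparison.
    static BigInteger toUnsignedBigInteger(long hash) {
        return new BigInteger(Long.toUnsignedString(hash));
    }

    // Parse an unsigned decimal produced by another language back into a long.
    static long fromUnsignedDecimal(String text) {
        return Long.parseUnsignedLong(text);
    }

    public static void main(String[] args) {
        long sample = -1L; // all 64 bits set; an arbitrary sample, not a real hash
        System.out.println(sample);                    // signed view
        System.out.println(toUnsignedDecimal(sample)); // unsigned view
        System.out.println(fromUnsignedDecimal("18446744073709551615") == sample);
    }
}
```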

I hope this helps someone in the future.

May 26 '21 10:05 anzecesar

This should not be an issue.

The default encoding is not the same across languages. To get the same hash result, the binary layout of the input must be exactly the same, so select a well-defined encoding before hashing.
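As a sketch of that advice: pick one encoding explicitly on every side instead of relying on defaults. The example below uses only the Java standard library; CanonicalBytes and forHashing are hypothetical names, and UTF-8 is just one reasonable choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of Zero-Allocation-Hashing.
class CanonicalBytes {

    // Always encode with UTF-8 so the bytes do not depend on the JVM default charset.
    static byte[] forHashing(String input) {
        return input.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // "café", contains one non-ASCII character
        byte[] utf8 = forHashing(input);
        byte[] latin1 = input.getBytes(StandardCharsets.ISO_8859_1);
        // Same string, different byte layouts, so a byte-oriented hash would differ.
        System.out.println(Arrays.toString(utf8));
        System.out.println(Arrays.toString(latin1));
        System.out.println(Arrays.equals(utf8, latin1));
    }
}
```

The resulting array can be passed to a byte-oriented method such as hashBytes, while the other language encodes with the same codec, for example 'teststring'.encode('utf-8') in Python or Buffer.from('teststring', 'utf-8') in Node.js.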

May 26 '21 13:05 gzm55
