Data.HashFunction

xxHash gives incorrect value when using AsHexString method.

Open AdamLeMmon01 opened this issue 4 years ago • 4 comments

I had to write a custom AsHexString method to get the correct value.

Either the value is being stored under the hood with the bytes reversed, or just when using the AsHexString method the value is being reversed. Looking at the source, it's probably the storage format since AsHexString didn't appear to be reversing anything.

I had to write my own helper method to get the correct hex string. I needed this because I'm trying to match xxHash values across different languages, and the C++ version I'm using (and several others I checked) did not have this issue.

//My own helper method I had to use to work around the ordering issue.
public static string AsHexString(byte[] hash, bool uppercase)
{
	Array.Reverse(hash); //Note this reverse. This is currently required to work correctly
	StringBuilder stringBuilder = new StringBuilder(hash.Length);
	string format = uppercase ? "X2" : "x2";
	foreach (byte num in hash)
		stringBuilder.Append(num.ToString(format));
	return stringBuilder.ToString();
}
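One caveat with the helper above is that Array.Reverse reverses the caller's array in place, so the hash bytes stay reversed after the call. A minimal sketch of a variant that leaves the input untouched follows; the class name HexHelper and the method name AsReversedHexString are hypothetical and not part of the library.

```csharp
using System;
using System.Text;

public static class HexHelper
{
    // Hypothetical helper: formats the bytes from last to first,
    // without modifying the caller's array.
    public static string AsReversedHexString(byte[] hash, bool uppercase)
    {
        if (hash == null) throw new ArgumentNullException(nameof(hash));

        string format = uppercase ? "X2" : "x2";
        // Each byte becomes two hex characters.
        var builder = new StringBuilder(hash.Length * 2);
        for (int i = hash.Length - 1; i >= 0; i--)
            builder.Append(hash[i].ToString(format));
        return builder.ToString();
    }
}
```

Called with the bytes 39 81 67 db 5d ca dc 4f, this returns 4fdcca5ddb678139 and the array keeps its original order.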

AdamLeMmon01 • Mar 19 '21 16:03

Test values used: a text file named hash.txt with the contents: test

xxHash value received: 398167db5dcadc4f
xxHash value expected: 4fdcca5ddb678139

Note that if I use my helper to reverse the bytes, it provides the correct output. The two hash strings simply have their bytes in reverse order: the first one ends with 4f, the second one starts with 4f, and so on.

AdamLeMmon01 • Mar 19 '21 16:03

I just noticed a similar thing with the Jenkins one-at-a-time hash. When I use https://www.pelock.com/products/hash-calculator, the hash of "a" is CA2E9442. When I use System.Data.HashFunction, I get 42942eca.

If I use this method I get the same results as pelock.com:

uint jenkins_one_at_a_time_hash(string key, int? length = null)
{
	uint hash, i;
	for (hash = i = 0; i < key.Length; ++i)
	{
		hash += key[(int)i];
		hash += (hash << 10);
		hash ^= (hash >> 6);
	}
	hash += (hash << 3);
	hash ^= (hash >> 11);
	hash += (hash << 15);
	return hash;
}

danwize • Jun 28 '21 17:06

Any update on this? We have encountered the same problem. As long as you only compare hashes produced by this method with each other, you'll be fine. But we now have to combine our .NET functionality, which uses the hashes generated by this package with the AsHexString method, with PHP functionality that also generates xxHash hex strings, and we noticed that the values did not match. The PHP library gives the correct result; this one does not. I believe it can be traced back to the usage of BitConverter.GetBytes, which (depending on your machine's endianness) may reverse the byte order.
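To illustrate the BitConverter.GetBytes point, here is a small sketch using the expected xxHash value from this thread. It only shows how BitConverter behaves; whether the library takes this exact path internally is an assumption, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class EndianDemo
{
    // Hex of the bytes in the order BitConverter.GetBytes returns them
    // (the machine's byte order).
    public static string MachineOrderHex(ulong value) =>
        string.Concat(BitConverter.GetBytes(value).Select(b => b.ToString("x2")));

    // Hex of the number itself, most significant digit first.
    public static string NumericHex(ulong value) => value.ToString("x16");

    public static void Main()
    {
        ulong expected = 0x4fdcca5ddb678139UL;
        Console.WriteLine(BitConverter.IsLittleEndian);
        Console.WriteLine(MachineOrderHex(expected)); // 398167db5dcadc4f on a little-endian machine
        Console.WriteLine(NumericHex(expected));      // 4fdcca5ddb678139
    }
}
```

On a little-endian machine the first string matches the "received" value reported above and the second matches the "expected" value.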

mifieli • Dec 29 '21 15:12

This is expected behavior due to the ambiguity of what it means to convert a given hash value to a hex string. Hash functions that produce 8, 16, 32, or 64 bits usually output their result as a native byte, ushort, uint, or ulong.

When converting a native integer value to a hex value, the value is generally output as a big-endian string; however, when the result is truly a byte array (regardless of size) and we convert to a hex string, the value is output in byte-order (which technically is little-endian).

The exact behavior among different hash functions when executed on a big endian machine versus little endian isn't necessarily defined; however, a vast majority of them will return the same logical native value when they return a native value, and will return the same byte array if they return a byte array.
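For anyone who needs the native-integer form to compare against other implementations, one option is to read the hash bytes with an explicit byte order instead of relying on the machine's. The sketch below assumes the byte array holds the value in little-endian order, which is what the values reported in this thread suggest but is not confirmed by the library; the class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class NativeHashHex
{
    // Assumption: 8 hash bytes hold a 64-bit value in little-endian order.
    public static string ToHex64(ReadOnlySpan<byte> hashBytes)
    {
        if (hashBytes.Length != 8)
            throw new ArgumentException("Expected exactly 8 bytes.", nameof(hashBytes));
        ulong value = BinaryPrimitives.ReadUInt64LittleEndian(hashBytes);
        return value.ToString("x16");
    }

    // Assumption: 4 hash bytes hold a 32-bit value in little-endian order.
    public static string ToHex32(ReadOnlySpan<byte> hashBytes)
    {
        if (hashBytes.Length != 4)
            throw new ArgumentException("Expected exactly 4 bytes.", nameof(hashBytes));
        uint value = BinaryPrimitives.ReadUInt32LittleEndian(hashBytes);
        return value.ToString("x8");
    }
}
```

With the bytes 39 81 67 db 5d ca dc 4f, ToHex64 returns 4fdcca5ddb678139, and with 42 94 2e ca, ToHex32 returns ca2e9442, regardless of the endianness of the machine running the code.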

All of that aside, I unfortunately am no longer able to maintain this library due to personal responsibilities and will be archiving it soon. I'm happy to transfer ownership to a new maintainer if someone else is interested in taking ownership of this library.

brandondahler • Jan 08 '22 18:01