orleans icon indicating copy to clipboard operation
orleans copied to clipboard

Optimize hashing

Open pentp opened this issue 3 years ago • 5 comments

Remove all allocations from deterministic hashing and use more pre-computed hash values.

Microsoft Reviewers: Open in CodeFlow

pentp avatar Jul 18 '22 19:07 pentp

I'd like to wait until our distributed testing infrastructure is back online before merging. In the meantime, I've left some comments. Thanks, as always

ReubenBond avatar Jul 21 '22 23:07 ReubenBond

I did some benchmarking with various data lengths (in bytes). JenkinsV2 is the version in this PR. Overall it looks like XxHash32 would be the best option (and it has better hashing quality).


BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1862 (21H2)
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host]     : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT


Method Length Mean Error StdDev Ratio
Jenkins 3 23.695 ns 0.2038 ns 0.1906 ns 1.00
JenkinsV2 3 4.674 ns 0.0292 ns 0.0244 ns 0.20
XxHash32 3 6.804 ns 0.0140 ns 0.0131 ns 0.29
XxHash64 3 8.422 ns 0.0302 ns 0.0252 ns 0.36
Sha256 3 439.652 ns 2.0611 ns 1.7211 ns 18.57
Jenkins 24 66.817 ns 0.0416 ns 0.0389 ns 1.00
JenkinsV2 24 13.317 ns 0.0043 ns 0.0036 ns 0.20
XxHash32 24 10.965 ns 0.0134 ns 0.0112 ns 0.16
XxHash64 24 10.157 ns 0.0070 ns 0.0066 ns 0.15
Sha256 24 437.698 ns 3.3567 ns 2.9757 ns 6.55
Jenkins 100 203.097 ns 0.1447 ns 0.1283 ns 1.00
JenkinsV2 100 42.779 ns 0.1976 ns 0.1848 ns 0.21
XxHash32 100 24.373 ns 0.0102 ns 0.0091 ns 0.12
XxHash64 100 20.962 ns 0.0245 ns 0.0229 ns 0.10
Sha256 100 695.510 ns 4.3684 ns 4.0862 ns 3.42
Jenkins 1024 1,941.947 ns 1.2229 ns 1.1439 ns 1.00
JenkinsV2 1024 414.290 ns 0.3464 ns 0.2705 ns 0.21
XxHash32 1024 196.953 ns 0.1136 ns 0.0949 ns 0.10
XxHash64 1024 123.778 ns 0.1209 ns 0.1131 ns 0.06
Sha256 1024 4,192.216 ns 13.2682 ns 12.4111 ns 2.16

pentp avatar Jul 22 '22 20:07 pentp

I agree, the hashing quality is a significant plus. I was looking for a reference I had from last time I looked into this. I think this is it: https://github.com/rurban/smhasher xxHash64 has no known quality issues (xxHash32 does) and Jenkins is known-bad.

ReubenBond avatar Jul 22 '22 20:07 ReubenBond

xxHash64 is a bit faster for inputs as small as 24, but I'm not sure if the quality stays the same if we truncate the result from 64-bits to 32-bits?

pentp avatar Jul 22 '22 20:07 pentp

I think it's fine to use a mix of xxHash32 and xxHash64 depending on the case. We can use xxHash32 for the 32bit cases (GetConsistentHashCode, etc)

ReubenBond avatar Jul 22 '22 20:07 ReubenBond