clhash
Add a streaming interface
I just did a short test on an Intel Skylake i7 with gcc 5.4. I only modified your example.c a bit, like this:
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include "clhash.h"

int main() {
    void * random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81), UINT64_C(0xb3816f6a2c68e530));
    int i;
    for (i = 0; i < 10000000; i++) {
        // hash three short strings per iteration; the length argument (12 here) was varied for the test
        uint64_t hashvalue1 = clhash(random, "my dog0123456789", 12);
        uint64_t hashvalue2 = clhash(random, "my cat0123456789", 12);
        uint64_t hashvalue3 = clhash(random, "my dog0123456789", 12);
        (void)hashvalue1; (void)hashvalue2; (void)hashvalue3; // silence unused-variable warnings
    }
    // assert(hashvalue1 == hashvalue3);
    // assert(hashvalue1 != hashvalue2); // very likely to be true
    free(random);
    return 0;
}
Tested for string sizes 6, 7, 8, and 12. For the 8-byte key I get this:
make example
time ./example
real 0m0.152s
user 0m0.148s
sys 0m0.000s
For the other sizes I get something like:
$ time ./example
real 0m0.493s
user 0m0.492s
sys 0m0.000s
So for really high performance, we are supposed to pad our data and use multiples of 8 for the size?
From figure 1 in your paper I had the impression that it would work smoothly and fast for sizes >= 8 at least.
https://arxiv.org/abs/1503.03465
Might there be a bug in the recent code?
Clhash uses word-by-word processing, so for tiny inputs it will be faster if the input fits in a multiple of 8 bytes. For sizeable inputs, however, this effect becomes negligible.
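If that fast path matters for your use case, padding can be done on the caller side. Here is a minimal sketch of that idea; the clhash_padded helper is hypothetical and not part of the library, and it assumes short keys of at most 56 bytes:

#include <stdint.h>
#include <string.h>
#include "clhash.h"

/* Hypothetical helper, not part of clhash: zero-pad a short key up to a
 * multiple of 8 bytes before hashing, so the multiple-of-8 path is always
 * taken. The original length is stored in the last padded byte so that keys
 * differing only in trailing zero bytes do not hash alike. Assumes len <= 56. */
static uint64_t clhash_padded(const void *random, const char *data, size_t len) {
    char buf[64];
    size_t padded = (len + 8) & ~(size_t)7; /* next multiple of 8, with room for the length byte */
    memset(buf, 0, padded);
    memcpy(buf, data, len);
    buf[padded - 1] = (char)len;
    return clhash(random, buf, padded);
}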
I have added a proper benchmark...
make
./benchmark
The result should look like this...
# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
8 4.50 2.50
9 6.44 2.44
10 5.80 2.60
11 5.27 2.91
12 5.00 2.67
13 4.46 2.77
14 4.14 2.71
15 3.87 2.93
16 2.38 2.88
17 3.53 2.71
18 3.33 2.89
19 3.16 2.84
20 3.00 2.80
21 2.76 2.67
22 2.64 2.91
23 2.52 2.87
24 1.50 2.83
25 2.40 2.88
26 2.23 2.85
27 2.22 2.89
28 2.07 2.93
29 2.07 2.90
30 1.93 2.93
31 1.87 2.97
32 1.12 2.94
33 1.76 2.85
34 1.76 2.88
35 1.71 2.86
36 1.61 2.94
37 1.51 2.92
38 1.53 2.95
39 1.54 2.97
40 1.05 2.90
41 1.41 2.93
42 1.43 2.95
43 1.35 2.93
44 1.27 2.91
45 1.29 2.98
46 1.26 2.91
47 1.23 2.94
48 0.88 2.96
49 1.18 2.90
50 1.20 2.92
51 1.18 2.94
52 1.15 2.96
53 1.13 2.91
54 1.07 2.93
55 1.05 2.98
56 0.75 2.93
57 1.05 2.95
58 1.03 2.97
59 1.02 2.92
60 1.00 2.97
61 0.98 2.95
62 0.94 2.97
63 0.92 2.95
64 0.72 2.94
65 0.92 2.95
66 0.91 2.91
67 0.90 2.99
68 0.88 2.97
69 0.87 2.93
70 0.86 2.97
71 0.85 2.99
72 0.64 2.97
73 0.79 2.96
74 0.78 2.95
75 0.80 2.96
76 0.76 2.97
77 0.73 2.96
78 0.74 2.97
79 0.73 2.99
80 0.57 2.95
81 0.72 2.96
82 0.73 2.98
83 0.72 2.99
84 0.71 2.98
85 0.71 2.99
86 0.70 2.98
87 0.67 2.99
88 0.55 2.95
89 0.67 2.94
90 0.69 2.98
91 0.66 2.95
92 0.65 2.96
93 0.65 2.95
94 0.64 2.98
95 0.63 2.97
96 0.50 2.96
97 0.62 2.95
98 0.63 3.00
99 0.61 2.99
100 0.60 2.98
...
4088 0.10 3.01
4089 0.10 3.01
4090 0.10 3.01
4091 0.10 3.01
4092 0.10 3.01
4093 0.10 3.01
4094 0.10 3.01
4095 0.10 3.01
If your inputs are tiny and not a multiple of 8 bytes (like 12 bytes), then there is room for optimization.
I do not expect that there is a bug. The figure you point to in https://arxiv.org/abs/1503.03465 represents the speed of hashing strings made of thousands of bytes.
Thanks. Indeed, I get similar results on my Intel NUC6i7KYK box with 64-bit Linux and gcc 5.4:
$ make benchmark
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -c ./src/clhash.c -Iinclude
cc -fPIC -std=c99 -O3 -msse4.2 -mpclmul -march=native -funroll-loops -Wstrict-overflow -Wstrict-aliasing -Wall -Wextra -pedantic -Wshadow -o benchmark ./benchmarks/benchmark.c -Iinclude clhash.o
$ ./benchmark
# for each input size in bytes, we report the number of cycles used to hash a byte
# First number is the size in bytes
# Second number is the number of CPU cycles per byte for clhash
# Third number is the number of CPU cycles per byte for java-like non-random hash function
8 4.50 2.50
9 6.67 2.67
10 6.00 2.60
11 5.45 2.73
12 5.00 2.83
13 4.62 2.62
14 4.14 2.71
15 3.87 1.47
16 1.38 1.75
17 2.35 1.88
18 2.22 1.89
19 2.11 1.89
20 1.90 1.90
21 1.90 1.90
22 1.73 2.00
23 1.74 2.09
24 1.00 2.00
25 1.68 2.00
26 1.62 2.08
27 1.56 2.07
28 1.50 2.14
29 1.45 2.14
30 1.40 2.20
31 1.29 2.13
32 0.69 2.12
33 1.27 2.12
But note that, for example, the value for 32 bytes is much better than the value for 31. The difference between 15 and 16 is even larger! So padding to multiples of 8 may really make sense. And I think that leaving each user to do their own padding is not a good solution.
Regarding your paper: I was referring to figure 1 in https://arxiv.org/pdf/1503.03465v8.pdf. The x-axis is labeled "data input size (bytes)" and the y-axis "cycles per input byte". That curve is absolutely smooth for your clhash, and the cycles per input byte start at 2.5 for an 8-byte size. So my assumption is that your current code is more tuned to large blocks?
And finally, it would additionally be great to have a streaming interface like the one xxHash provides, for example when one wants to hash records with fields like name and country. I guess with the current interface one has to concatenate the strings and then hash the result, something like the sketch below.
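A minimal sketch of what I mean with the current one-shot interface (the record fields and buffer size here are just made-up examples):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "clhash.h"

int main(void) {
    void *random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81),
                                             UINT64_C(0xb3816f6a2c68e530));
    /* Hypothetical record with two fields. */
    const char *name = "Stefan";
    const char *country = "Germany";

    /* With the current interface the fields must first be copied into one
     * contiguous buffer. In real code a separator or length prefix between
     * fields would be needed so that ("ab","c") and ("a","bc") do not hash
     * alike. A streaming interface could avoid this extra copy. */
    char buf[256];
    size_t n = 0;
    memcpy(buf + n, name, strlen(name));       n += strlen(name);
    memcpy(buf + n, country, strlen(country)); n += strlen(country);

    uint64_t h = clhash(random, buf, n);
    printf("%016llx\n", (unsigned long long)h);
    free(random);
    return 0;
}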
@StefanSalewski These are good ideas. Looking forward to receiving the pull requests!