icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Add initial ZeroHashMap

Open pdogr opened this issue 3 years ago • 7 comments

#2532 Adds core functionality of ZeroHashMap. Perfect hashes are computed using CHD algorithm, with aHash being used as the hashing function.

pdogr avatar Sep 19 '22 03:09 pdogr

Praise: This looks like the type of hash map impl that will work well in ICU4X. It looks algorithmically similar to the approach rkyv is taking.

I would like to see benches against ZeroMap before merging, both data size and lookup performance, and code size for bonus points.

Yes I based my implementation on rkyv's hashmap with a few changes. Both use CHD algorithm with the random hash function approach. There is a practical approach mentioned in Appendix A of the paper which uses a simpler hash but computes key hash only once as compared to twice for the random hash function approach.

Changes from rkyv impl

  1. Different hash function
  2. Normally we would have stored the value inline at the slot, but here we store index to original placement of (k, v).
  3. assignments.contains(&index) in rkyv takes O(bucket_chain_len) which might slow hash building a bit (I haven't benchmarked it) depending on the bucket_chain len. So we use generation trick to get contains fast.

rkyv does have one additional optimization for the case when the chain has only one bucket. Instead of storing the seed in this case, the original index will be stored in the displacement array and the seeds will have the highest bit set to 1. As this original index will have the highest bit 0, these won't collide.

pdogr avatar Sep 19 '22 06:09 pdogr

Sounds good. When optimizing, please focus on (1) data size and (2) lookup speed. I don't care too much about building speed since it is done offline.

sffc avatar Sep 19 '22 07:09 sffc

Added some benchmarks reusing the data from the existing ones. Results

zeromap/lookup/large    time:   [188.63 ns 188.94 ns 189.27 ns]
                        change: [-2.8188% -2.3026% -1.7709%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

zerohashmap/lookup/large
                        time:   [82.513 ns 82.638 ns 82.766 ns]
                        change: [-1.1974% -0.7730% -0.3681%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

zeromap/lookup/large/hashmap
                        time:   [65.359 ns 65.476 ns 65.601 ns]
                        change: [-1.3054% -0.8122% -0.3220%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

A few tradeoffs and observations

  1. Almost half of the time is used in the hash computation, I'll try "Practical approach" given in the above paper to see if it shows improvement.
  2. FlexZeroVec vs ZeroVec for HashIndex entries (size vs lookup time)
  3. Reverse mapping. Ideally we want to store key, value together as (K, V) which would remove the need for reverse mapping and also better locality.
  4. Hash function. These benches are using ahash for now. I will do some experiments with platform independent hashes.

pdogr avatar Sep 20 '22 07:09 pdogr

@pdogr Good data on the large map. How does the performance compare on the small version of the map?

sffc avatar Sep 20 '22 17:09 sffc

@pdogr Good data on the large map. How does the performance compare on the small version of the map?

For smaller keys hash computation time really comes into picture

zeromap/lookup/small    time:   [52.080 ns 52.323 ns 52.614 ns]
                        change: [+5.7258% +6.4265% +7.0869%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

zerohashmap/lookup/small
                        time:   [98.832 ns 99.025 ns 99.231 ns]
                        change: [-0.8245% -0.3353% +0.1726%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

zeromap/lookup/small/hashmap
                        time:   [65.108 ns 65.336 ns 65.587 ns]
                        change: [-5.1179% -4.3311% -3.5866%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

pdogr avatar Sep 20 '22 17:09 pdogr

FlexZeroVec vs ZeroVec

Using ZeroVec improves the lookup time by ~ 11% (99ns -> 87ns) for small lookups and ~25% (82ns -> 64ns) for large lookups with the existing algorithm.

pdogr avatar Sep 20 '22 17:09 pdogr

It seems that another way to architect this could be to make this a standalone type that maps from keys to an index, and then keep a proper ZeroMap as the second stage of lookup.

sffc avatar Sep 20 '22 18:09 sffc

Switched to the practical approach mentioned in that paper with 64 bit (16 bits for g, 24 bits for f0, f1). Changed the hash function to wyhash.

Benches
zeromap/lookup/small    time:   [50.796 ns 50.913 ns 51.042 ns]
                        change: [+0.1679% +0.7275% +1.3576%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

zeromap/lookup/large    time:   [196.49 ns 197.21 ns 198.13 ns]
                        change: [-7.7555% -6.8637% -5.9877%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

zeromap/lookup/small/hashmap
                        time:   [69.802 ns 70.089 ns 70.439 ns]
                        change: [+2.9376% +3.6548% +4.3437%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  8 (8.00%) high mild

zeromap/lookup/large/hashmap
                        time:   [66.232 ns 66.464 ns 66.749 ns]
                        change: [+0.4782% +1.0997% +1.7342%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  6 (6.00%) high severe

zerohashmap/lookup/small
                        time:   [45.760 ns 45.844 ns 45.928 ns]
                        change: [-1.8354% -1.3624% -0.9334%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild

zerohashmap/lookup/large
                        time:   [43.285 ns 43.356 ns 43.434 ns]
                        change: [+3.1555% +3.8198% +4.4009%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe


pdogr avatar Sep 23 '22 06:09 pdogr