hashbrown icon indicating copy to clipboard operation
hashbrown copied to clipboard

Skip `match_empty` after failed to find a insert slot in group

Open ldm0 opened this issue 8 months ago • 4 comments
trafficstars

Running benchmark on my local macbook pro(M1 Pro):

Before:

Before:

running 41 tests
test clone_from_large               ... bench:       7,630.54 ns/iter (+/- 105.82)
test clone_from_small               ... bench:          69.27 ns/iter (+/- 4.44)
test clone_large                    ... bench:       7,650.92 ns/iter (+/- 138.79)
test clone_small                    ... bench:         110.63 ns/iter (+/- 4.04)
test grow_insert_foldhash_highbits  ... bench:      21,677.08 ns/iter (+/- 1,188.93)
test grow_insert_foldhash_random    ... bench:      24,624.66 ns/iter (+/- 673.04)
test grow_insert_foldhash_serial    ... bench:      22,303.47 ns/iter (+/- 1,107.11)
test grow_insert_std_highbits       ... bench:      37,461.35 ns/iter (+/- 1,110.84)
test grow_insert_std_random         ... bench:      37,744.47 ns/iter (+/- 1,511.93)
test grow_insert_std_serial         ... bench:      37,440.62 ns/iter (+/- 618.89)
test insert_erase_foldhash_highbits ... bench:      15,511.28 ns/iter (+/- 367.55)
test insert_erase_foldhash_random   ... bench:      17,365.08 ns/iter (+/- 1,125.38)
test insert_erase_foldhash_serial   ... bench:      15,461.06 ns/iter (+/- 738.03)
test insert_erase_std_highbits      ... bench:      31,643.75 ns/iter (+/- 1,053.08)
test insert_erase_std_random        ... bench:      33,238.85 ns/iter (+/- 1,722.66)
test insert_erase_std_serial        ... bench:      31,921.90 ns/iter (+/- 1,340.85)
test insert_foldhash_highbits       ... bench:      14,680.70 ns/iter (+/- 545.13)
test insert_foldhash_random         ... bench:      14,706.28 ns/iter (+/- 290.97)
test insert_foldhash_serial         ... bench:      14,810.04 ns/iter (+/- 362.32)
test insert_std_highbits            ... bench:      19,833.44 ns/iter (+/- 786.51)
test insert_std_random              ... bench:      19,728.32 ns/iter (+/- 277.17)
test insert_std_serial              ... bench:      19,674.22 ns/iter (+/- 636.33)
test iter_foldhash_highbits         ... bench:         902.91 ns/iter (+/- 66.87)
test iter_foldhash_random           ... bench:         784.51 ns/iter (+/- 29.84)
test iter_foldhash_serial           ... bench:         777.75 ns/iter (+/- 49.07)
test iter_std_highbits              ... bench:         762.85 ns/iter (+/- 14.56)
test iter_std_random                ... bench:         774.44 ns/iter (+/- 24.61)
test iter_std_serial                ... bench:         762.94 ns/iter (+/- 12.79)
test lookup_fail_foldhash_highbits  ... bench:       1,899.07 ns/iter (+/- 67.99)
test lookup_fail_foldhash_random    ... bench:       2,249.00 ns/iter (+/- 129.37)
test lookup_fail_foldhash_serial    ... bench:       2,602.64 ns/iter (+/- 56.42)
test lookup_fail_std_highbits       ... bench:       8,734.81 ns/iter (+/- 273.01)
test lookup_fail_std_random         ... bench:       8,594.13 ns/iter (+/- 507.16)
test lookup_fail_std_serial         ... bench:       8,625.80 ns/iter (+/- 145.82)
test lookup_foldhash_highbits       ... bench:       2,098.07 ns/iter (+/- 49.03)
test lookup_foldhash_random         ... bench:       2,455.68 ns/iter (+/- 87.79)
test lookup_foldhash_serial         ... bench:       2,089.30 ns/iter (+/- 111.22)
test lookup_std_highbits            ... bench:       8,784.86 ns/iter (+/- 293.86)
test lookup_std_random              ... bench:       8,740.02 ns/iter (+/- 119.12)
test lookup_std_serial              ... bench:       8,604.05 ns/iter (+/- 472.75)
test rehash_in_place                ... bench:     209,355.86 ns/iter (+/- 6,211.48)

test result: ok. 0 passed; 0 failed; 0 ignored; 41 measured; 0 filtered out; finished in 55.76s

     Running benches/insert_unique_unchecked.rs (target/release/deps/insert_unique_unchecked-70b56273fe907210)

running 2 tests
test insert                  ... bench:       9,978.20 ns/iter (+/- 218.31)
test insert_unique_unchecked ... bench:       3,453.39 ns/iter (+/- 53.06)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 2.10s

     Running benches/set_ops.rs (target/release/deps/set_ops-41e9dbeede3fedd5)

running 10 tests
test set_ops_bit_and                ... bench:      10,482.35 ns/iter (+/- 387.65)
test set_ops_bit_and_assign         ... bench:       6,106.08 ns/iter (+/- 247.88)
test set_ops_bit_or                 ... bench:      86,940.28 ns/iter (+/- 2,392.71)
test set_ops_bit_or_assign          ... bench:      53,571.67 ns/iter (+/- 1,050.82)
test set_ops_bit_xor                ... bench:      88,900.81 ns/iter (+/- 3,737.91)
test set_ops_bit_xor_assign         ... bench:      55,736.53 ns/iter (+/- 1,031.37)
test set_ops_sub_assign_large_small ... bench:      55,513.89 ns/iter (+/- 1,965.35)
test set_ops_sub_assign_small_large ... bench:       6,100.57 ns/iter (+/- 255.82)
test set_ops_sub_large_small        ... bench:      87,790.68 ns/iter (+/- 2,284.42)
test set_ops_sub_small_large        ... bench:         647.65 ns/iter (+/- 5.28)

test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured; 0 filtered out; finished in 19.65s
After:
running 41 tests
test clone_from_large               ... bench:       7,708.37 ns/iter (+/- 317.98)
test clone_from_small               ... bench:          69.86 ns/iter (+/- 1.66)
test clone_large                    ... bench:       7,662.61 ns/iter (+/- 153.91)
test clone_small                    ... bench:         110.04 ns/iter (+/- 3.01)
test grow_insert_foldhash_highbits  ... bench:      20,067.66 ns/iter (+/- 813.59)
test grow_insert_foldhash_random    ... bench:      24,074.16 ns/iter (+/- 1,134.52)
test grow_insert_foldhash_serial    ... bench:      21,096.95 ns/iter (+/- 366.22)
test grow_insert_std_highbits       ... bench:      36,934.55 ns/iter (+/- 894.28)
test grow_insert_std_random         ... bench:      36,898.65 ns/iter (+/- 1,328.19)
test grow_insert_std_serial         ... bench:      36,811.75 ns/iter (+/- 1,086.38)
test insert_erase_foldhash_highbits ... bench:      15,791.87 ns/iter (+/- 473.97)
test insert_erase_foldhash_random   ... bench:      16,973.07 ns/iter (+/- 934.27)
test insert_erase_foldhash_serial   ... bench:      17,217.81 ns/iter (+/- 1,168.29)
test insert_erase_std_highbits      ... bench:      33,779.54 ns/iter (+/- 1,145.91)
test insert_erase_std_random        ... bench:      34,539.58 ns/iter (+/- 743.19)
test insert_erase_std_serial        ... bench:      33,910.30 ns/iter (+/- 5,716.08)
test insert_foldhash_highbits       ... bench:      14,719.39 ns/iter (+/- 467.13)
test insert_foldhash_random         ... bench:      15,091.88 ns/iter (+/- 12,584.39)
test insert_foldhash_serial         ... bench:      14,592.95 ns/iter (+/- 652.87)
test insert_std_highbits            ... bench:      19,681.62 ns/iter (+/- 728.00)
test insert_std_random              ... bench:      19,588.85 ns/iter (+/- 618.96)
test insert_std_serial              ... bench:      19,611.61 ns/iter (+/- 792.86)
test iter_foldhash_highbits         ... bench:         828.32 ns/iter (+/- 27.00)
test iter_foldhash_random           ... bench:         768.69 ns/iter (+/- 55.53)
test iter_foldhash_serial           ... bench:         766.07 ns/iter (+/- 9.61)
test iter_std_highbits              ... bench:         765.31 ns/iter (+/- 18.42)
test iter_std_random                ... bench:         765.99 ns/iter (+/- 45.18)
test iter_std_serial                ... bench:         763.18 ns/iter (+/- 15.77)
test lookup_fail_foldhash_highbits  ... bench:       1,936.33 ns/iter (+/- 47.30)
test lookup_fail_foldhash_random    ... bench:       2,425.87 ns/iter (+/- 99.09)
test lookup_fail_foldhash_serial    ... bench:       2,127.12 ns/iter (+/- 67.47)
test lookup_fail_std_highbits       ... bench:       8,581.58 ns/iter (+/- 279.54)
test lookup_fail_std_random         ... bench:       8,333.45 ns/iter (+/- 114.58)
test lookup_fail_std_serial         ... bench:       8,097.86 ns/iter (+/- 185.79)
test lookup_foldhash_highbits       ... bench:       2,172.82 ns/iter (+/- 43.31)
test lookup_foldhash_random         ... bench:       2,370.67 ns/iter (+/- 43.23)
test lookup_foldhash_serial         ... bench:       2,066.38 ns/iter (+/- 113.00)
test lookup_std_highbits            ... bench:       8,777.33 ns/iter (+/- 266.92)
test lookup_std_random              ... bench:       8,688.57 ns/iter (+/- 277.06)
test lookup_std_serial              ... bench:       8,676.47 ns/iter (+/- 196.41)
test rehash_in_place                ... bench:     191,902.86 ns/iter (+/- 3,842.01)

test result: ok. 0 passed; 0 failed; 0 ignored; 41 measured; 0 filtered out; finished in 50.84s

     Running benches/insert_unique_unchecked.rs (target/release/deps/insert_unique_unchecked-70b56273fe907210)

running 2 tests
test insert                  ... bench:       4,811.63 ns/iter (+/- 36.77)
test insert_unique_unchecked ... bench:       3,464.63 ns/iter (+/- 109.34)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 1.11s

     Running benches/set_ops.rs (target/release/deps/set_ops-41e9dbeede3fedd5)

running 10 tests
test set_ops_bit_and                ... bench:       8,704.29 ns/iter (+/- 204.07)
test set_ops_bit_and_assign         ... bench:       6,075.31 ns/iter (+/- 137.78)
test set_ops_bit_or                 ... bench:      68,336.41 ns/iter (+/- 2,494.32)
test set_ops_bit_or_assign          ... bench:      53,319.49 ns/iter (+/- 1,172.32)
test set_ops_bit_xor                ... bench:      73,222.35 ns/iter (+/- 3,848.50)
test set_ops_bit_xor_assign         ... bench:      55,755.47 ns/iter (+/- 2,032.17)
test set_ops_sub_assign_large_small ... bench:      55,242.62 ns/iter (+/- 1,864.93)
test set_ops_sub_assign_small_large ... bench:       6,130.91 ns/iter (+/- 130.80)
test set_ops_sub_large_small        ... bench:      73,938.60 ns/iter (+/- 2,211.95)
test set_ops_sub_small_large        ... bench:         626.02 ns/iter (+/- 19.25)

Highlights:

- test insert                  ... bench:       9,978.20 ns/iter (+/- 218.31)
+ test insert                  ... bench:       4,811.63 ns/iter (+/- 36.77)
- test set_ops_bit_and                ... bench:      10,482.35 ns/iter (+/- 387.65)
+ test set_ops_bit_and                ... bench:       8,704.29 ns/iter (+/- 204.07)
- test set_ops_sub_large_small        ... bench:      87,790.68 ns/iter (+/- 2,284.42)
+ test set_ops_sub_large_small        ... bench:      73,938.60 ns/iter (+/- 2,211.95)

Previously, each group would be matched twice(first by find_insert_slot_in_group, which finds empty or deleted slot in the group, and then again by match_empty, which finds empty slot in the group) until a deleted slot was found in the table. The second match is clearly redundant.

By removing the redundant check, we significantly improve the insertion speed and query speed in a newly created HashMap/HashSet.

ldm0 avatar Feb 26 '25 19:02 ldm0