str icon indicating copy to clipboard operation
str copied to clipboard

best string hash function

Open ktprime opened this issue 2 years ago • 4 comments

the best string hash function for hash map. https://github.com/wangyi-fudan/wyhash

from benchmark your str win is because fixed size string(no heap alloc -> cache friendly) and opt string compare

I add my emhash7(git emhash) into your integer bench:

bench_hash 0 sum: 0 avg lat: 4.8297 bench_hash 1 sum: 0 avg lat: 4.69109 bench_hash 2 sum: 0 avg lat: 3.46806 bench_hash 3 sum: 0 avg lat: 3.35624 bench_hash 4 sum: 0 avg lat: 4.9344 bench_hash 5 sum: 0 avg lat: 1.73721 bench_hash 6 sum: 0 avg lat: 1.64723 std::__1::map<unsigned int, unsigned short>, sum: 2893000 avg lat: 4.69555 std::__1::unordered_map<unsigned int, unsigned short>, sum: 2893000 avg lat: 7.24729 tsl::robin_map<unsigned int, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned int, unsigned short>>, true>, sum: 2893000 avg lat: 1.14586 tsl::hopscotch_map<unsigned int, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned int, unsigned short>>, 10, true>, sum: 2893000 avg lat: 3.34826 robin_hood::detail::Table<true, 80, unsigned int, unsigned short, robin_hood::hash, std::__1::equal_to>, sum: 2893000 avg lat: 3.61231 emhash7::HashMap<unsigned int, unsigned short>, sum: 2893000 avg lat: 1.54262 dense_hash_set, sum: 2893000 avg lat: 7.42706 bench_bsearch sum: 2893000 avg lat: 4.78543

ktprime avatar Jul 03 '21 11:07 ktprime

huangyuingdeMBP:benchmark huangyuanbing$ ./benchfindstr < data.txt bench_hash 0 sum: 44301000 avg lat: 5.00236 bench_hash 1 sum: 44301000 avg lat: 5.60526 bench_hash 2 sum: 44301000 avg lat: 4.78484 bench_hash 3 sum: 44301000 avg lat: 5.33511 bench_hash 4 sum: 44301000 avg lat: 5.94543 bench_hash 5 sum: 44301000 avg lat: 4.34096 bench_map sum: 44301000 avg lat: 21.6678 std::__1::unordered_map<std::__1::basic_string, unsigned short>, sum: 44301000 avg lat: 25.7266 tsl::robin_map<std::__1::basic_string, unsigned short, std::__1::hash<std::__1::basic_string>, std::__1::equal_to<std::__1::basic_string>, std::__1::allocator<std::__1::pair<std::__1::basic_string, unsigned short>>, true>, sum: 44301000 avg lat: 11.5332 tsl::hopscotch_map<std::__1::basic_string, unsigned short, std::__1::hash<std::__1::basic_string>, std::__1::equal_to<std::__1::basic_string>, std::__1::allocator<std::__1::pair<std::__1::basic_string, unsigned short>>, 10, true>, sum: 44301000 avg lat: 15.4038 robin_hood::detail::Table<true, 80, std::__1::basic_string, unsigned short, robin_hood::hash<std::__1::basic_string>, std::__1::equal_to<std::__1::basic_string>>, sum: 44301000 avg lat: 13.7637 emhash7::HashMap<std::__1::basic_string, unsigned short>, sum: 44301000 avg lat: 13.4242 emhash8::HashMap<std::__1::basic_string, unsigned short>, sum: 44301000 avg lat: 11.4544 emhash5::HashMap<std::__1::basic_string, unsigned short>, sum: 44301000 avg lat: 13.92 dense_hash_map, sum: 44301000 avg lat: 22.3428 bench_bsearch sum: 44301000 avg lat: 16.4077 bench_string_bsearch sum: 44301000 avg lat: 51.6053

ktprime avatar Jul 03 '21 14:07 ktprime

Actually there's a little bug in benchfindint.cc where I recently changed IntT type but forgot to change IntLen accordingly. I have fixed it and submitted, now bench_hash<6> should be the fastest.

MengRao avatar Jul 04 '21 02:07 MengRao

I have clone your project(https://github.com/ktprime/str) and add my emhash into bench code(rand input), your solution is not alway fastest if input very large(n1 > 30000).

huangyuingdeMBP:benchmark huangyuanbing$ ./benchfindint 40001 123456 n1 = 40001 n2 = 123456 bench_hash 6 sum: 202587583000 avg lat: 11.9826 std::__1::unordered_map<unsigned short, unsigned short>, sum: 11272000 avg lat: 24.4841 factor 0.999466 tsl::robin_map<unsigned short, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned short, unsigned short>>, true>, sum: 11272000 avg lat: 5.50371 factor 0.456711 tsl::hopscotch_map<unsigned short, unsigned short, std::__1::hash, std::__1::equal_to, std::__1::allocator<std::__1::pair<unsigned short, unsigned short>>, 10, true>, sum: 11272000 avg lat: 5.18199 factor 0.456711 robin_hood::detail::Table<true, 80, unsigned short, unsigned short, robin_hood::hash, std::__1::equal_to>, sum: 11272000 avg lat: 7.82956 factor 0.456711 emhash5::HashMap<unsigned short, unsigned short>, sum: 11272000 avg lat: 5.60857 factor 0.456711 emhash7::HashMap<unsigned short, unsigned short>, sum: 11272000 avg lat: 4.41366 factor 0.456711 emhash8::HashMap<unsigned short, unsigned short>, sum: 11272000 avg lat: 5.48717 factor 0.456711 dense_hash_set, sum: 202587583000 avg lat: 7.35589 bench_bsearch sum: 220819124800 avg lat: 70.9841

ktprime avatar Jul 04 '21 12:07 ktprime

hyb@HYGPC171201001:/mnt/d/str/benchmark$ ./benchfindint 1001 100000 n1 = 1001 n2 = 100000 bench_hash 0 sum: 153361400 avg lat: 6.16645 bench_hash 1 sum: 153361400 avg lat: 5.96371 bench_hash 2 sum: 153361400 avg lat: 6.63877 bench_hash 3 sum: 153361400 avg lat: 6.28237 bench_hash 4 sum: 153361400 avg lat: 8.34494 bench_hash 5 sum: 153361400 avg lat: 6.3268 bench_hash 6 sum: 153361400 avg lat: 4.50323 std::unordered_map<short unsigned int, short unsigned int>, sum: 302200 avg lat: 16.8496 factor 0.967992 tsl::robin_map<short unsigned int, short unsigned int>, sum: 302200 avg lat: 8.63683 factor 0.487305 tsl::hopscotch_map<short unsigned int, short unsigned int>, sum: 302200 avg lat: 7.28711 factor 0.487305 robin_hood::detail::Table<true, 80, short unsigned int, short unsigned int, robin_hood::hash, std::equal_to >, sum: 302200 avg lat: 3.91013 factor 0.487305 phmap::flat_hash_map<short unsigned int, short unsigned int>, sum: 302200 avg lat: 3.2078 factor 0.487543 emhash5::HashMap<short unsigned int, short unsigned int>, sum: 302200 avg lat: 7.19306 factor 0.487305 emhash7::HashMap<short unsigned int, short unsigned int>, sum: 302200 avg lat: 6.95231 factor 0.487305 emhash8::HashMap<short unsigned int, short unsigned int>, sum: 302200 avg lat: 6.8073 factor 0.487305 dense_hash_set, sum: 153361400 avg lat: 11.1664

small n1 phmap(absl::flat_hash_map) is the fastest.

ktprime avatar Jul 04 '21 13:07 ktprime