redis-rb
redis-rb copied to clipboard
Change hashing to `MD5` in `Redis::Distributed`
Closes #1089.
I verified, md5 is indeed faster than sha1 by 20%.
We should probably make that configurable though. e.g. accept a callable, or simply call on common interface like #digest or #hexdigest
The callable will provide the most flexibility. But do we need it? Implemented accepting an object with #hexdigest
interface as a hasher in this PR.
I rerun my benchmarks from https://github.com/redis/redis-rb/pull/1124. Before (crc32 for all hashing) it run under 11s, with the new changes - 35s.
I also ran benchmarks for hashing algorithms:
# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "benchmark-ips"
end
require "digest/md5"
require "digest/sha1"
require "zlib"
key = "foobarbazquux"
Benchmark.ips do |x|
x.report("md5") { Digest::MD5.hexdigest(key) }
x.report("sha1") { Digest::SHA1.hexdigest(key) }
x.report("crc32") { Zlib.crc32(key) }
x.compare!
end
Comparison:
crc32: 8028674.5 i/s
md5: 708125.2 i/s - 11.34x (± 0.00) slower
sha1: 590949.0 i/s - 13.59x (± 0.00) slower
So there is a considerable slowdown. Maybe it is better still to use crc32 for the cache keys, like in dalli? I tested lots of combinations and got ~ even distribution with it.
Can you benchmark a little higher on the stack? Yes MD5 is much slower than CRC32, but does that have visible impact on a full cache read?
Another thing I'd like to see is distribution, would be interesting to hash a large number of random keys and see how well they are split in between servers.
And finally I wonder how much of the speed difference is because of the hexdigest interface, I wish Digest had an interface to return an Integer (will probably open a feature request for this).
Tested on 1m keys setting and then getting.
For md5 && crc32
: times in range 175-180s
.
For md5
only: times in range 190-195s
.
And finally I wonder how much of the speed difference is because of the hexdigest interface.
md5 && crc32
: ~1.3s
md5
only: ~7.3s
CPU profile
md5 && crc32
==================================
Mode: wall(1000)
Samples: 187889 (0.01% miss rate)
GC: 12312 (6.55%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
142799 (76.0%) 142799 (76.0%) IO#wait_readable
9669 (5.1%) 9669 (5.1%) (marking)
8442 (4.5%) 8442 (4.5%) IO#read_nonblock
167483 (89.1%) 3484 (1.9%) Redis::Client#logging
2639 (1.4%) 2639 (1.4%) (sweeping)
2309 (1.2%) 2309 (1.2%) String#to_s
1969 (1.0%) 1969 (1.0%) IO#write_nonblock
1783 (0.9%) 1783 (0.9%) String#match?
1621 (0.9%) 1621 (0.9%) Zlib.crc32
1233 (0.7%) 1233 (0.7%) Redis::Connection::Ruby#connected?
153118 (81.5%) 1079 (0.6%) Redis::Connection::SocketMixin#_read_from_socket
1051 (0.6%) 1051 (0.6%) Redis::HashRing#binary_search
163995 (87.3%) 989 (0.5%) Redis::Client#ensure_connected
840 (0.4%) 840 (0.4%) String#index
824 (0.4%) 824 (0.4%) String#initialize
552 (0.3%) 552 (0.3%) Kernel#is_a?
458 (0.2%) 458 (0.2%) String#slice!
448 (0.2%) 448 (0.2%) Redis::Client#inherit_socket?
442 (0.2%) 442 (0.2%) Array#first
411 (0.2%) 411 (0.2%) Array#join
326 (0.2%) 326 (0.2%) Symbol#===
md5 only
==================================
Mode: wall(1000)
Samples: 204766 (0.02% miss rate)
GC: 14392 (7.03%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
152968 (74.7%) 152968 (74.7%) IO#wait_readable
10906 (5.3%) 10906 (5.3%) (marking)
9421 (4.6%) 9421 (4.6%) IO#read_nonblock
3547 (1.7%) 3547 (1.7%) String#to_s
3481 (1.7%) 3481 (1.7%) (sweeping)
2502 (1.2%) 2502 (1.2%) IO#write_nonblock
1986 (1.0%) 1986 (1.0%) String#<=>
1740 (0.8%) 1740 (0.8%) Digest::Class#initialize
1577 (0.8%) 1577 (0.8%) String#match?
1423 (0.7%) 1423 (0.7%) Digest::Base#finish
6150 (3.0%) 1386 (0.7%) Digest::Class.hexdigest
3452 (1.7%) 1331 (0.7%) Redis::HashRing#binary_search
921 (0.4%) 921 (0.4%) Digest::Base#reset
174734 (85.3%) 893 (0.4%) Monitor#synchronize
164064 (80.1%) 854 (0.4%) Redis::Connection::SocketMixin#_read_from_socket
831 (0.4%) 831 (0.4%) String#initialize
830 (0.4%) 830 (0.4%) String#index
735 (0.4%) 735 (0.4%) Kernel#is_a?
544 (0.3%) 544 (0.3%) Digest::Base#update
433 (0.2%) 433 (0.2%) String#slice!
Another thing I'd like to see is distribution, would be interesting to hash a large number of random keys and see how well they are split in between servers.
# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "redis", path: "~/Desktop/oss/redis-rb"
end
require 'redis'
require 'redis/distributed'
URLS = 10.times.map { "redis://#{rand(256)}.#{rand(256)}.#{rand(256)}.#{rand(256)}:6379/#{rand(16)}" }
KEYS = 1_000_000.times.map { SecureRandom.hex }
NUM = 2
nums = []
20.times do
urls = URLS.shuffle.first(NUM)
r = Redis::Distributed.new(urls.map { |url| { url: url } })
count = Hash.new(0)
KEYS.each do |key|
node = r.node_for(key)
count[node.instance_variable_get("@options")[:url]] += 1
end
nums.concat(count.values)
puts count.values.inspect
end
sum = (nums.map { |num| (num - 1_000_000 / NUM) * (num - 1_000_000 / NUM) }).sum
puts "Deviation = #{Math.sqrt(sum / nums.size)}"
md5 && crc32 2 nodes
[506634, 493366]
[487947, 512053]
[474820, 525180]
[533974, 466026]
[535523, 464477]
[474820, 525180]
[563337, 436663]
[488534, 511466]
[510886, 489114]
[493590, 506410]
[504042, 495958]
[551065, 448935]
[503047, 496953]
[471883, 528117]
[553775, 446225]
[540300, 459700]
[515538, 484462]
[535523, 464477]
[462318, 537682]
[471883, 528117]
Deviation = 31487.757843326985
md5 only 2 nodes
[526809, 473191]
[464446, 535554]
[489952, 510048]
[508237, 491763]
[517752, 482248]
[531435, 468565]
[498762, 501238]
[502919, 497081]
[481230, 518770]
[538380, 461620]
[508886, 491114]
[498762, 501238]
[519862, 480138]
[499878, 500122]
[501956, 498044]
[515057, 484943]
[544460, 455540]
[507406, 492594]
[544460, 455540]
[502919, 497081]
Deviation = 22374.765294858404
md5 && crc32 3 nodes
[358759, 326872, 314369]
[339278, 322779, 337943]
[309145, 398467, 292388]
[374854, 317106, 308040]
[352086, 353461, 294453]
[294223, 332031, 373746]
[334724, 303329, 361947]
[329826, 322949, 347225]
[339295, 350785, 309920]
[338708, 309489, 351803]
[319351, 345146, 335503]
[325813, 342707, 331480]
[361314, 341945, 296741]
[368709, 332172, 299119]
[373903, 338531, 287566]
[355086, 332223, 312691]
[332742, 353169, 314089]
[335199, 336175, 328626]
[329706, 344080, 326214]
[341425, 345928, 312647]
Deviation = 22746.844990020043
md5 only 3 nodes
[299034, 354779, 346187]
[294123, 331776, 374101]
[312311, 371903, 315786]
[338345, 354694, 306961]
[346348, 345278, 308374]
[338345, 354694, 306961]
[319479, 350033, 330488]
[349567, 345307, 305126]
[308663, 361955, 329382]
[346348, 345278, 308374]
[356262, 320359, 323379]
[305001, 355968, 339031]
[310532, 319387, 370081]
[304359, 365485, 330156]
[319479, 350033, 330488]
[362411, 300194, 337395]
[360968, 299656, 339376]
[303046, 379547, 317407]
[315970, 329397, 354633]
[345364, 329647, 324989]
Deviation = 22119.83829054815
md5 && crc32 4 nodes
[272892, 253748, 216995, 256365]
[256027, 226508, 251382, 266083]
[258361, 263032, 255381, 223226]
[256636, 268257, 244793, 230314]
[251753, 261778, 255076, 231393]
[290511, 235542, 218927, 255020]
[269984, 276025, 211788, 242203]
[234014, 243267, 263447, 259272]
[258361, 263032, 255381, 223226]
[260547, 238594, 248297, 252562]
[251645, 268150, 248500, 231705]
[245457, 256799, 253472, 244272]
[223488, 248112, 248952, 279448]
[243528, 229636, 263102, 263734]
[252920, 270880, 247447, 228753]
[253853, 262212, 254541, 229394]
[274092, 242364, 239309, 244235]
[257590, 251302, 244890, 246218]
[245457, 256799, 253472, 244272]
[240302, 270458, 232599, 256641]
Deviation = 15244.604684936898
md5 only 4 nodes
[254966, 270576, 225634, 248824]
[243100, 244509, 283907, 228484]
[232989, 285731, 242077, 239203]
[250775, 234602, 250975, 263648]
[264998, 229018, 266547, 239437]
[236927, 279100, 235980, 247993]
[240976, 288978, 242791, 227255]
[229075, 253212, 249064, 268649]
[267469, 273822, 210042, 248667]
[269264, 246164, 230990, 253582]
[238152, 282396, 234728, 244724]
[260452, 246843, 257994, 234711]
[221405, 249877, 237002, 291716]
[240976, 288978, 242791, 227255]
[218590, 245740, 232345, 303325]
[251621, 242703, 250111, 255565]
[252262, 227209, 254763, 265766]
[234667, 276356, 234576, 254401]
[248171, 241095, 258018, 252716]
[225958, 273203, 245825, 255014]
Deviation = 18600.503030832257
So, performance is slightly better and distribution is ~ similar.
SecureRandom.hex
I don't want to be too picky, but what about the same test but with somewhat similarish keys, e.g: "user:#{numeric_id}"
?
md5 only
2 nodes: Deviation = 22759
3 nodes: Deviation = 21986
4 nodes: Deviation = 16065
md5 && crc32
2 nodes: Deviation = 21151
3 nodes: Deviation = 17539
4 nodes: Deviation = 12988
@fatkodima I think you can now rebase, I think I'm done with most of the big changes.
As for the algorithm selection, I trust you to chose the best compromise.
@byroot
I think I'm done with most of the big changes
Congrats on that work 💪 💪 💪
Rebased and updated this PR.