redis-rb
redis-rb copied to clipboard
Redis::Distributed's shard distribution doesn't seem very even?
Hi there - we've got a couple of redis nodes set up as a cache for Rail's RedisCacheStore. I noticed that one of these nodes was receiving much more traffic than the other.
It seems like the default sharding doesn't balance between two nodes very well. With something like this -
count = Hash.new(0)
r = Redis::Distributed.new([{ url: 'redis://1.1.1.1:6379/0' }, { url: 'redis://2.2.2.2:6379/0'}], driver: :hiredis)
100000.times {
key = SecureRandom.hex
node = r.node_for(key)
count[node.instance_variable_get("@options")[:url]] += 1
}
count
=> {"redis://2.2.2.2:6379/0"=>62322, "redis://1.1.1.1:6379/0"=>37678}
so here the 2.2.2.2 node was chosen 40% more often that the 1.1.1.1. The key (SecureRandom.hex
) here is obviously pretty artificial, but the ratio seems to match the extra traffic I'm seeing hitting one of my nodes with Rails' actionview keys.
Is this expected behaviour, or is something weird going on?
@jdelStrother I was able to reproduce.
It all surely depends on the urls and cache keys, so sometimes you can get ~ even distribution, sometimes 60/40 or 70/30. For 2 nodes, no matter which urls I used, I almost always got the distribution similar from your example. For 3 nodes a little better, but sometimes also a big skewing.
One approach to improve the distribution, is to increase the number of points for each server. Something like:
ring = Redis::HashRing.new([], 1024)
r = Redis::Distributed.new([{ url: 'redis://1.1.1.1:6379/0' }, { url: 'redis://2.2.2.2:6379/0'}], ring: ring)
I investigated how dalli
implements the distributed client. The approach is basically the same (160 points, crc32, etc), except they use a little different hashing for server nodes: https://github.com/petergoldstein/dalli/blob/4f6ffac1e1b19222a42619669d32f42701277941/lib/dalli/ring.rb#L133-L134
I tried to apply this change to Redis::HashRing
and run your example. And I saw almost even distribution all the time! 🤔 I ran It for lots of combinations and different urls and different numbers of them. I am not strong in cryptography, so I do not know why is that.
If this is a good approach, the hashing algorithm probably can be changed. But this will also lead to a lot of cache misses when deployed for the first time.
@byroot Do you have any thoughts on this?
If this is a good approach, the hashing algorithm probably can be changed. But this will also lead to a lot of cache misses when deployed for the first time.
Since I'm working on the next major, it might be a good time to make such a change.
Redis::Distributed
is a very old feature, long before I took over as maintainer, so I don't know why this algo in particular was chosen, whether it's arbitrary or if it follow some kind of spec other redis client follow too? Would be good to check other langauges clients to see what they do
So from the list https://redis.io/docs/clients/ I found 2 implementations of consistent hashing:
php: https://github.com/predis/predis/blob/main/src/Cluster/Distributor/HashRing.php - for hashing server nodes uses custom user-defined functions or .to_s
, if none. For keys - crc32.
go: https://github.com/go-redis/redis/blob/master/ring.go - for hashing server nodes and keys uses xxhash
(http://cyan4973.github.io/xxHash/).
I searched through the history of this gem and found that originally hashing was done by md5
, but then changed to crc32
in https://github.com/redis/redis-rb/commit/5cd3a39e525b32923ea1315dfcf697dcc7d586ec. Maybe because crc32
is faster to compute?
That change was done 13 years ago, but maybe you remember why, @mperham? 😄
Ok, so there's no standard to adhere to. So we can probably change it in a major without any issue.
I believe that’s code borrowed from Dalli as I never worked on redis-rb back then that I can recall.
I’m pretty sure that I chose crc32 because it was easy to access (being in stdlib) and very fast. You are right that it is not particularly well suited for this use case. I’d recommend surveying a few more clients and choosing a real random hashing scheme. md5 should be fast and a reasonable choice.
@byroot So do you think we should change hashing only for server keys (so sha1
/md5
for servers and crc32
for cache keys, like in dalli) or for both servers keys and cache keys?
That’s your call but I’m pretty sure crc32 isn’t well suited for either use case.
@fatkodima I'd go with MD5 for everything as it's likely the best perf / distribution ration available in the stdlib. We should probably make that configurable though. e.g. accept a callable, or simply call on common interface like #digest
or #hexdigest
.