RedisCluster is unable to update the slot mapping after a new node has been added
File: redis_cluster.hpp
template <typename Cmd, typename ...Args>
ReplyUPtr RedisCluster::_command(Cmd cmd, const StringView &key, Args &&...args) {
    for (auto idx = 0; idx < 2; ++idx) {
        try {
            auto pool = _pool.fetch(key);
            ...
        } catch (const IoError &err) {
            ...
        } catch (const ClosedError &err) {
            ...
        } catch (const MovedError &err) {
            ...
        } catch (const AskError &err) {
            ...
        } // For other exceptions, just throw it.
    }
The "try" block calls: _pool.fetch --> ShardsPool::fetch --> ShardsPool::_fetch --> ShardsPool::_get_pool --> throw Error("Slot is out of range: " + std::to_string(slot));
Since the exception is just normal, it will not be caught by any catch block above. As the consequence, the RedisCluster is not updated with new slot.
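To make the failure mode concrete, here is a rough sketch of the retry-and-refresh pattern this report is arguing for. It is not the library's code: run_command and refresh_mapping are hypothetical placeholders for the body of RedisCluster::_command and for whatever call rebuilds the slot-node mapping (e.g. something equivalent to the update done on a MOVED redirection).

#include <sw/redis++/redis++.h>

// Sketch only: retry once after refreshing the slot-node mapping when a plain
// Error (such as "Slot is out of range") escapes the redirection handlers.
// run_command and refresh_mapping are hypothetical placeholders.
template <typename RunCommand, typename RefreshMapping>
auto run_with_slot_refresh(RunCommand run_command, RefreshMapping refresh_mapping)
        -> decltype(run_command()) {
    try {
        return run_command();
    } catch (const sw::redis::Error &) {
        // "Slot is out of range" is thrown as a plain Error, so the
        // MovedError/AskError handlers never see it: refresh and retry once.
        refresh_mapping();
        return run_command();
    }
}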
If it throws "Slot is out of range", it means that some slots are not covered by any node. If you cluster is in a healthy state, it should not happen.
How did you add a new node? It seems that your cluster is in an unhealthy state, i.e. some slots are not covered by any node, before you add a new node to the cluster. Regards
Well, I'm using "cluster-require-full-coverage no" for all nodes. The doc says of this option: "If the option is set to no, the cluster will still serve queries even if only requests about a subset of keys can be processed" link
This is my cluster config:
port 50001
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no
appendonly yes
Initially, I started two nodes, 50001 and 50002. Then I started a RedisCluster client, which read from and wrote to these two nodes without any issues. After that, node 50003 was started and joined the cluster. The cluster_state was ok the whole time, yet the same RedisCluster client throws "Slot is out of range" when reading/writing keys owned by node 50003.
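For reference, the client side of that scenario looks roughly like the snippet below. The key name is just a placeholder; the failure happens for any key whose slot was not covered by the mapping the client built before 50003 joined.

#include <sw/redis++/redis++.h>

#include <iostream>

using namespace sw::redis;

int main() {
    // The client builds its slot-node mapping at startup, while only 50001 and
    // 50002 are in the cluster.
    auto cluster = RedisCluster("tcp://127.0.0.1:50001");

    // ... node 50003 joins the cluster and gets a slot assigned ...

    try {
        // Any key whose slot was not covered when the mapping was built ends up
        // here; "some_key" is just a placeholder.
        cluster.set("some_key", "some_value");
    } catch (const Error &err) {
        std::cerr << err.what() << std::endl;   // "Slot is out of range: ..."
    }

    return 0;
}

Here is the cluster state after node 50003 joined: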
127.0.0.1:50001> cluster nodes
3e12a03d29124037d5a4c23621eb794c6dae5e16 127.0.0.1:50001@60001 myself,master - 0 1627712748000 0 connected 9490
2af128457eeff2184edead270dcf234b0733baa2 127.0.0.1:50003@60003 master - 0 1627712748487 2 connected 1360
10d2bb9ccb96d30059c7e7e9679f6b979ca0d9d2 127.0.0.1:50002@60002 master - 0 1627712749489 1 connected 5489
127.0.0.1:50001> cluster slots
1) 1) (integer) 9490
2) (integer) 9490
3) 1) "127.0.0.1"
2) (integer) 50001
3) "3e12a03d29124037d5a4c23621eb794c6dae5e16"
2) 1) (integer) 1360
2) (integer) 1360
3) 1) "127.0.0.1"
2) (integer) 50003
3) "2af128457eeff2184edead270dcf234b0733baa2"
3) 1) (integer) 5489
2) (integer) 5489
3) 1) "127.0.0.1"
2) (integer) 50002
3) "10d2bb9ccb96d30059c7e7e9679f6b979ca0d9d2"
127.0.0.1:50001> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:3
cluster_slots_ok:3
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:2
cluster_my_epoch:0
cluster_stats_messages_ping_sent:37
cluster_stats_messages_pong_sent:35
cluster_stats_messages_fail_sent:1
cluster_stats_messages_sent:73
cluster_stats_messages_ping_received:35
cluster_stats_messages_pong_received:35
cluster_stats_messages_received:70
127.0.0.1:50001>
Thanks for sharing your scenario. I fixed the problem on the slot-coverage branch. You can check out this branch and give it a try.
However, updating the slot mapping is not a cheap operation, and with this change the mapping has to be updated whenever you access a slot that is not covered. I'm not sure if this is a good idea. I'll take a deeper look into it before merging the code into master.
Regards.
Thank you very much for the quick solution; it is working, I would say. I definitely agree with you that it is not a cheap operation. Either we accept that cost, or we somehow detect the newly added node and update the mapping only once.
Hi @sewenew Have you considered merging this feature into the master branch? I have also found one more issue, about pipelines in RedisCluster. From my observation, the GET/SET commands work perfectly when we introduce a new hash slot to the redis-server (i.e. the slot mapping is updated if needed). However, RedisCluster::pipeline is not aware that a new hash slot has been added to the redis-cluster. This issue happens on both the latest slot-coverage and master branches.
However, RedisCluster::pipeline is not aware that a new hash slot has been added to the redis-cluster.
The Pipeline object is not aware of slot changes by design, because it keeps a single connection to one node of the Redis Cluster and does not handle MOVED or ASK redirection errors.
In your case, I'd suggest creating a new Pipeline object each time you need one, instead of creating a single Pipeline object and using it all the time. That way, when the slot mapping changes, the newly created Pipeline object will connect to the right node. Also, to avoid the performance penalty, you can create the pipeline with a connection picked from the underlying connection pool:
void func(RedisCluster &cluster) {
    // Create the Pipeline when you use it; pass false so it reuses a
    // connection from the underlying pool instead of creating a new one.
    auto pipe = cluster.pipeline("hash_tag", false);
    // use pipeline
}
Check this for more details on how to create a pipeline without creating a new connection.
Have you considered merging this feature into the master branch?
I need to do some more checking on this feature. However, I'm sorry, but I'm too busy these days. Once everything is done, I'll let you know.
Regards
This problem has been fixed. When some slot is uncovered, redis-plus-plus now updates the slot-node mapping asynchronously. This makes the operation less expensive while keeping the slot-node mapping up to date.
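For anyone curious what "asynchronously" means here, the idea is roughly the following sketch. It is not the library's actual code; refresh stands in for the real mapping update.

#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <utility>

// Sketch of the idea only: when an uncovered slot is hit, kick off the
// slot-node mapping refresh in the background so the failing command does not
// pay the full cost of a synchronous update.
class SlotMappingRefresher {
public:
    // 'refresh' stands in for whatever rebuilds the mapping (e.g. a call
    // equivalent to the update done on a MOVED redirection). Hypothetical.
    explicit SlotMappingRefresher(std::function<void()> refresh)
        : _refresh(std::move(refresh)) {}

    void refresh_async() {
        std::lock_guard<std::mutex> lock(_mtx);
        // Start a new background refresh only if the previous one has finished.
        if (!_task.valid() ||
                _task.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            _task = std::async(std::launch::async, _refresh);
        }
    }

private:
    std::function<void()> _refresh;
    std::future<void> _task;
    std::mutex _mtx;
};

In this sketch the command that hit the uncovered slot does not wait for the refresh; whether the actual implementation also retries the command is up to the library.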
Regards
Sounds very good, thanks!