valkey Improve slot migration reliability

Open • PingXie opened this issue 10 months ago • 1 comment

Re-sharding in OSS Redis clusters is a risky operation: its design provides neither high availability nor eventual consistency. This is a pain point for many users. Mitigations were discussed previously (see pull request https://github.com/redis/redis/pull/10517), and I believe it's important to resurrect that conversation in this fork. These mitigations could provide much-needed relief while we work toward a more robust long-term solution. I'd like to hear the community's thoughts on their feasibility and on how they could benefit users.

@zuiderkwast @soloestoy @madolson

Mar 25 '24 03:03 PingXie

Definitely. Another important point is that reading from replicas is broken during a migration: a replica doesn't know about ongoing migrations, so it can't return ASK redirects.
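To illustrate the replica read problem, here is a minimal sketch. It is a simplified, hypothetical model, not Valkey's actual code. `NodeView` and `read` are illustrative names, and the hash slot is passed in explicitly instead of being computed from the key. The sketch follows the usual ASK convention: a node that is migrating a slot serves keys it still has, and redirects with `-ASK <slot> <host:port>` for keys it doesn't have. A node with no migration state has no choice but to report a miss.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeView:
    """Hypothetical, simplified model of what one node knows about its slots."""
    data: Dict[str, str] = field(default_factory=dict)
    # slot -> "host:port" of the import target, for slots being migrated away.
    # A primary gets this via CLUSTER SETSLOT <slot> MIGRATING <node-id>;
    # in the scenario above, a replica never learns it.
    migrating: Dict[int, str] = field(default_factory=dict)

    def read(self, key: str, slot: int) -> str:
        if key in self.data:
            return self.data[key]           # key not moved yet: serve it locally
        target = self.migrating.get(slot)
        if target is not None:
            return f"-ASK {slot} {target}"  # key may already live on the target
        return "(nil)"                      # no known migration: report a miss


# Slot 42 is being migrated, and "user:1" has already been moved to the target.
primary = NodeView(data={}, migrating={42: "10.0.0.2:6379"})
replica = NodeView(data={})  # same data as the primary, but no MIGRATING state

print(primary.read("user:1", 42))  # -ASK 42 10.0.0.2:6379
print(replica.read("user:1", 42))  # (nil): a wrong answer instead of a redirect
```

The replica isn't failing loudly here. It returns a plausible but incorrect miss, which is what makes replica reads unsafe during a migration.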

The only thing we didn't agree on is whether SETSLOT should be replicated in the replication stream or propagated over the cluster bus.

If it's done in the replication stream, SETSLOT stays correctly ordered relative to the commands executed before and after it. We can't achieve that with the cluster bus.
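The ordering argument can be checked with a small toy model. This is a minimal sketch, not how Valkey actually replicates commands. The command tuples, `snapshots`, and `bus_interleavings` are hypothetical. The sketch assumes a message on a separate channel (the cluster bus) can be applied at any point relative to the replication stream. It then checks whether every state the replica passes through is also a state the primary went through.

```python
from typing import Dict, List, Tuple

# Hypothetical command model: ("SET", key, value) or ("SETSLOT", slot_state).
Command = Tuple[str, ...]
Snapshot = Tuple[Tuple[Tuple[str, str], ...], str]


def snapshots(log: List[Command]) -> List[Snapshot]:
    """Every (data, slot_state) a node passes through while applying `log` in order."""
    data: Dict[str, str] = {}
    slot_state = "STABLE"
    seen = [(tuple(sorted(data.items())), slot_state)]
    for cmd in log:
        if cmd[0] == "SET":
            data[cmd[1]] = cmd[2]
        elif cmd[0] == "SETSLOT":
            slot_state = cmd[1]
        seen.append((tuple(sorted(data.items())), slot_state))
    return seen


def bus_interleavings(stream: List[Command], bus_msg: Command) -> List[List[Command]]:
    """A message on a separate channel may land anywhere relative to the stream."""
    return [stream[:i] + [bus_msg] + stream[i:] for i in range(len(stream) + 1)]


primary_log: List[Command] = [("SET", "a", "1"), ("SETSLOT", "MIGRATING"), ("SET", "b", "2")]
valid = set(snapshots(primary_log))

# Option 1: SETSLOT travels in the replication stream, so the replica replays
# exactly the primary's order.
print("stream:", set(snapshots(primary_log)) <= valid)

# Option 2: writes go through the stream, SETSLOT goes over the cluster bus.
writes = [c for c in primary_log if c[0] != "SETSLOT"]
for log in bus_interleavings(writes, ("SETSLOT", "MIGRATING")):
    consistent = set(snapshots(log)) <= valid
    print(" -> ".join(" ".join(c) for c in log), consistent)
```

In this model, the stream option always reproduces the primary's history. With the cluster bus, only one of the three possible arrival orders does. The other two let the replica expose a state the primary never had, such as `MIGRATING` with `a` still missing.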

Mar 25 '24 09:03 zuiderkwast
