valkey
Improve slot migration reliability
Re-sharding in OSS Redis clusters is a risky operation, lacking both high availability and eventual consistency in the design. This is a pain point for many users. Mitigations were previously discussed (see pull request https://github.com/redis/redis/pull/10517), and I believe it's important to resurrect this conversation within this fork. These mitigations could provide much-needed relief while we work towards a more robust long-term solution. I'd like to hear the community's thoughts on the feasibility of these mitigations and how they could benefit the users.
@zuiderkwast @soloestoy @madolson
Definitely. Another important point is that reading from replicas is broken. A replica doesn't know about ongoing migrations, so it can't return ASK redirects.
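To illustrate the replica problem, here is a minimal toy model in Python. It is not Valkey's actual code: the `Node` class, the `read` function, and the `target-node` name are hypothetical and exist only for this sketch. It assumes the key has already been moved to the target and that the replica treats the slot as fully owned by its primary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Toy node: a key store plus the slots it believes are migrating."""
    data: Dict[str, str] = field(default_factory=dict)
    # slot -> name of the migration target (hypothetical representation)
    migrating: Dict[int, str] = field(default_factory=dict)


def read(node: Node, slot: int, key: str) -> str:
    """Serve a read the way this toy node would."""
    if key in node.data:
        return node.data[key]
    target = node.migrating.get(slot)
    if target is not None:
        # The node knows the slot is migrating, so a missing key
        # may already live on the target: redirect the client.
        return f"ASK {slot} {target}"
    # No migration state known: the key simply looks absent.
    return "nil"


# The key in slot 42 has already been moved to the target node.
primary = Node(migrating={42: "target-node"})
replica = Node()  # same (empty) data, but never told about the migration

primary_reply = read(primary, 42, "user:1")
replica_reply = read(replica, 42, "user:1")
```

In this sketch the primary redirects the client, while the replica reports the key as missing even though it still exists on the target.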
The only point we didn't agree on is whether SETSLOT should be replicated through the replication stream or propagated over the cluster bus.
If it's done in the replication stream, SETSLOT and the commands executed before and after it all arrive in the right order. We can't achieve that with the cluster bus.
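A rough sketch of the ordering argument, again as a toy Python model rather than anything from the Valkey code base. The command strings are simplified, and `apply_stream` is a hypothetical helper. The assumption is that a message on a separate channel can land at any position relative to the replicated writes.

```python
from typing import List, Tuple


def apply_stream(stream: List[str]) -> List[Tuple[str, bool]]:
    """Replay commands in order. For each write, record whether the
    replica already knew the slot was migrating when it applied it."""
    migrating = False
    seen = []
    for cmd in stream:
        if cmd.startswith("SETSLOT"):
            migrating = True
        else:
            seen.append((cmd, migrating))
    return seen


writes_before = ["SET a 1", "SET b 2"]
writes_after = ["DEL a"]
marker = "SETSLOT 42 MIGRATING target-node"  # simplified, hypothetical form

# In the replication stream: the marker sits exactly where the primary ran it.
in_stream = apply_stream(writes_before + [marker] + writes_after)

# Out of band: the marker may arrive at any point relative to the writes.
writes = writes_before + writes_after
out_of_band = [
    apply_stream(writes[:i] + [marker] + writes[i:])
    for i in range(len(writes) + 1)
]

# Only one of the possible arrival points reproduces the primary's view.
matching = [outcome for outcome in out_of_band if outcome == in_stream]
```

Here only one of the four possible arrival positions gives the replica the same view as the primary, and nothing in an out-of-band channel guarantees that position.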