
Replicas not attaching correctly on failover

vazois opened this issue on Mar 28, 2024 · 0 comments

NOTE: Currently we do not support leader election on failover because we assume replicas cannot initiate failover on their own

Issuing CLUSTER FAILOVER without any arguments should make the old primary a replica of the new primary. This is currently not happening. Our initial assumption was that CLUSTER FAILOVER would always be called with the TAKEOVER option, since the primary would not be reachable. We did not account for planned failovers, which are still a valid case. In addition, the implementation of attachReplicas is buggy and may throw an exception if any one of the remote nodes is unreachable, resulting in the reachable nodes never being informed of the primary change.
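
To make the two cases explicit, here is a hedged C# sketch of how the server side could distinguish the default and TAKEOVER paths. The names (FailoverOption, HandleFailover, PromoteSelf, DemoteOldPrimary) are hypothetical and do not correspond to Garnet's actual command processing; the sketch only illustrates the missing step for the default case.

```csharp
// Hypothetical sketch of server-side handling of CLUSTER FAILOVER options;
// Garnet's actual implementation differs.
public enum FailoverOption { Default, Takeover }

public static class ClusterFailoverSketch
{
    public static void HandleFailover(FailoverOption option)
    {
        switch (option)
        {
            case FailoverOption.Takeover:
                // Unplanned failover: the old primary is assumed unreachable,
                // so the replica takes over slot ownership unilaterally.
                PromoteSelf();
                break;

            case FailoverOption.Default:
                // Planned failover: the old primary is reachable, so after
                // promotion it must be asked to become a replica of the new
                // primary (the step that is currently missing).
                PromoteSelf();
                DemoteOldPrimary();
                break;
        }
    }

    static void PromoteSelf() { /* take over slots owned by the old primary */ }
    static void DemoteOldPrimary() { /* request the old primary to replicate the new primary */ }
}
```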

Proposed solution:

  • Refactor attachReplicas so the catch statement is inside the loop and an exception from one unreachable node does not interfere with communication to the remaining remote nodes (see the sketch after this list).
  • Make the old primary a replica of the new primary when CLUSTER FAILOVER is issued with the default (no-argument) option.
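
A minimal sketch of the first bullet. The types and names (IRemoteNode, SendAttachRequest) are placeholders and not Garnet's actual API; the point is only that moving the try/catch inside the loop lets one unreachable node fail without aborting notification of the rest.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types; Garnet's real internals differ.
public interface IRemoteNode
{
    string Id { get; }
    void SendAttachRequest(string newPrimaryId);
}

public static class FailoverSketch
{
    // Sketch of the proposed refactor: the try/catch sits inside the loop,
    // so an exception from one unreachable node does not prevent the
    // remaining nodes from being told about the new primary.
    public static void AttachReplicas(IEnumerable<IRemoteNode> remoteNodes, string newPrimaryId)
    {
        foreach (var node in remoteNodes)
        {
            try
            {
                node.SendAttachRequest(newPrimaryId);
            }
            catch (Exception ex)
            {
                // Log and continue; unreachable nodes can be retried later
                // or picked up through gossip.
                Console.Error.WriteLine($"Failed to reach node {node.Id}: {ex.Message}");
            }
        }
    }
}
```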

NOTE: Avoid changing information directly owned by remote nodes through the local configuration. In the failover case, if we make remote nodes replicas of the new primary by directly updating the local configuration, we will put the cluster into an inconsistent state if those nodes are unreachable. Instead, we rely on issuing requests to remote nodes so that they change their own state, which ensures the change is acknowledged whenever the two communicating nodes are well connected. The exception to this rule is changing ownership of slots, as in the failover case; we cannot avoid doing this because slots can be owned by any instance at any given point in time. For that scenario, we should be extremely careful to avoid any split-brain inconsistencies, at least until leader election is properly implemented.
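
To make that principle concrete, here is a hedged sketch of the request-and-acknowledge pattern described above. The abstraction (IClusterNodeClient, RequestBecomeReplicaOf) is hypothetical and not Garnet's real node-to-node API.

```csharp
using System;

// Hypothetical client abstraction; Garnet's real node-to-node messaging differs.
public interface IClusterNodeClient
{
    // Asks the remote node to reconfigure itself as a replica of newPrimaryId;
    // returns true only when the remote node acknowledges the change.
    bool RequestBecomeReplicaOf(string newPrimaryId);
}

public static class RoleChangeSketch
{
    // The pattern described in the note: the remote node changes its own state,
    // and the caller records the change only after it is acknowledged, so the
    // local and remote configurations cannot silently diverge.
    public static bool DemoteOldPrimary(IClusterNodeClient oldPrimary, string newPrimaryId)
    {
        try
        {
            return oldPrimary.RequestBecomeReplicaOf(newPrimaryId);
        }
        catch (Exception)
        {
            // Unreachable or failed request: leave the role unchanged locally
            // and let a later retry or gossip round reconcile the state.
            return false;
        }
    }
}
```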

vazois · Mar 28 '24 18:03