
Proposal for Handling Orphaned Slaves in Cluster Mode

Open RiversJin opened this issue 9 months ago • 3 comments

Search before asking

  • [ ] I had searched in the issues and found no similar issues.

Motivation

Currently, there is a minor issue in Kvrocks cluster mode: when a slave node is deleted via the controller, the controller only propagates this change to the nodes that remain in the cluster. As a result, the deleted slave is unaware of its own removal and continues its replication workflow, which is counterintuitive. This proposal aims to resolve this issue.

Key Considerations:

  1. Unreliable Controller-to-Node Notification:
    Relying on the controller to push deletion events to the target node is unreliable. When a node is deleted, it often indicates an unhealthy state (e.g., network instability). We cannot guarantee successful delivery of deletion notifications (e.g., a message may arrive but its acknowledgment may be lost). Thus, unlimited retries on the controller side are not a robust design.

  2. Protocol Flexibility:
    While Kvrocks' master-slave replication protocol is inspired by Redis, it is not identical. Introducing additional replication steps distinct from Redis is acceptable.

  3. Topology Synchronization Priority:
    The controller does not prioritize updating topology information for the current shard’s master. Consequently, masters must account for potentially stale topology data when validating slave replication requests and avoid outright rejection.

Solution

  1. Slave Metadata Propagation:
    During replconf execution, slaves will send their node ID and current config version to the master. The master will store this metadata.

  2. Version-Based Validation:
    After processing a replconf or cluster setnodes command, the master will scan its replica connections and, for any connection meeting the condition below, return a specific error that triggers replication termination on the slave side:

    • Condition: The master's topology version is strictly newer than the slave's (master_version > slave_version), and the slave's node ID no longer exists in the master's topology.
  3. Cleanup Workflow:
    Slaves receiving the specific error will terminate replication and clean up their state.
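The validation condition in step 2 can be sketched as a small predicate on the master side. This is an illustrative sketch only: the type and function names (TopologySnapshot, ReplicaInfo, ShouldEvictReplica) are hypothetical and do not correspond to actual Kvrocks code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical snapshot of the topology the master currently knows about.
struct TopologySnapshot {
  int64_t version;                           // current cluster config version
  std::unordered_set<std::string> node_ids;  // node IDs present in the topology
};

// Metadata a slave reported during its replconf handshake (step 1).
struct ReplicaInfo {
  std::string node_id;
  int64_t version;
};

// True when the master should reply with the specific error that tells the
// slave to stop replicating: the master's topology is strictly newer AND the
// slave's node ID is no longer present in it. If the master's view might be
// stale (version not newer), the request is NOT rejected, per consideration 3.
bool ShouldEvictReplica(const TopologySnapshot &topo, const ReplicaInfo &replica) {
  return topo.version > replica.version &&
         topo.node_ids.find(replica.node_id) == topo.node_ids.end();
}
```

Note that requiring master_version > slave_version (not just the node ID being absent) is what protects against the stale-topology case from consideration 3: a master holding older topology data never evicts a slave based on it.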

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

RiversJin avatar Mar 21 '25 03:03 RiversJin

@RiversJin Thanks for your proposal.

During replconf execution, slaves will send their node ID and current config version to the master. The master will store this metadata.

Do you mean the master will keep this metadata inside the memory instead of persisting on disk?

git-hulk avatar Mar 21 '25 08:03 git-hulk

Do you mean the master will keep this metadata inside the memory instead of persisting on disk?

Storing the id and version in memory is sufficient. Specifically, placing this information in the Connection object would be appropriate.

If the master restarts, slaves will reconnect and resubmit these fields during the new handshake, ensuring the metadata (id and version) is reinitialized.
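The in-memory approach described above could look roughly like the following. This is a hand-wavy sketch under the stated assumption that the metadata lives on the per-client connection object; "Connection", "ReplicaMeta", and "OnReplconf" are illustrative stand-ins, not the actual Kvrocks class layout.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical slave-reported metadata, as proposed in step 1 of the solution.
struct ReplicaMeta {
  std::string node_id;
  int64_t version;
};

// Stand-in for the server's per-client connection object.
struct Connection {
  // Empty until the slave reports its metadata via replconf. Because the
  // field lives with the connection, it disappears with the connection:
  // nothing is persisted, and a master restart naturally clears it. The
  // reconnecting slave repopulates it during the new handshake.
  std::optional<ReplicaMeta> replica_meta;

  void OnReplconf(std::string node_id, int64_t version) {
    replica_meta = ReplicaMeta{std::move(node_id), version};
  }
};
```

Keeping the fields on the connection rather than in a separate registry also sidesteps cleanup: there is no map of slave IDs to garbage-collect when a replica disconnects.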

RiversJin avatar Mar 21 '25 08:03 RiversJin

I see. I'm good with this enhancement. Let's see if others have any input? @apache/kvrocks-committers

git-hulk avatar Mar 21 '25 08:03 git-hulk