raft-rs

RFC: Follower Snapshot

Open BusyJay opened this issue 5 years ago • 7 comments

For now, a snapshot is always sent from the leader to a follower, which is not always efficient. For example, consider 5 nodes in two data centers, (1, 2) and (3, 4, 5). If 5 is the leader and 1 needs a snapshot, the data has to be transferred across data centers.

But in fact any node in the cluster can send a snapshot once the requested logs are applied. So 1 could proactively request a snapshot from 2, so that the data is transferred within the data center.

Supporting proactive snapshot requests is also useful for recovering from snapshot file corruption.
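To make the data center example concrete, here is a minimal sketch of how a node might choose whom to ask. None of this is raft-rs API: the `Peer` struct, its fields, and `pick_snapshot_source` are hypothetical names, and the rule "prefer a caught-up peer in the same data center, otherwise fall back to the leader" is only one possible policy.

```rust
/// Hypothetical view of a remote peer, as seen by the node that needs a snapshot.
/// These fields are assumptions for illustration, not part of raft-rs.
#[derive(Debug, Clone, PartialEq)]
pub struct Peer {
    pub id: u64,
    pub data_center: u32,
    pub applied_index: u64,
    pub is_leader: bool,
}

/// Pick the peer to request a snapshot from.
///
/// Preference order (an assumed policy):
/// 1. a peer in the local data center that has applied at least `required_index`;
/// 2. otherwise the leader, which is where snapshots come from today.
pub fn pick_snapshot_source(peers: &[Peer], local_dc: u32, required_index: u64) -> Option<u64> {
    peers
        .iter()
        // Only peers that have applied the requested logs can serve the snapshot.
        .filter(|p| p.applied_index >= required_index)
        // Prefer a peer that avoids cross data center traffic.
        .find(|p| p.data_center == local_dc)
        // No suitable local peer: fall back to the leader, if one is known.
        .or_else(|| peers.iter().find(|p| p.is_leader))
        .map(|p| p.id)
}
```

With the layout above (node 1 asking, node 2 in the same data center and caught up, node 5 as leader), this would return node 2; if node 2 lags behind the required index, it would return node 5.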

BusyJay Oct 30 '18 10:10

We can name this feature Follower snapshot :-)

Maybe we can even support Follower replication 🤔

siddontang Oct 31 '18 09:10

We've previously talked about how desirable this feature would be for our use case in TiKV, and I think it can be useful for others as well.

I'd definitely love to do this, but if anyone else wants to tackle it please feel encouraged to let us know and start it!

It sounds like the generalized, simplest definition of the feature is:

The ability to request snapshots from any node to any other node, so long as the snapshot only contains committed data.
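As a rough sketch of that condition, a node could check a snapshot request like this before answering it. The function name and its parameters are hypothetical, not raft-rs API; the only rules encoded are the two stated in this thread: the snapshot must not go past the commit index, and it must cover the logs the requester asked for.

```rust
/// Hypothetical check a node could run before serving a snapshot to a peer.
///
/// * `snapshot_index`  - last log index covered by the snapshot the node can produce
/// * `commit_index`    - the node's current commit index
/// * `requested_index` - the lowest index the requester needs the snapshot to cover
pub fn can_serve_snapshot(snapshot_index: u64, commit_index: u64, requested_index: u64) -> bool {
    // The snapshot must only contain committed data...
    let only_committed = snapshot_index <= commit_index;
    // ...and it must include the logs the requester asked for.
    let covers_request = snapshot_index >= requested_index;
    only_committed && covers_request
}
```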

I think doing this would open the door for a second, future feature to support follower replication. However, that feature will require some more consideration. Let's plan for that after this one? I can open another issue for it, so we can separate the discussion.

Hoverbear Oct 31 '18 10:10

I'd like to work on this, and I came up with a few questions that probably need to be discussed:

  • If a node tries to get a snapshot from another node, should it ignore messages from the leader (even heartbeats) to avoid log replication?
  • Do we need a timeout mechanism for this kind of snapshot request, in case the target node cannot deliver the message to the requester?

Fullstop000 Jan 11 '19 06:01

Thanks @Fullstop000

  1. In the current Raft implementation, while the leader is sending a snapshot to a follower, it can still send heartbeat messages to keep the follower alive, so I think we still need to let the leader do this.
  2. There is currently no timeout mechanism for snapshot sending; we will handle this outside the Raft library.
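For point 2, a timeout handled outside the Raft library could look roughly like the sketch below. Everything here is hypothetical application code, not raft-rs API: `PendingSnapshotRequest`, `Action`, and the decision to fall back to the leader after the timeout are assumptions for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical application-side record of an outstanding snapshot request.
pub struct PendingSnapshotRequest {
    /// Id of the node the snapshot was requested from.
    pub target: u64,
    /// When the request was sent.
    pub sent_at: Instant,
    /// How long the application is willing to wait for the snapshot.
    pub timeout: Duration,
}

/// What the application should do next (an assumed policy).
#[derive(Debug, PartialEq)]
pub enum Action {
    KeepWaiting,
    FallBackToLeader,
}

impl PendingSnapshotRequest {
    /// Decide what to do at time `now`. Taking `now` as a parameter keeps the
    /// logic deterministic and easy to drive from the application's tick loop.
    pub fn poll(&self, now: Instant) -> Action {
        // saturating_duration_since yields zero if `now` is earlier than `sent_at`.
        if now.saturating_duration_since(self.sent_at) >= self.timeout {
            Action::FallBackToLeader
        } else {
            Action::KeepWaiting
        }
    }
}
```

The application would call `poll` from its own tick loop; the Raft state machine itself never sees the timer, which matches the idea of keeping this logic out of the library.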

siddontang Jan 11 '19 07:01

@Fullstop000 You can send your WeChat account to my email [email protected] if you want to have a real-time discussion.

siddontang Jan 11 '19 07:01

@siddontang Thanks for the quick reply.

Regarding the efficiency point @BusyJay mentioned here: with 2 IDCs, once the leader has applied the AddNode, it starts communicating with the new node. Sending a snapshot between IDCs can be less efficient than transferring it internally, and it increases the workload on the leader.

If the node ignores those messages, we can save 1.5 round trips and the cross-IDC data transfer from the leader, but it may introduce some extra complexity into the Raft algorithm.

In general, I prefer to keep the Raft layer clean and let a third party (such as PD?) do the control.

Fullstop000 Jan 11 '19 07:01

I agree with siddontang; it is better to keep the Raft layer clean. Raft is complicated and is critical to protecting data consistency across nodes. Any newly introduced mechanism must be shown not to break the Raft protocol.

betwins Mar 05 '22 10:03