raft-rs
Impossible to request Raft snapshot if `term` of last entry in WAL is older than current `term`
I've been trying to use Raft snapshots to quickly synchronize Raft state when adding new nodes to the cluster or restarting existing nodes, but I've discovered that there's this check in `Raft::request_snapshot`.
Due to this check, it's impossible to request a Raft snapshot if the `term` of the last entry in the WAL (of the node that tries to request the snapshot) is older than the current `term` of the cluster. So if there has been even a single election since the node was last online, it can't request a snapshot until it has appended all entries from previous terms and reached an entry from the current term.
E.g.:
Imagine we restart a follower node after a long downtime. It currently has this WAL:
...
term: 42, index: 1000
And this is the WAL on the leader node:
...
term: 42, index: 1000,
...
term: 42, index: 100000,
term: 43, index: 100001, // election happened here
...
term: 43, index: 200000,
This means that the follower node has to synchronize ~99k entries (indices 1001 to 100001) through regular `MsgAppend` messages before it can request a Raft snapshot to "recover" the remaining ~100k entries.
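To make the arithmetic concrete, here is a small Rust sketch of the example above. It is a hypothetical model, not raft-rs code: `TermStarts` and `entries_before_snapshot_allowed` are made-up names, and a log is reduced to the index at which each term begins. Term 42 is assumed to start at index 1, which doesn't affect the result; only the term-43 boundary matters.

```rust
// Hypothetical model of the example WALs; these are NOT raft-rs types.

/// (term, first index written in that term) pairs, sorted by index.
struct TermStarts(Vec<(u64, u64)>);

impl TermStarts {
    /// First index written in `term`, if the log contains that term at all.
    fn first_index_of(&self, term: u64) -> Option<u64> {
        self.0.iter().find(|(t, _)| *t == term).map(|(_, i)| *i)
    }
}

/// How many entries the follower must append through regular MsgAppend
/// before its last entry carries `current_term`, i.e. before a
/// `current_term == last_entry_term` check can pass.
fn entries_before_snapshot_allowed(
    leader: &TermStarts,
    follower_last_index: u64,
    current_term: u64,
) -> Option<u64> {
    let first_in_current_term = leader.first_index_of(current_term)?;
    Some(first_in_current_term.saturating_sub(follower_last_index))
}

fn main() {
    // Leader: term 42 up to index 100000, term 43 from 100001 to 200000.
    let leader = TermStarts(vec![(42, 1), (43, 100_001)]);
    let leader_last_index: u64 = 200_000;
    let follower_last_index: u64 = 1_000;
    let current_term: u64 = 43;

    let via_append =
        entries_before_snapshot_allowed(&leader, follower_last_index, current_term).unwrap();
    // Whatever the follower still lacks after that could come from a snapshot.
    let left_for_snapshot = leader_last_index - (follower_last_index + via_append);
    println!("{} entries via MsgAppend, {} left for the snapshot", via_append, left_for_snapshot);
}
```

With the numbers from the example it prints `99001 entries via MsgAppend, 99999 left for the snapshot`.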
So my question is: is there a reason this limitation is required, or is this check overly restrictive/incorrect?
It seems like the `RawNode::request_snapshot` documentation also hints that the check should have been `self.term >= request_index_term` (instead of `self.term == request_index_term`).
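 
A minimal side-by-side sketch of the two conditions, applied to the follower from the example. This is a hypothetical model of the guard only, not the actual raft-rs source; `allowed_strict` and `allowed_relaxed` are made-up names standing in for `self.term == request_index_term` and `self.term >= request_index_term`.

```rust
// Hypothetical model of the term guard in Raft::request_snapshot; NOT raft-rs code.

/// Current behaviour as described above: the term of the node's last log
/// entry must equal its current term.
fn allowed_strict(current_term: u64, last_entry_term: u64) -> bool {
    current_term == last_entry_term
}

/// Variant hinted at by the RawNode::request_snapshot documentation.
fn allowed_relaxed(current_term: u64, last_entry_term: u64) -> bool {
    current_term >= last_entry_term
}

fn main() {
    // (current term, term of the last local entry)
    // (42, 42): no election happened while the node was down.
    // (43, 42): the restarted follower from the example.
    let cases = [(42, 42), (43, 42)];
    for (current, last) in cases {
        println!(
            "current={} last={} strict={} relaxed={}",
            current,
            last,
            allowed_strict(current, last),
            allowed_relaxed(current, last)
        );
    }
}
```

One thing worth noting: under the usual Raft invariant a node's current term is never lower than the term of any entry in its log, so the `>=` variant would effectively always pass, i.e. it would remove the term condition rather than loosen it.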