ZOOKEEPER-4925: Fix data loss due to propagation of discontinuous committedLog
There are two variants of ZooKeeperServer::processTxn, and they have diverged significantly since ZOOKEEPER-3484. In addition to what processTxn(TxnHeader hdr, Record txn) does, processTxn(Request request) pops the outstanding change from outstandingChanges and adds the txn to committedLog so that followers can sync from it. Since ZOOKEEPER-4394, the Learner uses processTxn(TxnHeader hdr, Record txn) to commit txns to memory, which means committedLog is left untouched during the SYNCHRONIZATION phase.
As a result, a stale follower ends up with a hole in its committedLog after joining the cluster. If that follower later becomes leader, it propagates the in-memory hole to other stale nodes, which causes data loss.
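To make the failure mode concrete, here is a toy model of the two code paths. It is only a sketch: the class, field, and method names are hypothetical, zxids are plain small integers, and the real committedLog handling in ZooKeeper is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names are real ZooKeeper APIs. */
final class CommittedLogHoleSketch {
    // zxids applied to the in-memory state
    final List<Long> dataTree = new ArrayList<>();
    // zxids kept around so that other learners can sync from this server
    final List<Long> committedLog = new ArrayList<>();

    // Stands in for processTxn(TxnHeader, Record): applies the txn only.
    void applyOnly(long zxid) {
        dataTree.add(zxid);
    }

    // Stands in for processTxn(Request): applies the txn and records it.
    void applyAndLog(long zxid) {
        dataTree.add(zxid);
        committedLog.add(zxid);
    }

    // A follower that was stale for txns 3 and 4 and caught up during sync.
    static CommittedLogHoleSketch staleFollowerJoins() {
        CommittedLogHoleSketch s = new CommittedLogHoleSketch();
        s.applyAndLog(1L); // committed through normal broadcast
        s.applyAndLog(2L);
        s.applyOnly(3L);   // received during SYNCHRONIZATION, never logged
        s.applyOnly(4L);
        s.applyAndLog(5L); // normal broadcast resumes after sync
        return s;
    }
}
```

In this model the follower's in-memory state contains txns 1 through 5, while its committedLog holds only 1, 2, and 5. A peer that syncs from that log alone would never see 3 and 4.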
The test case fails on master and 3.9.3, and passes on 3.9.2, so 3.9.3 is the only affected release.
This commit drops processTxn(TxnHeader hdr, Record txn), since processTxn(Request request) also works in the SYNCHRONIZATION phase.
This commit also rejects discontinuous proposals in syncWithLeader and committedLog to avoid possible data loss.
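A minimal sketch of what rejecting a discontinuous proposal can look like is shown below. It is not the code from this commit. It assumes the usual zxid layout (epoch in the high 32 bits, counter in the low 32 bits) and assumes that the first txn of a new epoch has counter 1; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; not the actual ZooKeeper classes. */
final class ContiguousLogSketch {
    private final Deque<Long> committedLog = new ArrayDeque<>();

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Assumed rule: next directly follows prev if it is prev + 1 within the
    // same epoch, or if it is the first txn (counter 1) of a later epoch.
    static boolean follows(long prev, long next) {
        if (epochOf(prev) == epochOf(next)) {
            return next == prev + 1;
        }
        return epochOf(next) > epochOf(prev) && counterOf(next) == 1;
    }

    // Appends zxid, refusing anything that would leave a hole in the log.
    void append(long zxid) {
        Long last = committedLog.peekLast();
        if (last != null && !follows(last, zxid)) {
            throw new IllegalStateException(
                "discontinuous zxid 0x" + Long.toHexString(zxid)
                    + " after 0x" + Long.toHexString(last));
        }
        committedLog.addLast(zxid);
    }

    int size() {
        return committedLog.size();
    }
}
```

Failing fast here trades availability for safety: a server that cannot build a contiguous log refuses the proposal rather than silently serving a log with a hole.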
Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
Reviewers: li4wang
Author: kezhuw
Closes #2254 from kezhuw/ZOOKEEPER-4925-fix-data-loss
(cherry picked from commit e5dd60bf0512ccc1e090d99410a8da48623219da)
This is the 3.9.4 backport of #2254. I have backported it to branch-3.9. cc @tisonkun
You might want to rebase the patch to re-trigger the failing tests.