zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

ZOOKEEPER-3023: Sync and commit diff log entries before NEWLEADER ack

Open kezhuw opened this issue 2 years ago • 5 comments

ZOOKEEPER-2678 could skip snapshot in diff sync, but diff txns are logged and committed after NEWLEADER ack. ZOOKEEPER-3911 moves txn logging before NEWLEADER ack, but the txn logging is asynchronous. So it is indeterminate whether diff txns have been persisted to disk or not after NEWLEADER ack.

This commit try to sync and commit txn logs synchronously before ack to NEWLEADER thus provides strong guarantee that follower is in sync with leader after NEWLEADER ack received.

This behavior is consistent with pre ZOOKEEPER-2678 and easy to test.

kezhuw avatar Apr 02 '22 12:04 kezhuw

@eolivelli @hanm @nkalmar @afine @fpj Could you please take a look at this pr ? I think it is related to #157 and #1445.

kezhuw avatar Apr 05 '22 08:04 kezhuw

I have updated this pr with a fixup commit to remove #1445's effect, but I still leave room for ZOOKEEPER-4643. See https://github.com/apache/zookeeper/pull/1925#issuecomment-1535372739.

kezhuw avatar May 04 '23 20:05 kezhuw

Hi .

I use the TLA+ specifications to verify the correctness of the fix in this pr. It can avoid the issues of ZOOKEEPER-3023, ZOOKEEPER-4394, ZOOKEEPER-4646 and ZOOKEEPER-4685.

Besides, I really like this fix. It is really simple and easy to understand.

AlphaCanisMajoris avatar Mar 18 '24 02:03 AlphaCanisMajoris

This should be superceded by #2111. Both try to log diff entries synchronous before NEWLEADER ack and #2028 is similar. Additionally, #2111 should fix ZOOKEEPER-4643 also.

I am not sure testNormalFollowerRunWithDiff is still flaky or not. I will check it later. No sure it is worth to revert it to pre ZOOKEEPER-2678 given my comments in ZOOKEEPER-3023.

kezhuw avatar Mar 18 '24 10:03 kezhuw

This should be superceded by #2111. Both try to log diff entries synchronous before NEWLEADER ack and #2028 is similar. Additionally, #2111 should fix ZOOKEEPER-4643 also.

That's cool. The code fix of #2111 is basically the same as that of #2028. It should fix ZOOKEEPER-4643, ZOOKEEPER-4646 and ZOOKEEPER-4685.

I am not sure testNormalFollowerRunWithDiff is still flaky or not. I will check it later. No sure it is worth to revert it to pre ZOOKEEPER-2678 given my comments in ZOOKEEPER-3023.

In my understanding, #2111 cannot avoid the issue ZOOKEEPER-4394, right? Not all the pending requests need to be logged before the follower replies ACK of NEWLEADER.

AlphaCanisMajoris avatar Mar 19 '24 16:03 AlphaCanisMajoris