ZOOKEEPER-2789 Reassign `ZXID` for solving 32bit overflow problem

Open zichen-gan opened this issue 1 year ago • 9 comments

The original PR link: https://github.com/apache/zookeeper/pull/262 Since the aforementioned PR does not support rolling hot updates, this PR aims to add rolling upgrade capabilities. The goal of this PR is to change the counter bit length from 32 to 40 and the epoch bit length from 32 to 20 through a rolling upgrade approach.
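To make the layout change concrete, here is a minimal sketch of packing and unpacking a zxid with a configurable counter width. It assumes the zxid stays a 64-bit `long` with the counter in the low bits and the epoch in the bits above it, as in the existing 32/32 split. `ZxidLayout` and its methods are hypothetical names for illustration, not the code in this PR.

```java
/** Hypothetical helper (not part of ZooKeeper): packs and unpacks a 64-bit zxid. */
final class ZxidLayout {
    private final int counterBits;  // number of low bits that hold the per-epoch counter
    private final long counterMask; // mask that selects those low bits

    ZxidLayout(int counterBits) {
        if (counterBits <= 0 || counterBits >= 64) {
            throw new IllegalArgumentException("counterBits must be in 1..63");
        }
        this.counterBits = counterBits;
        this.counterMask = (1L << counterBits) - 1;
    }

    /** Epoch goes in the high bits; the counter is truncated to counterBits. */
    long makeZxid(long epoch, long counter) {
        return (epoch << counterBits) | (counter & counterMask);
    }

    /** Unsigned shift, so the epoch is read as a non-negative value. */
    long epochOf(long zxid) {
        return zxid >>> counterBits;
    }

    long counterOf(long zxid) {
        return zxid & counterMask;
    }

    public static void main(String[] args) {
        ZxidLayout current = new ZxidLayout(32);  // today's 32-bit counter
        ZxidLayout proposed = new ZxidLayout(40); // 40-bit counter discussed here

        long big = 1L << 33; // a counter value that does not fit in 32 bits
        long zxid = proposed.makeZxid(5, big);

        System.out.println("proposed epoch   = " + proposed.epochOf(zxid));
        System.out.println("proposed counter = " + proposed.counterOf(zxid));
        // The same bits decoded with the old layout give a different epoch.
        System.out.println("decoded as 32/32 = epoch " + current.epochOf(zxid)
                + ", counter " + current.counterOf(zxid));
    }
}
```

Decoding the same 64 bits with the wrong width yields a different epoch and counter, which is why a change like this touches the stored data format and needs care during a rolling upgrade.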

zichen-gan avatar May 16 '24 03:05 zichen-gan

@kezhuw @eolivelli @li4wang @ztzg

I've just come across this one and started to review for 3.9.3, but need more eyeballs. Could you please review?

anmolnar avatar Sep 19 '24 20:09 anmolnar

This PR changes the server-side data format, so I don't think it fits a patch release.

At 1k ops/s, the ZXID will be exhausted after only $2^{32} / (86400 * 1000) \approx 49.7$ days. https://github.com/apache/zookeeper/pull/262#issue-230567070

Thinking about some abnormal situations, maybe 24 bits for the epoch and 40 bits for the counter is a better choice: $\mathrm{Math.min}(2^{24} / (24 * 365), 2^{40} / (86400 * 1000 * 365)) \approx \mathrm{Math.min}(1915.2, 34.9) = 34.9$ years. https://github.com/apache/zookeeper/pull/262#issuecomment-303276951

So in my second comment I offered a better solution: a 24-bit epoch. Even if a leader election happens once every hour, the epoch will not overflow for 1915.2 years. https://github.com/apache/zookeeper/pull/262#issuecomment-351886573

Given the above, I think this is promising. It stretches the rollover interval from 49.7 days to 34.9 years, assuming 1k ops/s. Best of all, it demands no protocol change, at the price of a zxid format change.
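The figures quoted above can be reproduced with a short sketch. It assumes a constant write rate of 1000 ops/s, one leader election per hour, and 365-day years, exactly as in the quoted comments. `ZxidRollover` and its method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical calculator for the rollover arithmetic quoted in this thread. */
final class ZxidRollover {
    static final double SECONDS_PER_DAY = 86_400.0;
    static final double HOURS_PER_DAY = 24.0;
    static final double DAYS_PER_YEAR = 365.0;

    /** Days until a counter of the given width wraps at a constant write rate. */
    static double counterDays(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / (opsPerSecond * SECONDS_PER_DAY);
    }

    /** Years until an epoch of the given width wraps at a constant election rate. */
    static double epochYears(int epochBits, double electionsPerHour) {
        return Math.pow(2, epochBits) / (electionsPerHour * HOURS_PER_DAY * DAYS_PER_YEAR);
    }

    /** The smaller of the two horizons, mirroring the Math.min in the quoted comment. */
    static double horizonYears(int epochBits, int counterBits,
                               double opsPerSecond, double electionsPerHour) {
        double counterYears = counterDays(counterBits, opsPerSecond) / DAYS_PER_YEAR;
        return Math.min(epochYears(epochBits, electionsPerHour), counterYears);
    }

    public static void main(String[] args) {
        System.out.printf("32-bit counter at 1k ops/s: %.1f days%n", counterDays(32, 1000));
        System.out.printf("24-bit epoch at 1 election/hour: %.1f years%n", epochYears(24, 1));
        System.out.printf("24/40 layout horizon: %.1f years%n", horizonYears(24, 40, 1000, 1));
    }
}
```

The 40-bit counter is the binding limit in the 24/40 layout; the epoch horizon is far longer under these assumptions.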

Before finalizing this path, I would like to test whether leadership inheritance is feasible.

kezhuw avatar Sep 23 '24 12:09 kezhuw

@zichen-gan You need to close / re-open PR or force push to trigger another CI run.

anmolnar avatar Sep 23 '24 22:09 anmolnar

Given the above, I think this is promising. It stretches the rollover interval from 49.7 days to 34.9 years, assuming 1k ops/s. Best of all, it demands no protocol change, at the price of a zxid format change. Before finalizing this path, I would like to test whether leadership inheritance is feasible.

Sure, I'll wait for your review. The strange thing is that, as you outlined, it doesn't require a protocol change, but the patch still has to increase the protocol version.

anmolnar avatar Sep 23 '24 22:09 anmolnar

as you outlined, it doesn't require a protocol change, but the patch still has to increase the protocol version.

My fault! By "no protocol change", I mean we don't need to prove its correctness in ZAB.

kezhuw avatar Sep 24 '24 03:09 kezhuw

Hi~ @anmolnar @kezhuw I would like to verify this feature in our production environment. After running `mvn clean package -DskipTests` on version 3.4.14, in which directory of ZooKeeper is the complete installation package located?

wg1026688210 avatar Apr 22 '25 02:04 wg1026688210

on version 3.4.14, in which directory of Zookeeper is the complete installation package located

It is `zookeeper-assembly/target/apache-zookeeper-3.10.0-SNAPSHOT-bin.tar.gz` in master.

I would like to verify this feature in our production environment.

Please back up your data. This implementation is a one-way ticket: it changes the data storage format and probably has no way to downgrade. I previously presented an alternative, #2208 (ZOOKEEPER-4883).

kezhuw avatar Apr 22 '25 14:04 kezhuw