
e3/w3/a2 invalid on region aware placement if min regions for durability 2

Open benjumanji opened this issue 10 months ago • 0 comments

I have the following config (shortened for brevity) on Pulsar 4.0.1:

bookkeeperClientRegionawarePolicyEnabled=true
reppRegionsToWrite=euw1-az3;euw1-az1;euw1-az2
reppMinimumRegionsForDurability=2

I have at least three bookies. If I then create a ledger with ensemble/write/ack of 3/3/2 under the config above, the exception here is thrown: https://github.com/apache/bookkeeper/blob/0748423e3228f7cf61d2e1f2ab11e354ed84c0df/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RegionAwareEnsemblePlacementPolicy.java#L317


I can see mechanically why the exception fires: with integer division, 2 <= 3 - 3/2 = 3 - 1 = 2 evaluates to true. What I am failing to see is why this is actually a bad configuration.
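For reference, the guard at the linked line boils down to an integer-arithmetic check roughly like the following. This is a minimal sketch; the class and method names here are stand-ins, not BookKeeper's actual API, and the expression is my reading of the linked source:

```java
public class DurabilityGuard {
    // Sketch of the check at RegionAwareEnsemblePlacementPolicy.java#L317:
    // the ensemble is rejected when the ack quorum is not strictly larger than
    // writeQuorum - writeQuorum / minRegionsForDurability (integer division).
    static boolean rejected(int writeQuorum, int ackQuorum, int minRegionsForDurability) {
        return ackQuorum <= writeQuorum - (writeQuorum / minRegionsForDurability);
    }

    public static void main(String[] args) {
        // 3/3/2 with minRegionsForDurability=2: 2 <= 3 - 3/2 = 2, so it is rejected.
        System.out.println(rejected(3, 2, 2)); // true
        // 4/4/3 with minRegionsForDurability=2: 3 <= 4 - 4/2 = 2 is false, so it passes.
        System.out.println(rejected(4, 3, 2)); // false
    }
}
```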

            // We must survive the failure of numRegions - effectiveMinRegionsForDurability. When these
            // regions have failed we would spread the replicas over the remaining
            // effectiveMinRegionsForDurability regions; we have to make sure that the ack quorum is large
            // enough such that there is a configuration for spreading the replicas across
            // effectiveMinRegionsForDurability - 1 regions

OK, so I have 3 regions and I want 2 for durability, which means I can only tolerate 1 region failing. If that region fails I have two regions left and I require two acks; I have two bookies, both can ack, so what's the problem? Why is 4/4/3 good and 3/3/2 bad? If the argument is that the initial placement might put 2 replicas in one region and 1 in another, why doesn't the same apply to 4/4/3 (3 in one region and 1 in another)? Plugging 3/3/2 into the comment: we must survive 3 - 2 = 1 region failure, and the ack quorum must cover effectiveMinRegionsForDurability - 1 = 1 region. Why do 3 acks with 4 writers fulfil this while 2 acks with 3 writers do not?

I guess what's eating me is that I don't want the extra tail latency or to pay for the extra disks. I just want 3 replicas and to survive a region outage, and there doesn't seem to be any configuration that allows it. The only value of reppMinimumRegionsForDurability for which the expression evaluates to false for 3/3/2 is 1, which is a configuration that permits data loss.

Originally posted by @benjumanji in https://github.com/apache/pulsar/discussions/23913

benjumanji avatar Feb 15 '25 14:02 benjumanji