bookkeeper
bookkeeper copied to clipboard
Feature: auto recover support repaired not adhering placement ledger
Descriptions of the changes in this PR: There is a user case.
- They have two zones, they have a rack aware policy that ensures it writes across two zones
- They had some data on a topic with long retention
- They ran a disaster recovery test, during this test, they shut down one zone
- During the period of the DR test, auto-recovery ran. Because the DR test only has one zone active, and because the default of auto-recovery is to do rack aware with the best effort, it recovered up to an expected number of replicas
- They stopped the DR test and all was well, but now that ledger was only on one zone
- They ran another DR test, this time basically moving data to the another zone, but now data is missing because it is all only on one zone
We should support a feature to cover this case.
For this case, we already support detect these ledger which ensemble is not adhering placement policy at now.
In Auditor, if user config auditorPeriodicPlacementPolicyCheckInterval
, it will start a scheduled task to trigger placementPolicyCheck
, In placementPolicyCheck
, it will record the count of ledger fragment which not adhering placement policy.
https://github.com/apache/bookkeeper/blob/677ccec3eb84f5be1b3556537871e14eb5e8359c/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java#L1378
But it only record it to stat, not recover data to make ensemble to adhere placement policy.
So we can add a config repairedPlacementPolicyNotAdheringBookieEnabled
to control is to repaired the data to adhere placement policy.
In Auditor It will mark ledgerId to unnder replication managed if the ensemble is not adhering placement policy.
In ReplicationWorker It will move data from old bookie to new bookie which network location is different to adhere placement policy. If there is not bookie with different network location, do nothing.
Attention In ReplicationWoker, it just poll under replicated ledger then process it. So when get an under replicated ledger, we should check two case. 1) Is the ledger fragments loss data. 2) Is the ledger fragments is not adhering placement policy. The one fragment maybe meet both case at the same time. If so, we will ignore case 2, just repaired the data loss. If the repaired result is not adhering the placement policy, the auditor will mark it again.
How to use it?
If we want to repaired the ledger which ensemble is not adhering placement policy, we should config two param.
auditorPeriodicPlacementPolicyCheckInterval=3600
repairedPlacementPolicyNotAdheringBookieEnabled=true
In Auditor
auditorPeriodicPlacementPolicyCheckInterval
control the placement policy check detect interval, repairedPlacementPolicyNotAdheringBookieEnabled
control is mark ledger Id to under replication managed when found a ledger ensemble not adhere placement policy.
In ReplicationWorker
repairedPlacementPolicyNotAdheringBookieEnabled
control is to repaired the ledger which ensemble not adhere placement policy.
Attention
-
we need ensure the config
repairedPlacementPolicyNotAdheringBookieEnabled=true
in Auditor and ReplicationWorker at the same time. -
we also need the placement policy is same between Auditor and ReplicationWorker, cause both all need use placement policy to help to process.
ping @merlimat @eolivelli @dlg99 @zymap @reddycharan Please help take a look at this PR, thanks.
also, can we cherry-pick this change to previous release 4.14?
it's also better if you can add in description, what modification we are making in the PR.
better if you can add in description, what modification we a
@rdhabalia It is a new feature, but controlled by a flag. I'm not sure if it can be cherry-picked to branch-4.14 and branch-4.15, do you have any ideas? @eolivelli @dlg99
@horizonzy, Would you please take a look at the failed CI? https://github.com/apache/bookkeeper/runs/7580208123?check_suite_focus=true
@rdhabalia Would you please help review this PR again? thanks. We hope this feature can be included in 4.16.0
@eolivelli Would you please help review this PR again? thanks. We hope this feature can be included in 4.16.0
rerun failure checks
please rebase master's newest code @horizonzy
rerun failure checks
@equanz Thanks for your graceful way to find replaceAderePlamencePolicy bookie, I have enhanced it and use it for AutoRecovery feature, could you help to view it.
@eolivelli ping, address the comment, could you review it again
And what about the ~default rack~ shutdown bookies? https://github.com/apache/bookkeeper/pull/2931#pullrequestreview-1062016018
If it is being addressed, please let me know where it is.
And what about the default rack? #2931 (review)
If it is being addressed, please let me know where it is.
I means if we didn't handle the shutdown bookies, it will be handle as default-bookie, the default-rack is different with other bookie's rack, so it won't be replaced. Now we add the shutdown bookies to excludes nodes to replace it.
Now we add the shutdown bookies to excludes nodes to replace it.
Just confirm, are you mentioned here? https://github.com/horizonzy/bookkeeper/blob/6ef1e2aa8ac0780bd8199b360e840506cbc85e5d/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1101-L1105
Now we add the shutdown bookies to excludes nodes to replace it.
Just confirm, are you mentioned here? https://github.com/horizonzy/bookkeeper/blob/6ef1e2aa8ac0780bd8199b360e840506cbc85e5d/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1101-L1105
yes
fix old workflow,please see #3455 for detail
rerun failure checks
rerun failure checks
This PR is an enhancement for auto recovery, and the new interface has a default implementation, which is compatible with the old version. I suggest cherry-picking it to branch-4.14 and branch-4.15. Do you have any suggestions? @merlimat @eolivelli @dlg99 @rdhabalia @zymap
@hangc0276 please ask on dev@ I agree with you. Also 4.15 and 4.16 have some problems that are blocking the adoption.
yes, we should cherry-pick it.