bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

Feature: auto recover support repaired not adhering placement ledger

Open horizonzy opened this issue 2 years ago • 17 comments

Descriptions of the changes in this PR: There is a user case.

  1. They have two zones, they have a rack aware policy that ensures it writes across two zones
  2. They had some data on a topic with long retention
  3. They ran a disaster recovery test, during this test, they shut down one zone
  4. During the period of the DR test, auto-recovery ran. Because the DR test only has one zone active, and because the default of auto-recovery is to do rack aware with the best effort, it recovered up to an expected number of replicas
  5. They stopped the DR test and all was well, but now that ledger was only on one zone
  6. They ran another DR test, this time basically moving data to the another zone, but now data is missing because it is all only on one zone

We should support a feature to cover this case.

horizonzy avatar Jun 25 '22 04:06 horizonzy

For this case, we already support detect these ledger which ensemble is not adhering placement policy at now. In Auditor, if user config auditorPeriodicPlacementPolicyCheckInterval, it will start a scheduled task to trigger placementPolicyCheck, In placementPolicyCheck, it will record the count of ledger fragment which not adhering placement policy. https://github.com/apache/bookkeeper/blob/677ccec3eb84f5be1b3556537871e14eb5e8359c/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java#L1378

But it only record it to stat, not recover data to make ensemble to adhere placement policy.

So we can add a config repairedPlacementPolicyNotAdheringBookieEnabled to control is to repaired the data to adhere placement policy.

In Auditor It will mark ledgerId to unnder replication managed if the ensemble is not adhering placement policy.

In ReplicationWorker It will move data from old bookie to new bookie which network location is different to adhere placement policy. If there is not bookie with different network location, do nothing.

Attention In ReplicationWoker, it just poll under replicated ledger then process it. So when get an under replicated ledger, we should check two case. 1) Is the ledger fragments loss data. 2) Is the ledger fragments is not adhering placement policy. The one fragment maybe meet both case at the same time. If so, we will ignore case 2, just repaired the data loss. If the repaired result is not adhering the placement policy, the auditor will mark it again.

horizonzy avatar Jun 27 '22 08:06 horizonzy

How to use it?

If we want to repaired the ledger which ensemble is not adhering placement policy, we should config two param.

auditorPeriodicPlacementPolicyCheckInterval=3600
repairedPlacementPolicyNotAdheringBookieEnabled=true

In Auditor auditorPeriodicPlacementPolicyCheckInterval control the placement policy check detect interval, repairedPlacementPolicyNotAdheringBookieEnabled control is mark ledger Id to under replication managed when found a ledger ensemble not adhere placement policy.

In ReplicationWorker repairedPlacementPolicyNotAdheringBookieEnabled control is to repaired the ledger which ensemble not adhere placement policy.

Attention

  1. we need ensure the config repairedPlacementPolicyNotAdheringBookieEnabled=true in Auditor and ReplicationWorker at the same time.

  2. we also need the placement policy is same between Auditor and ReplicationWorker, cause both all need use placement policy to help to process.

horizonzy avatar Jun 27 '22 08:06 horizonzy

ping @merlimat @eolivelli @dlg99 @zymap @reddycharan Please help take a look at this PR, thanks.

hangc0276 avatar Jul 01 '22 03:07 hangc0276

also, can we cherry-pick this change to previous release 4.14?

rdhabalia avatar Jul 23 '22 00:07 rdhabalia

it's also better if you can add in description, what modification we are making in the PR.

rdhabalia avatar Jul 23 '22 00:07 rdhabalia

better if you can add in description, what modification we a

@rdhabalia It is a new feature, but controlled by a flag. I'm not sure if it can be cherry-picked to branch-4.14 and branch-4.15, do you have any ideas? @eolivelli @dlg99

hangc0276 avatar Jul 26 '22 03:07 hangc0276

@horizonzy, Would you please take a look at the failed CI? https://github.com/apache/bookkeeper/runs/7580208123?check_suite_focus=true

hangc0276 avatar Jul 31 '22 04:07 hangc0276

@rdhabalia Would you please help review this PR again? thanks. We hope this feature can be included in 4.16.0

hangc0276 avatar Jul 31 '22 09:07 hangc0276

@eolivelli Would you please help review this PR again? thanks. We hope this feature can be included in 4.16.0

hangc0276 avatar Aug 03 '22 02:08 hangc0276

rerun failure checks

StevenLuMT avatar Aug 04 '22 01:08 StevenLuMT

please rebase master's newest code @horizonzy

rerun failure checks

StevenLuMT avatar Aug 04 '22 04:08 StevenLuMT

@equanz Thanks for your graceful way to find replaceAderePlamencePolicy bookie, I have enhanced it and use it for AutoRecovery feature, could you help to view it.

horizonzy avatar Aug 07 '22 04:08 horizonzy

@eolivelli ping, address the comment, could you review it again

horizonzy avatar Aug 08 '22 00:08 horizonzy

And what about the ~default rack~ shutdown bookies? https://github.com/apache/bookkeeper/pull/2931#pullrequestreview-1062016018

If it is being addressed, please let me know where it is.

equanz avatar Aug 08 '22 03:08 equanz

And what about the default rack? #2931 (review)

If it is being addressed, please let me know where it is.

I means if we didn't handle the shutdown bookies, it will be handle as default-bookie, the default-rack is different with other bookie's rack, so it won't be replaced. Now we add the shutdown bookies to excludes nodes to replace it.

horizonzy avatar Aug 08 '22 05:08 horizonzy

Now we add the shutdown bookies to excludes nodes to replace it.

Just confirm, are you mentioned here? https://github.com/horizonzy/bookkeeper/blob/6ef1e2aa8ac0780bd8199b360e840506cbc85e5d/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1101-L1105

equanz avatar Aug 08 '22 06:08 equanz

Now we add the shutdown bookies to excludes nodes to replace it.

Just confirm, are you mentioned here? https://github.com/horizonzy/bookkeeper/blob/6ef1e2aa8ac0780bd8199b360e840506cbc85e5d/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1101-L1105

yes

horizonzy avatar Aug 08 '22 06:08 horizonzy

fix old workflow,please see #3455 for detail

StevenLuMT avatar Aug 24 '22 08:08 StevenLuMT

rerun failure checks

horizonzy avatar Sep 04 '22 07:09 horizonzy

rerun failure checks

hangc0276 avatar Sep 05 '22 02:09 hangc0276

This PR is an enhancement for auto recovery, and the new interface has a default implementation, which is compatible with the old version. I suggest cherry-picking it to branch-4.14 and branch-4.15. Do you have any suggestions? @merlimat @eolivelli @dlg99 @rdhabalia @zymap

hangc0276 avatar Sep 13 '22 03:09 hangc0276

@hangc0276 please ask on dev@ I agree with you. Also 4.15 and 4.16 have some problems that are blocking the adoption.

eolivelli avatar Sep 19 '22 07:09 eolivelli

yes, we should cherry-pick it.

rdhabalia avatar Nov 01 '22 18:11 rdhabalia