bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

Add ensemble relocation command which adheres to placement policy

Open equanz opened this issue 4 years ago • 18 comments

Motivation

Currently, we can't recover the ensemble which isn't adhering to placement policy directly. The recover command or autorecovery process is one of the alternative approaches. Unfortunately, however, we can't.

For example, suppose

  • the bookie cluster has two racks( /region/rack1 , /region/rack2 )
  • target ledger has the ensemble ( [bookie1(/region/rack1), bookie2(/region/rack1)] )
  • the replicator uses RackawareEnsemblePlacementPolicy

When bookie1 or bookie2 goes down, then run autorecovery process. If the bookie cluster has remained bookies which place to /region/rack1, the ledger recovers to the /region/rack1 bookie again. https://github.com/apache/bookkeeper/blob/release-4.14.4/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L225-L246 https://github.com/apache/bookkeeper/blob/release-4.14.4/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java#L1055-L1103 https://github.com/apache/bookkeeper/blob/release-4.14.4/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L474-L481

Therefore, we can't recover the ensemble which isn't adhering to placement policy directly. The recover command is similar to this behavior.

I want to add a relocation feature to adhere placement policy.

Changes

Add relocation feature to shell command. So, we can use it synchronously instead of executing asynchronously like the autorecovery process.

In this specification, optimize the new ensemble arrangement to minify the number of bookies replaced. Therefore, we should implement the new method that returns the new ensemble by policies. First, I introduce this feature to RackawareEnsemblePlacementPolicy in this PR.

equanz avatar Dec 07 '21 05:12 equanz

I've created the issue about the failing OWASP Dependency Check. https://github.com/apache/bookkeeper/issues/3134

equanz avatar Mar 23 '22 09:03 equanz

@eolivelli Addressed your first comments. PTAL. (Also rebase to current master to follow https://github.com/apache/bookkeeper/pull/3140 and https://github.com/apache/bookkeeper/pull/3149 .)

equanz avatar Mar 30 '22 11:03 equanz

rerun failure checks

equanz avatar May 24 '22 06:05 equanz

@eolivelli Addressed your comments. PTAL.

equanz avatar May 24 '22 08:05 equanz

@eolivelli ping

equanz avatar Jul 13 '22 04:07 equanz

I have rather generic comments.

If the bookie cluster has remained bookies which place to /region/rack1, the ledger recovers to the /region/rack1 bookie again.

As I understand, there is enforceMinNumRacksPerWriteQuorum option to prevent that (and enforceMinNumFaultDomainsForWrite, enforceStrictZoneawarePlacement for other placement policies)

As I understand, Auditor's placementPolicyCheck just detects the problem. Maybe it makes sense to make Auditor (optionally) put the ledgers with bad placement for re-replication and make AutoRecovery handle that? In this case the CLI command is already existing triggeraudit.

Another note is that the test only covers rack-aware policy. What happens in case of the region-aware or zone-aware policies?

dlg99 avatar Jul 20 '22 17:07 dlg99

Also: https://github.com/apache/bookkeeper/pull/3359

dlg99 avatar Jul 20 '22 18:07 dlg99

Also: https://github.com/apache/bookkeeper/pull/3359

Thank you for sharing. I'll check it first.

equanz avatar Jul 21 '22 08:07 equanz

@dlg99 https://github.com/apache/bookkeeper/pull/2931#issuecomment-1190559164

Sorry for the late reply. I've checked https://github.com/apache/bookkeeper/pull/3359#pullrequestreview-1061219674 . The approach (using autorecovery) sounds better. If it was merged and still has the above issue, I'll rebase to follow https://github.com/apache/bookkeeper/pull/3359 interfaces.

As I understand, there is enforceMinNumRacksPerWriteQuorum option to prevent that (and enforceMinNumFaultDomainsForWrite, enforceStrictZoneawarePlacement for other placement policies)

In my understanding, if we use enforceMinNumRacksPerWriteQuorum , we can't recover quorum temporally for redundancy when a rack goes down. I want to avoid it.

As I understand, Auditor's placementPolicyCheck just detects the problem. Maybe it makes sense to make Auditor (optionally) put the ledgers with bad placement for re-replication and make AutoRecovery handle that? In this case the CLI command is already existing triggeraudit.

I think so too. I thought the next step is asynchronous relocation, like the auto-recovery feature (mentioned in here). However, it was implemented in https://github.com/apache/bookkeeper/pull/3359 .

Another note is that the test only covers rack-aware policy. What happens in the case of the region-aware or zone-aware policies?

When using non-Rackaware, it throws an Exception. I'll add the test. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/EnsemblePlacementPolicy.java#L460-L467

equanz avatar Aug 05 '22 02:08 equanz

I want to use the EnsemblePlacementPolicy#replaceToAdherePlacementPolicy in #3359, it's more graceful, and I will enhance it in some cases.

horizonzy avatar Aug 05 '22 03:08 horizonzy

@horizonzy

I notice that the implements didn’t handle the bookie which already shutdown. The shutdown bookie will be as the defaultRack, can we replace it?

Currently, maybe, we can't do it. This feature expects all of the bookies were already recovered to a normal state. However, your point is correct.

I want to use the EnsemblePlacementPolicy#replaceToAdherePlacementPolicy in https://github.com/apache/bookkeeper/pull/3359, it's more graceful, and I will enhance it in some cases.

It's okay. Just confirm, do you mean "bring this feature to https://github.com/apache/bookkeeper/pull/3359 and improve it myself"? Or improve it in this PR?

equanz avatar Aug 05 '22 05:08 equanz

Improve it both #3359 and this pr. I think the improvement is common.

horizonzy avatar Aug 05 '22 05:08 horizonzy

I notice that the implements didn’t handle the bookie which already shutdown. The shutdown bookie will be as the defaultRack, can we replace it?

Currently, maybe, we can't do it. This feature expects all of the bookies were already recovered to a normal state. However, your point is correct.

I've checked the current code and found handing. If the enforceMinNumRacksPerWriteQuorum is true and the default rack name is /default-region/default-rack then handle in below. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1089-L1090

If the default rack name is /default-rack, then we can't add it. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L138 https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/NetworkTopologyImpl.java#L416-L420

However, it is not expected of me. I'll modify to add to excludeNodes regardless of enforceMinNumRacksPerWriteQuorum value.

Moreover, EnsemblePlacementPolicy#isEnsembleAdheringToPlacementPolicy has a similar process like below. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1043

equanz avatar Aug 05 '22 09:08 equanz

I notice that the implements didn’t handle the bookie which already shutdown. The shutdown bookie will be as the defaultRack, can we replace it?

If the default rack name is /default-rack, then we can't add it. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/NetworkTopologyImpl.java#L416-L420

I means the defaultRack is not literal, I means the defaultRack is /default-region/default-rack. 😂

horizonzy avatar Aug 05 '22 10:08 horizonzy

However, it is not expected of me. I'll modify to add to excludeNodes regardless of enforceMinNumRacksPerWriteQuorum value.

Yes, If we already knows which bookie shutdown, I can add it to excludeNodes to ignore it.

horizonzy avatar Aug 05 '22 10:08 horizonzy

In my understanding, currently, defaultRack of Rackaware is /default-rack. So we should consider about it if use it. https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L138 https://github.com/equanz/bookkeeper/blob/62970b3122fa1f8266763f0d3f6115a066fcabe9/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L234-L248

However, it is not expected of me. I'll modify to add to excludeNodes regardless of enforceMinNumRacksPerWriteQuorum value.

Yes, If we already knows which bookie shutdown, I can add it to excludeNodes to ignore it.

Could you tell me more about shutdown? I think the resolver doesn't return /default-region/default-rack just call TopologyAwareEnsemblePlacementPolicy#handleBookiesThatJoined.

equanz avatar Aug 05 '22 11:08 equanz

In my understanding, currently, defaultRack of Rackaware is /default-rack. So we should consider about it if use it.

Yes, you are right. If the test case, we changed the default rack to /defualt-region/default-rack.

repp.withDefaultRack(NetworkTopology.DEFAULT_REGION_AND_RACK);

If we didn't change it, the default rack is /default-rack.

horizonzy avatar Aug 06 '22 10:08 horizonzy

fix old workflow,please see #3455 for detail

StevenLuMT avatar Aug 24 '22 08:08 StevenLuMT

Part of the code are ported to https://github.com/apache/bookkeeper/pull/3359 and it was merged. So, I'll close this PR.

equanz avatar Apr 05 '23 01:04 equanz