pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve][broker]improve the getAntiAffinityNamespaceOwnedBrokers check exclude the current namespace cause count add.

Open Nicklee007 opened this issue 2 years ago • 9 comments

Motivation

As the PIP 7: Pulsar Failure domain and Anti affinity namespaces design, the anti affinity namespaces should be distributed to evenly across all domain and all the brokers. But in method getAntiAffinityNamespaceOwnedBrokers , brokerToAntiAffinityNamespaceCount add a count even the namespace equal the given bundle's namespace which will be load. The behavior will cause the namespace easy to distributed to a broker which has another namespace in anti affinity group, the behavior broke the anti affinity balance. It's better behavior is the same namespace should be distributed to those broker which has loaded the same namespace when all broker have load at least one namespace in ti affinity group.

there is some case ns-0 ns-1 ns-2 are all set the same anti affinity group like 'a-group

broker-0 own ns-0 bundle-0;
broker-1 own ns-1 bundle-0; broker-2 own ns-2 bundle-0; then another ns-2 bundle-1 need choice a broker to load. As the old policy, broker-0 broker-1 and broker-2 are all satisfy the least NamespaceCount; but if ns-2 bundle-1 load to broker-0 or broker-1 will broke the anti affinity balance. ns-2 bundle-1 need be load by broker-2 is better.

Also, the behavior will cause give up doLoadShedding when the all broker own one namespace in anti affinity group, but the different broker owned namespace bundle count and payload is different. So brokerToAntiAffinityNamespaceCount should exclude the given namespace count add.

Modifications

  1. In LoadManagerShared.class getAntiAffinityNamespaceOwnedBrokers method, exclude the given namespace count add.
  2. changed shouldAntiAffinityNamespaceUnload check;
  3. add some unit test.

Documentation

  • [X] doc-not-needed

Matching PR in forked repository

PR in forked repository: https://github.com/Nicklee007/pulsar/pull/5

Nicklee007 avatar Jul 13 '22 06:07 Nicklee007

@codelipenghui @Jason918 @eolivelli Could you help to review this PR, Thx.

Nicklee007 avatar Jul 26 '22 04:07 Nicklee007

there is some case broker-0 own ns-0 bundle-0; broker-1 own ns-1 bundle-0; broker-2 own ns-2 bundle-0; then another ns-2 bundle-1 need choice a broker to load. As the old policy, broker-0 broker-1 and broker-2 are all satisfy the least NamespaceCount; but if ns-2 bundle-1 load to broker-0 or broker-1 will broke the anti affinity balance. ns-2 bundle-1 need be load by broker-2 is better.

What's the anti affinity setting in this case?

Jason918 avatar Jul 28 '22 04:07 Jason918

What's the anti affinity setting in this case?

@Jason918 ns-0 ns-1 ns-2 are all set the same anti affinity group like 'a-group', and the three ns be anti affinity load is expected.

Nicklee007 avatar Jul 28 '22 04:07 Nicklee007

/pulsarbot run-failure-checks

Nicklee007 avatar Aug 01 '22 03:08 Nicklee007

The pr had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Oct 22 '22 02:10 github-actions[bot]

/pulsarbot run-failure-checks

Nicklee007 avatar Jan 11 '23 11:01 Nicklee007

Codecov Report

Merging #16563 (118b16b) into master (b05fddb) will decrease coverage by 14.98%. The diff coverage is 0.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #16563       +/-   ##
=============================================
- Coverage     45.64%   30.66%   -14.99%     
+ Complexity    11043     6016     -5027     
=============================================
  Files           773      633      -140     
  Lines         74463    59892    -14571     
  Branches       8018     6241     -1777     
=============================================
- Hits          33986    18363    -15623     
- Misses        36687    38973     +2286     
+ Partials       3790     2556     -1234     
Flag Coverage Δ
unittests 30.66% <0.00%> (-14.99%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sar/broker/loadbalance/impl/LoadManagerShared.java 39.39% <0.00%> (-4.03%) :arrow_down:
...n/java/org/apache/pulsar/client/api/RawReader.java 0.00% <0.00%> (-100.00%) :arrow_down:
...ava/org/apache/pulsar/broker/admin/v1/Brokers.java 0.00% <0.00%> (-100.00%) :arrow_down:
...ava/org/apache/pulsar/broker/admin/v2/Brokers.java 0.00% <0.00%> (-100.00%) :arrow_down:
...va/org/apache/pulsar/broker/admin/v1/Clusters.java 0.00% <0.00%> (-100.00%) :arrow_down:
.../org/apache/pulsar/broker/admin/v1/Properties.java 0.00% <0.00%> (-100.00%) :arrow_down:
.../org/apache/pulsar/client/impl/RawMessageImpl.java 0.00% <0.00%> (-100.00%) :arrow_down:
.../apache/pulsar/broker/admin/v2/ResourceGroups.java 0.00% <0.00%> (-100.00%) :arrow_down:
...ar/common/naming/PartitionedManagedLedgerInfo.java 0.00% <0.00%> (-100.00%) :arrow_down:
...e/pulsar/broker/admin/impl/ResourceQuotasBase.java 0.00% <0.00%> (-96.43%) :arrow_down:
... and 308 more

codecov-commenter avatar Jan 11 '23 11:01 codecov-commenter

/pulsarbot run-failure-checks

Nicklee007 avatar Jan 11 '23 13:01 Nicklee007

The pr had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Feb 11 '23 02:02 github-actions[bot]