[fix][broker] Handle BucketDelayedDeliveryTracker recover failed

Open dao-jun opened this issue 1 year ago • 4 comments

Motivation

We initialize the DelayedDeliveryTracker when dispatching messages by calling DelayedDeliveryTrackerFactory.newTracker in AbstractBaseDispatcher.

However, when delayedDeliveryTrackerFactoryClassName is set to org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory, BucketDelayedDeliveryTracker may fail to recover (see here). The failure may be caused by a BookKeeper exception, a timeout exception, or something else, and we currently don't handle this case.

If the exception occurs, it may lead to memory leaks (Entries and OpReadEntry objects cannot be released) and other issues. If BucketDelayedDeliveryTracker is never able to recover, the situation gets worse.

This PR introduces a fallback mechanism: if initializing BucketDelayedDeliveryTracker fails, we fall back to InMemoryDelayedDeliveryTracker.
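The fallback idea can be illustrated with a minimal, self-contained sketch. The types below (`Tracker`, `TrackerFactory`, `RecoverFailedException`, `newTrackerWithFallback`) are hypothetical stand-ins invented for illustration; they are not the Pulsar interfaces and do not reflect the actual changes in this PR.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; NOT the Pulsar classes.
final class TrackerFallbackSketch {

    /** Minimal tracker abstraction; name() only identifies the implementation. */
    interface Tracker {
        String name();
    }

    /** Illustrative checked exception signalling that recovery failed. */
    static final class RecoverFailedException extends Exception {
        RecoverFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** A factory whose tracker creation may fail during recovery. */
    interface TrackerFactory {
        Tracker newTracker() throws RecoverFailedException;
    }

    /** Try the primary factory first; if recovery fails, use the fallback tracker. */
    static Tracker newTrackerWithFallback(TrackerFactory primary, Supplier<Tracker> fallback) {
        try {
            return primary.newTracker();
        } catch (RecoverFailedException e) {
            // Assumption: a real broker would also log the failure and release
            // any resources held by the partially initialized tracker here.
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        // Simulate a bucket-based factory whose recovery always fails.
        TrackerFactory failingBucketFactory = () -> {
            throw new RecoverFailedException("simulated recovery failure",
                    new IOException("simulated storage timeout"));
        };
        Tracker tracker = newTrackerWithFallback(failingBucketFactory, () -> () -> "in-memory");
        System.out.println(tracker.name());
    }
}
```

The point of the design is that the caller always receives a usable tracker, so a persistent recovery failure does not repeat on every dispatch attempt.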

Modifications

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

dao-jun avatar May 17 '24 07:05 dao-jun

@dao-jun Can you add tests to cover this case?

coderzc avatar May 17 '24 07:05 coderzc

@dao-jun Can you add tests to cover this case?

Yes, but before adding tests, I'd like to get more feedback to make sure this change is reasonable.

dao-jun avatar May 17 '24 07:05 dao-jun

@coderzc could you please also help review https://github.com/apache/pulsar/pull/22707 when you are available?

dao-jun avatar May 17 '24 07:05 dao-jun

Codecov Report

Attention: Patch coverage is 67.27273%, with 18 lines in your changes missing coverage. Please review.

Project coverage is 73.19%. Comparing base (bbc6224) to head (ed9e5a2). Report is 287 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #22735      +/-   ##
============================================
- Coverage     73.57%   73.19%   -0.39%     
+ Complexity    32624    32591      -33     
============================================
  Files          1877     1891      +14     
  Lines        139502   141466    +1964     
  Branches      15299    15519     +220     
============================================
+ Hits         102638   103543     +905     
- Misses        28908    29924    +1016     
- Partials       7956     7999      +43     
Flag         Coverage Δ
inttests     27.40% <21.81%> (+2.81%)
systests     24.60% <1.81%> (+0.28%)
unittests    72.20% <67.27%> (-0.65%)

Flags with carried forward coverage won't be shown.

Files                                                    Coverage Δ
...r/delayed/BucketDelayedDeliveryTrackerFactory.java    93.47% <100.00%> (+2.04%)
...delayed/InMemoryDelayedDeliveryTrackerFactory.java    95.23% <100.00%> (+3.57%)
...bucket/RecoverDelayedDeliveryTrackerException.java    100.00% <100.00%> (ø)
...sistent/PersistentDispatcherMultipleConsumers.java    73.80% <100.00%> (-0.53%)
...r/delayed/bucket/BucketDelayedDeliveryTracker.java    83.12% <33.33%> (-0.58%)
...rg/apache/pulsar/broker/service/BrokerService.java    81.81% <50.00%> (+1.03%)
.../pulsar/broker/delayed/DelayedDeliveryTracker.java    20.00% <20.00%> (ø)

... and 349 files with indirect coverage changes

codecov-commenter avatar May 19 '24 06:05 codecov-commenter