[fix][broker] Introduce the last sent position to fix message ordering issues in Key_Shared (PIP-282)
PIP: #20776
Motivation
Key_Shared has a mechanism called the "recently joined consumers" to keep message ordering. However, currently, it doesn't care about some corner cases. More specifically, we found two out-of-order issues cased by:
- [issue-1] The race condition in the "recently joined consumers", where consumers can be added before finishing reading and dispatching messages from ledgers.
- [issue-2] Messages could be added to messagesToRedeliver without consumer-side operations such as unacknowledgement.
We should care about these cases in Key_Shared subscription. (For more details, please see the PIP document.)
Modifications
- Introduce the new position (the lastSentPosition and individuallySentPositions) and use them in PersistentStickyKeyDispatcherMultipleConsumers.
- Rename the consumer stats
readPositionWhenJoiningtolastSentPositionWhenJoining. - Modify the subscription stats (consumersAfterMarkDeletePosition) of the definition.
(For more details, please see the PIP document.)
Verifying this change
- [x] Make sure that the change passes the CI checks.
This change added tests and can be verified as follows:
- Unit tests for the lastSentPosition and individuallySentPositions
- Test for guaranteed message ordering in corner cases
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [x] The metrics
- https://github.com/apache/pulsar/blob/43a2dc3f9fd9ab68e7429f2345899a6b0cb0bcbe/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java#L1281-L1288
- https://github.com/apache/pulsar/blob/43a2dc3f9fd9ab68e7429f2345899a6b0cb0bcbe/pulsar-client-admin-api/src/main/java/org/apache/pulsar/common/policies/data/ConsumerStats.java#L72-L73
- [ ] Anything that affects deployment
Documentation
- [ ]
doc - [x]
doc-required - [ ]
doc-not-needed - [ ]
doc-complete
Matching PR in forked repository
PR in forked repository: https://github.com/equanz/pulsar/pull/6
LGTM
/pulsarbot rerun-failure-checks
Codecov Report
Attention: Patch coverage is 87.93103% with 14 lines in your changes are missing coverage. Please review.
Project coverage is 73.90%. Comparing base (
bbc6224) to head (c3b318a). Report is 179 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #21953 +/- ##
============================================
+ Coverage 73.57% 73.90% +0.33%
- Complexity 32624 33094 +470
============================================
Files 1877 1885 +8
Lines 139502 140522 +1020
Branches 15299 15466 +167
============================================
+ Hits 102638 103857 +1219
+ Misses 28908 28616 -292
- Partials 7956 8049 +93
| Flag | Coverage Δ | |
|---|---|---|
| inttests | 26.69% <44.82%> (+2.11%) |
:arrow_up: |
| systests | 24.45% <45.68%> (+0.13%) |
:arrow_up: |
| unittests | 73.18% <87.93%> (+0.34%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files | Coverage Δ | |
|---|---|---|
| ...che/bookkeeper/mledger/impl/ManagedCursorImpl.java | 79.83% <100.00%> (+0.53%) |
:arrow_up: |
| ...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 81.01% <ø> (+0.35%) |
:arrow_up: |
| ...ava/org/apache/pulsar/broker/service/Consumer.java | 86.66% <100.00%> (ø) |
|
| .../common/policies/data/stats/ConsumerStatsImpl.java | 97.61% <100.00%> (ø) |
|
| ...mon/policies/data/stats/SubscriptionStatsImpl.java | 94.21% <100.00%> (+0.09%) |
:arrow_up: |
| ...ker/service/persistent/PersistentSubscription.java | 77.18% <92.85%> (+0.49%) |
:arrow_up: |
| ...ersistentStickyKeyDispatcherMultipleConsumers.java | 87.02% <85.55%> (+1.38%) |
:arrow_up: |
@equanz Sorry for the late review, I will finish the review tomorrow.
Hi @poorbarcode. Thank you for your review. What is the current status?
Hi @poorbarcode. Thank you for your review. What is the current status?
Sorry, I am busy for other things before, I will done this review soon
Rebased to fix conflicts. BTW, can someone review this? I've been trying to fix the issue for a year now... (cf. https://github.com/apache/pulsar/pull/20179)
@codelipenghui Could you take a look?
@codelipenghui @poorbarcode
Looking forward to your reviews.
@equanz @poorbarcode Sorry for the late response, I will review this week.
Hi @codelipenghui . Do you have any comments?
Hi @codelipenghui . Do you have any comments?
Sorry, I'm struggling in other tasks. Please go ahead to merge the PR. @poorbarcode
@equanz Could you please handle the conflicts?
/pulsarbot rerun-failure-checks
@poorbarcode Addressed. PTAL
@poorbarcode Could you take a look?