pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[fix][broker] Introduce the last sent position to fix message ordering issues in Key_Shared (PIP-282)

Open equanz opened this issue 1 year ago • 15 comments

PIP: #20776

Motivation

Key_Shared has a mechanism called the "recently joined consumers" to keep message ordering. However, currently, it doesn't care about some corner cases. More specifically, we found two out-of-order issues cased by:

  1. [issue-1] The race condition in the "recently joined consumers", where consumers can be added before finishing reading and dispatching messages from ledgers.
  2. [issue-2] Messages could be added to messagesToRedeliver without consumer-side operations such as unacknowledgement.

We should care about these cases in Key_Shared subscription. (For more details, please see the PIP document.)

Modifications

  1. Introduce the new position (the lastSentPosition and individuallySentPositions) and use them in PersistentStickyKeyDispatcherMultipleConsumers.
  2. Rename the consumer stats readPositionWhenJoining to lastSentPositionWhenJoining.
  3. Modify the subscription stats (consumersAfterMarkDeletePosition) of the definition.

(For more details, please see the PIP document.)

Verifying this change

  • [x] Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Unit tests for the lastSentPosition and individuallySentPositions
  • Test for guaranteed message ordering in corner cases

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [x] The metrics
    • https://github.com/apache/pulsar/blob/43a2dc3f9fd9ab68e7429f2345899a6b0cb0bcbe/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java#L1281-L1288
    • https://github.com/apache/pulsar/blob/43a2dc3f9fd9ab68e7429f2345899a6b0cb0bcbe/pulsar-client-admin-api/src/main/java/org/apache/pulsar/common/policies/data/ConsumerStats.java#L72-L73
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [x] doc-required
  • [ ] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/equanz/pulsar/pull/6

equanz avatar Jan 23 '24 07:01 equanz

LGTM

hanmz avatar Feb 18 '24 02:02 hanmz

/pulsarbot rerun-failure-checks

poorbarcode avatar Feb 19 '24 04:02 poorbarcode

Codecov Report

Attention: Patch coverage is 87.93103% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 73.90%. Comparing base (bbc6224) to head (c3b318a). Report is 179 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21953      +/-   ##
============================================
+ Coverage     73.57%   73.90%   +0.33%     
- Complexity    32624    33094     +470     
============================================
  Files          1877     1885       +8     
  Lines        139502   140522    +1020     
  Branches      15299    15466     +167     
============================================
+ Hits         102638   103857    +1219     
+ Misses        28908    28616     -292     
- Partials       7956     8049      +93     
Flag Coverage Δ
inttests 26.69% <44.82%> (+2.11%) :arrow_up:
systests 24.45% <45.68%> (+0.13%) :arrow_up:
unittests 73.18% <87.93%> (+0.34%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 79.83% <100.00%> (+0.53%) :arrow_up:
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.01% <ø> (+0.35%) :arrow_up:
...ava/org/apache/pulsar/broker/service/Consumer.java 86.66% <100.00%> (ø)
.../common/policies/data/stats/ConsumerStatsImpl.java 97.61% <100.00%> (ø)
...mon/policies/data/stats/SubscriptionStatsImpl.java 94.21% <100.00%> (+0.09%) :arrow_up:
...ker/service/persistent/PersistentSubscription.java 77.18% <92.85%> (+0.49%) :arrow_up:
...ersistentStickyKeyDispatcherMultipleConsumers.java 87.02% <85.55%> (+1.38%) :arrow_up:

... and 243 files with indirect coverage changes

codecov-commenter avatar Feb 19 '24 05:02 codecov-commenter

@equanz Sorry for the late review, I will finish the review tomorrow.

poorbarcode avatar Feb 25 '24 16:02 poorbarcode

Hi @poorbarcode. Thank you for your review. What is the current status?

equanz avatar Mar 07 '24 03:03 equanz

Hi @poorbarcode. Thank you for your review. What is the current status?

Sorry, I am busy for other things before, I will done this review soon

poorbarcode avatar Mar 07 '24 03:03 poorbarcode

Rebased to fix conflicts. BTW, can someone review this? I've been trying to fix the issue for a year now... (cf. https://github.com/apache/pulsar/pull/20179)

equanz avatar Apr 23 '24 07:04 equanz

@codelipenghui Could you take a look?

poorbarcode avatar May 08 '24 03:05 poorbarcode

@codelipenghui @poorbarcode

Looking forward to your reviews.

equanz avatar May 21 '24 03:05 equanz

@equanz @poorbarcode Sorry for the late response, I will review this week.

codelipenghui avatar May 27 '24 01:05 codelipenghui

Hi @codelipenghui . Do you have any comments?

equanz avatar Jun 11 '24 05:06 equanz

Hi @codelipenghui . Do you have any comments?

Sorry, I'm struggling in other tasks. Please go ahead to merge the PR. @poorbarcode

codelipenghui avatar Jun 18 '24 01:06 codelipenghui

@equanz Could you please handle the conflicts?

poorbarcode avatar Jun 18 '24 01:06 poorbarcode

/pulsarbot rerun-failure-checks

equanz avatar Jun 18 '24 09:06 equanz

@poorbarcode Addressed. PTAL

equanz avatar Jun 18 '24 10:06 equanz

@poorbarcode Could you take a look?

equanz avatar Jul 09 '24 06:07 equanz