[fix] [ml] fix key_shared mode delivery are order out order after a consumers reconnection
Motivation
- There is an issue that leads to repeated consumption when replay reading and
cursor.rewindexecute concurrently.
Information
read-position:3:21mark-deleted-position:3: -1`acked messages:3:1 ~ 3:20
| time | replay reading | cursor.rewind |
|---|---|---|
| 1 | Start a replay reading after message redelivery(3:0) |
|
| 2 | Replay reading is in-progress | |
| 3 | cursor.rewind, which may be triggered by many cases |
|
| 4 | cursor.readPosition back to the first entry(3:0) |
|
| 5 | read completely, send 3:0 to consumer |
|
| 6 | read completely, send 3:0 ~ 3:20 to consumer |
Issue: the consumer received [3:0, 3:0, 3:1, 3:2....3:20]. you can reproduce the issue by the new test testMixedReplayReadingAndNormalReading("testRepeatedDelivery")`
Modifications
- Fix the issue
- Add a test for the reverted PR, see more detail here: https://github.com/apache/pulsar/pull/23855#issuecomment-2597522865
Documentation
- [ ]
doc - [ ]
doc-required - [x]
doc-not-needed - [ ]
doc-complete
Matching PR in forked repository
PR in forked repository: x
@poorbarcode Please add the following content to your PR description and select a checkbox:
- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
Please update the PR title. It currently includes [ml] and this PR doesn't seem to have anything to do with the managed ledger module.
One details about PR titles: don't add a space between the prefixes. instead of of [fix] [broker] use [fix][broker] prefix.
@poorbarcode PIP-282 (implementation PR #21953) was made to address known out-of-order issues in the Key_Shared implementation. PIP-282 was targeted for Pulsar 4.0 and later replaced with PIP-379 implementation. It feels that major changes to the Pulsar 3.0 Key_Shared implementation should be made carefully due to the high chance of changes causing other regressions. This PR doesn't provide a test that reproduces the issue. Just wondering why we should keep on making major changes to Pulsar 3.0 Key_Shared implementation due to the risk of regressions. It would be better that customers move on to Pulsar 4.0 and the PIP-379 implementation where many of the problems have been addressed.
@poorbarcode This diff of this PR is currently messed up, so hard to see what the changes are. Just wondering if the correct solution would be to prevent replay reads and normal reads to happen at the same time? I reverted such changes from the recent PR with this commit https://github.com/apache/pulsar/pull/23874/commits/06827df01022d656345386a8b9cc9ee6424257ae . There's currently a problem that when a replay read starts, it doesn't cancel a pending read which is waiting for new entries.
@lhotari
@poorbarcode This diff of this PR is currently messed up, so hard to see what the changes are.
Rebased master, it is clear now.
Just wondering if the correct solution would be to prevent replay reads and normal reads to happen at the same time? I reverted such changes from the recent PR with this commit https://github.com/apache/pulsar/commit/06827df01022d656345386a8b9cc9ee6424257ae .
- A normal reading can not start when there is a pending reading
- A replay reading can start when there is a pending normal reading because maybe there are no more entries to read(the normal reading is just pending there and waiting for new messages)
I will modify the Motivation to make every thing clear late
- A replay reading can start when there is a pending normal reading because maybe there are no more entries to read(the normal reading is just pending there and waiting for new messages)
"the normal reading is just pending there and waiting for new messages". It's better to cancel it since it does cause later problems. There's code to handle it, but the unnecessary work could be completely avoided.
@lhotari
"the normal reading is just pending there and waiting for new messages". It's better to cancel it since it does cause later problems. There's code to handle it, but the unnecessary work could be completely avoided.
- It may lead E2E latency increases
- The pending normal read will be looped to
setandcancel - It may be a conflict with the feature
read ahead before all messages sent are acknowledged - (Highlight) your solution can not solve the issue that this PR solves, see more detail at Motivation
- It may lead E2E latency increases
There would be an opposite impact when handled correctly. When there's a race between a pending read and a replay read, either one will be discarded. This is why extra work is performed and that leads to increased latencies eventually since unnecessary reads are performed. There's currently no broker side caching for replay reads, so discarded replay reads will cause extra traffic between broker and bookies. To avoid the extra latency, when the pending read gets cancelled and once the replay read is complete, a new pending read should be issued so that new entries get handled immediately when they are available. I'll be doing this type of change to address #23845, which is one of the issues where a read doesn't continue when a read gets skipped.
- The pending normal read will be looped to
setandcancel- It may be a conflict with the feature
read ahead before all messages sent are acknowledged
Please explain more about these items since I didn't catch the meaning.
- (Highlight) your solution can not solve the issue that this PR solves, see more detail at Motivation
When I made the comment, there was no description for this PR, so I'll check the updated description.
@lhotari
BTW, we can start a new email channel to talk about this.
It may lead E2E latency increases There would be an opposite impact when handled correctly.
I consider that the current implementation is the best solution
The pending normal read will be looped to set and cancel It may be a conflict with the feature read ahead before all messages sent are acknowledged Please explain more about these items since I didn't catch the meaning.
The case Read Ahead:
- there is consumer stuck, which leads messages being pushed into redeliver-queue
- there is consumer not stuck, which need to read new messages
I think we should not cancel pending normal read.