pulsar [fix] [ml] fix key_shared mode delivery are order out order after a consumers reconnection

Motivation

There is an issue that leads to repeated consumption when replay reading and cursor.rewind execute concurrently.

Information

read-position: 3:21
mark-deleted-position: 3: -1`
acked messages: 3:1 ~ 3:20

time	replay reading	`cursor.rewind`
1	Start a replay reading after message redelivery(`3:0`)
2	Replay reading is in-progress
3		`cursor.rewind`, which may be triggered by many cases
4		`cursor.readPosition` back to the first entry(`3:0`)
5	read completely, send `3:0` to consumer
6		read completely, send `3:0 ~ 3:20` to consumer

Issue: the consumer received [3:0, 3:0, 3:1, 3:2....3:20]. you can reproduce the issue by the new test testMixedReplayReadingAndNormalReading("testRepeatedDelivery")`

Modifications

Fix the issue
Add a test for the reverted PR, see more detail here: https://github.com/apache/pulsar/pull/23855#issuecomment-2597522865

Documentation

[ ] doc
[ ] doc-required
[x] doc-not-needed
[ ] doc-complete

Matching PR in forked repository

PR in forked repository: x

Jan 02 '25 11:01 poorbarcode

@poorbarcode Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

Jan 02 '25 11:01 github-actions[bot]

Please update the PR title. It currently includes [ml] and this PR doesn't seem to have anything to do with the managed ledger module. One details about PR titles: don't add a space between the prefixes. instead of of [fix] [broker] use [fix][broker] prefix.

Jan 03 '25 10:01 lhotari

@poorbarcode PIP-282 (implementation PR #21953) was made to address known out-of-order issues in the Key_Shared implementation. PIP-282 was targeted for Pulsar 4.0 and later replaced with PIP-379 implementation. It feels that major changes to the Pulsar 3.0 Key_Shared implementation should be made carefully due to the high chance of changes causing other regressions. This PR doesn't provide a test that reproduces the issue. Just wondering why we should keep on making major changes to Pulsar 3.0 Key_Shared implementation due to the risk of regressions. It would be better that customers move on to Pulsar 4.0 and the PIP-379 implementation where many of the problems have been addressed.

Jan 07 '25 10:01 lhotari

@poorbarcode This diff of this PR is currently messed up, so hard to see what the changes are. Just wondering if the correct solution would be to prevent replay reads and normal reads to happen at the same time? I reverted such changes from the recent PR with this commit https://github.com/apache/pulsar/pull/23874/commits/06827df01022d656345386a8b9cc9ee6424257ae . There's currently a problem that when a replay read starts, it doesn't cancel a pending read which is waiting for new entries.

Jan 23 '25 09:01 lhotari

@lhotari

@poorbarcode This diff of this PR is currently messed up, so hard to see what the changes are.

Rebased master, it is clear now.

Just wondering if the correct solution would be to prevent replay reads and normal reads to happen at the same time? I reverted such changes from the recent PR with this commit https://github.com/apache/pulsar/commit/06827df01022d656345386a8b9cc9ee6424257ae .

A normal reading can not start when there is a pending reading
A replay reading can start when there is a pending normal reading because maybe there are no more entries to read(the normal reading is just pending there and waiting for new messages)

I will modify the Motivation to make every thing clear late

Jan 23 '25 09:01 poorbarcode

A replay reading can start when there is a pending normal reading because maybe there are no more entries to read(the normal reading is just pending there and waiting for new messages)

"the normal reading is just pending there and waiting for new messages". It's better to cancel it since it does cause later problems. There's code to handle it, but the unnecessary work could be completely avoided.

Jan 23 '25 09:01 lhotari

@lhotari

"the normal reading is just pending there and waiting for new messages". It's better to cancel it since it does cause later problems. There's code to handle it, but the unnecessary work could be completely avoided.

It may lead E2E latency increases
The pending normal read will be looped to set and cancel
It may be a conflict with the feature read ahead before all messages sent are acknowledged
(Highlight) your solution can not solve the issue that this PR solves, see more detail at Motivation

Jan 23 '25 14:01 poorbarcode

It may lead E2E latency increases

There would be an opposite impact when handled correctly. When there's a race between a pending read and a replay read, either one will be discarded. This is why extra work is performed and that leads to increased latencies eventually since unnecessary reads are performed. There's currently no broker side caching for replay reads, so discarded replay reads will cause extra traffic between broker and bookies. To avoid the extra latency, when the pending read gets cancelled and once the replay read is complete, a new pending read should be issued so that new entries get handled immediately when they are available. I'll be doing this type of change to address #23845, which is one of the issues where a read doesn't continue when a read gets skipped.

The pending normal read will be looped to set and cancel

It may be a conflict with the feature read ahead before all messages sent are acknowledged

Please explain more about these items since I didn't catch the meaning.

(Highlight) your solution can not solve the issue that this PR solves, see more detail at Motivation

When I made the comment, there was no description for this PR, so I'll check the updated description.

Jan 23 '25 14:01 lhotari

@lhotari

BTW, we can start a new email channel to talk about this.

It may lead E2E latency increases There would be an opposite impact when handled correctly.

I consider that the current implementation is the best solution

The pending normal read will be looped to set and cancel It may be a conflict with the feature read ahead before all messages sent are acknowledged Please explain more about these items since I didn't catch the meaning.

The case Read Ahead:

there is consumer stuck, which leads messages being pushed into redeliver-queue
there is consumer not stuck, which need to read new messages

I think we should not cancel pending normal read.

Jan 24 '25 02:01 poorbarcode