[fix][broker] Can't consume messages for a long time due to Entry Filter
Fixes:
- https://github.com/apache/pulsar/issues/16978
Motivation
When there are two consumers, users can specify the consumption behavior of each consumer with an Entry Filter:
Case: consumer_1 can consume 60% of the messages, consumer_2 can consume 60% of the messages, and there is a 10% intersection between consumer_1 and consumer_2.
If the Entry Filter returns FilterResult.RESCHEDULE for more than 10% of the messages, the following can happen: a message that only consumer_1 can consume keeps being redelivered to consumer_2, and a message that only consumer_2 can consume keeps being redelivered to consumer_1. Two problems follow:
- These messages cannot be consumed for a long time.
- The redelivery count of these messages keeps increasing (redelivery by Entry Filter); see the code below (line 141):
https://github.com/apache/pulsar/blob/8441f6724b1aa502df580518ae14f0c559f53547/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractBaseDispatcher.java#L140-L148
You can reproduce the problem by running FilterEntryTest.testEntryFilterRescheduleMessageDependingOnConsumerSharedSubscription 20 times.
Modifications
When a message is redelivered to the same consumer more than 3 times, that consumer stops receiving this message for 1 second. Since tracking the consumption of every message would cost too much memory, we track only the earliest message.
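The idea above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the class `EarliestMessageRescheduleTracker`, its method names, and the use of plain `ledgerId`/`entryId` longs and consumer-name strings are all assumptions made for illustration. Only the threshold (3) and the pause length (1 second) come from the description.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the PR's implementation): counts reschedules
 * for the earliest message only, and pauses a consumer for that message
 * once the count exceeds the threshold.
 */
class EarliestMessageRescheduleTracker {
    static final int MAX_RESCHEDULES = 3;      // threshold from the description
    static final long PAUSE_MILLIS = 1000L;    // 1-second pause from the description

    // Position of the single tracked (earliest) message; -1 means none yet.
    private long trackedLedgerId = -1;
    private long trackedEntryId = -1;

    // Consumer name -> reschedule count for the tracked message.
    private final Map<String, Integer> counts = new HashMap<>();
    // Consumer name -> timestamp (ms) until which the tracked message is skipped.
    private final Map<String, Long> pausedUntil = new HashMap<>();

    /** Record that the filter rescheduled the given message for the consumer. */
    void recordReschedule(String consumer, long ledgerId, long entryId, long nowMillis) {
        if (trackedLedgerId < 0 || compareToTracked(ledgerId, entryId) < 0) {
            // Nothing tracked yet, or an earlier message appeared: track it instead.
            trackedLedgerId = ledgerId;
            trackedEntryId = entryId;
            counts.clear();
            pausedUntil.clear();
        } else if (compareToTracked(ledgerId, entryId) > 0) {
            // Later messages are deliberately not tracked, to bound memory use.
            return;
        }
        int n = counts.merge(consumer, 1, Integer::sum);
        if (n > MAX_RESCHEDULES) {
            // More than 3 reschedules: pause this consumer for this message.
            pausedUntil.put(consumer, nowMillis + PAUSE_MILLIS);
            counts.put(consumer, 0);
        }
    }

    /** True if the consumer should currently skip the given message. */
    boolean isPaused(String consumer, long ledgerId, long entryId, long nowMillis) {
        if (ledgerId != trackedLedgerId || entryId != trackedEntryId) {
            return false;
        }
        Long until = pausedUntil.get(consumer);
        return until != null && nowMillis < until;
    }

    // Orders positions by ledger id first, then by entry id.
    private int compareToTracked(long ledgerId, long entryId) {
        if (ledgerId != trackedLedgerId) {
            return Long.compare(ledgerId, trackedLedgerId);
        }
        return Long.compare(entryId, trackedEntryId);
    }
}
```

In this sketch a dispatcher would call `isPaused` before offering the earliest message to a consumer and `recordReschedule` whenever the filter returns RESCHEDULE for it. Memory stays bounded by the number of consumers, because only one message position is tracked at a time.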
Documentation
- [ ] doc-required (Your PR needs to update docs and you will update later)
- [x] doc-not-needed (Please explain why)
- [ ] doc (Your PR contains doc changes)
- [ ] doc-complete (Docs have been already added)
Matching PR in forked repository
PR in forked repository:
- https://github.com/poorbarcode/pulsar/pull/8
@poorbarcode Please provide a correct documentation label for your PR. Instructions see Pulsar Documentation Label Guide.
This PR should merge into the following branches:
- master
- branch-2.11
/pulsarbot rerun-failure-checks
The pr had no activity for 30 days, mark with Stale label.