
[fix][broker]Can't consume messages for a long time due to Entry Filter

Open poorbarcode opened this issue 3 years ago • 4 comments

Fixes:

  • https://github.com/apache/pulsar/issues/16978

Motivation

When there are two consumers, users can use an Entry Filter to specify which messages each consumer consumes, for example:

  • consumer_1 can consume 60% of the messages, consumer_2 can consume 60% of the messages, and there is a 10% overlap between consumer_1 and consumer_2.
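A consumer-dependent filter of this kind can be sketched without a broker. This is a standalone illustration under stated assumptions: `FilterResult` is a local stand-in mirroring the `ACCEPT`/`REJECT`/`RESCHEDULE` outcomes of Pulsar's entry-filter plugin API, while the `allowedConsumers` message property and the `filter` method are hypothetical. A real filter would implement Pulsar's `EntryFilter` interface instead.

```java
import java.util.Arrays;
import java.util.Map;

/**
 * Standalone sketch of a consumer-dependent entry filter. FilterResult mirrors the
 * three outcomes of Pulsar's entry-filter API; everything else here is hypothetical.
 */
class ConsumerDependentFilterSketch {
    enum FilterResult { ACCEPT, REJECT, RESCHEDULE }

    // Hypothetical message property listing the consumers allowed to receive the message,
    // e.g. "consumer_1", "consumer_2", or "consumer_1,consumer_2" for the overlap.
    static final String ALLOWED_CONSUMERS = "allowedConsumers";

    static FilterResult filter(Map<String, String> properties, String consumerName) {
        String allowed = properties.get(ALLOWED_CONSUMERS);
        if (allowed == null) {
            return FilterResult.ACCEPT; // no restriction: any consumer may take it
        }
        boolean forThisConsumer = Arrays.asList(allowed.split(",")).contains(consumerName);
        // Not for this consumer: ask the broker to redeliver it later, hopefully elsewhere.
        return forThisConsumer ? FilterResult.ACCEPT : FilterResult.RESCHEDULE;
    }

    public static void main(String[] args) {
        Map<String, String> onlyFirst = Map.of(ALLOWED_CONSUMERS, "consumer_1");
        System.out.println(filter(onlyFirst, "consumer_1")); // ACCEPT
        System.out.println(filter(onlyFirst, "consumer_2")); // RESCHEDULE
    }
}
```

Nothing in this filter steers a rescheduled message toward the consumer that can accept it, which is why a message meant only for consumer_1 can keep landing on consumer_2.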

If the filter returns FilterResult.RESCHEDULE for more than 10% of the messages, it is possible that some messages that can only be consumed by consumer_1 keep being redelivered to consumer_2, while some messages that can only be consumed by consumer_2 keep being redelivered to consumer_1. The following problems then occur:

  • These messages cannot be consumed for a long time.
  • The redelivery count of these messages keeps increasing (the redeliveries are triggered by the Entry Filter); see the code below (line 141):

https://github.com/apache/pulsar/blob/8441f6724b1aa502df580518ae14f0c559f53547/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractBaseDispatcher.java#L140-L148

You can reproduce the problem by running FilterEntryTest.testEntryFilterRescheduleMessageDependingOnConsumerSharedSubscription 20 times.

Modifications

When a message is redelivered to the same consumer more than 3 times, that consumer stops receiving this message for 1 second. Because tracking the consumption of every message would cost too much memory, only the earliest message is tracked.
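The modification can be sketched as a small per-consumer tracker. This is a minimal illustration, not the PR's implementation: the class name, the `release` hook, the injectable clock, and the use of a single `long` in place of a real ledger/entry position are all assumptions. It keeps only the earliest rescheduled position, counts how often that position bounced back to the same consumer, and withholds it for 1 second once the count exceeds 3.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical per-consumer tracker illustrating the idea above. The names, the release
 * hook, the injectable clock and the long "position" are assumptions, not Pulsar APIs.
 */
class RescheduleBackoffTracker {
    static final int MAX_REDELIVERIES = 3;   // pause once the count goes above 3
    static final long PAUSE_MILLIS = 1_000L; // pause length: 1 second
    private static final long NONE = -1L;

    private final LongSupplier clockMillis;  // injectable clock so the behavior is testable
    private long trackedPosition = NONE;     // earliest rescheduled position for this consumer
    private int redeliveries = 0;            // how often trackedPosition was rescheduled here
    private long pausedUntil = 0L;           // clock value until which trackedPosition is withheld

    RescheduleBackoffTracker(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Record that the entry filter returned RESCHEDULE for this consumer at this position. */
    void onRescheduled(long position) {
        if (trackedPosition == NONE || position < trackedPosition) {
            // Only the earliest message is tracked, so an earlier one replaces the current one.
            trackedPosition = position;
            redeliveries = 1;
            pausedUntil = 0L;
        } else if (position == trackedPosition && ++redeliveries > MAX_REDELIVERIES) {
            pausedUntil = clockMillis.getAsLong() + PAUSE_MILLIS;
        }
        // Later positions are deliberately ignored to keep memory constant.
    }

    /** Forget the tracked message once any consumer has acknowledged it. */
    void release(long position) {
        if (position == trackedPosition) {
            trackedPosition = NONE;
            redeliveries = 0;
            pausedUntil = 0L;
        }
    }

    /** True while the dispatcher should hold this position back from this consumer. */
    boolean shouldSkip(long position) {
        return position == trackedPosition && clockMillis.getAsLong() < pausedUntil;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        RescheduleBackoffTracker tracker = new RescheduleBackoffTracker(() -> now[0]);
        for (int i = 0; i < 4; i++) {
            tracker.onRescheduled(42L); // the 4th reschedule exceeds the threshold
        }
        System.out.println(tracker.shouldSkip(42L)); // true: paused for 1 second
        now[0] = 1_000L;
        System.out.println(tracker.shouldSkip(42L)); // false: the pause has expired
    }
}
```

Tracking a single position keeps the memory cost per consumer constant, which matches the trade-off stated above; the price is that a second stuck message is not throttled until the first one is released or superseded by an earlier one.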

Documentation

  • [ ] doc-required (Your PR needs to update docs and you will update later)

  • [x] doc-not-needed (Please explain why)

  • [ ] doc (Your PR contains doc changes)

  • [ ] doc-complete (Docs have been already added)

Matching PR in forked repository

PR in forked repository:

  • https://github.com/poorbarcode/pulsar/pull/8

poorbarcode, Sep 21 '22 22:09

@poorbarcode Please provide a correct documentation label for your PR. For instructions, see the Pulsar Documentation Label Guide.

github-actions[bot], Sep 21 '22 22:09

This PR should merge into the following branches:

  • master
  • branch-2.11

poorbarcode, Sep 21 '22 23:09

/pulsarbot rerun-failure-checks

poorbarcode, Oct 07 '22 02:10

The PR had no activity for 30 days; marking it with the Stale label.

github-actions[bot], Nov 18 '22 02:11