pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[fix][broker] Fix one potential to add duplicated consumer

Open TakaHiR07 opened this issue 1 year ago • 6 comments

Fixes #20576

Motivation

There is a case resulted in duplicated consumer, if we add the same consumer twice continuously, then close consumer, the offline consumer would remain in consumerList, because AbstractDispatcherMultipleConsumers#consumerList is arrayList and AbstractDispatcherMultipleConsumers#consumerSet is set.

There is a concurrent problem in ServerCnx#handleSubscribe, which cause this case. Suppose client request handleSubscribe() three times and request handleCloseConsumer(). The execute order is as follow. since the order is addConsumer->addConsumer->close, but not addConsumer->close->addConsumer, duplicated consumer problem occur.

企业微信截图_1ea4ed11-3c2b-4e46-94a8-aad1be312b93

Modifications

  1. remove the step 9 "consumersLongHashmap.remove consumerFuture3".
  • Because if judge existingConsumerFuture is not null, it means there is an earlier request try to handleSubscribe. the remove operation should be executed in the earlier request.
  1. add a test to verify this concurrent problem and the fix

Verifying this change

  • [x] Make sure that the change passes the CI checks.

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/TakaHiR07/pulsar/pull/10

TakaHiR07 avatar Jun 15 '23 10:06 TakaHiR07

You say in the linked issue and you have seen this twice. Are you able to reproduce it manually and verify that with this fix the problem is really fixed?

eolivelli avatar Jun 15 '23 10:06 eolivelli

You say in the linked issue and you have seen this twice. Are you able to reproduce it manually and verify that with this fix the problem is really fixed?

I guess this is one of the possibilities for duplicated consumer. And this pr is corresponding to our production case. I can't affirm the duplicated consumer problem do not occur anymore.

TakaHiR07 avatar Jun 15 '23 10:06 TakaHiR07

@poorbarcode @eolivelli @lhotari @congbobo184 Can you help review this pr? It is another possibilities besides https://github.com/apache/pulsar/pull/15051

TakaHiR07 avatar Jun 16 '23 06:06 TakaHiR07

Add a test to verify this concurrent case and the fix. PTAL, thanks. @codelipenghui @eolivelli @lhotari @poorbarcode

TakaHiR07 avatar Jun 25 '23 12:06 TakaHiR07

The pr had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Aug 31 '23 01:08 github-actions[bot]

I had the same problem on flink task and got fixed after trying this PR.

graysonzeng avatar Jan 31 '24 09:01 graysonzeng

@poorbarcode @codelipenghui It seems this problem is not fixed in the master branch.

TakaHiR07 avatar Mar 18 '24 08:03 TakaHiR07

#22283 has fixed the same problem. Close this pr.

TakaHiR07 avatar May 20 '24 06:05 TakaHiR07