[fix][broker] The topic might reference a closed ledger

Open shibd opened this issue 1 year ago • 1 comment

Motivation

We observed that a normal topic might reference a closed ledger and never recover automatically, which causes producers and consumers to get stuck.

The root cause is related to the topic creation timeout. Assume we have two concurrent requests to create the same topic: firstCreate (Thread-1) and secondCreate (Thread-2).

| Thread-1 | Thread-2 |
| --- | --- |
| firstCreate times out | |
| | Removes the old topic and recreates a new topic |
| | Opens the ledger from the cache and gets the old ledger created by firstCreate |
| Calls topic.close | |
| Closes the old ledger and removes it from the cache | |
| | The new topic still references the old ledger, whose state is now closed |

  • When firstCreate times out, it calls topic.close, which closes the ledger and removes it from the ledger cache. https://github.com/apache/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1745-L1752

  • If the secondCreate request successfully creates a topic before the old ledger is closed, the new topic ends up holding a reference to the old ledger.
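
The interleaving described above can be replayed in a fixed order with a small model. This is only a sketch: `Ledger`, `LedgerCache`, `Topic`, and the topic name are hypothetical stand-ins, not Pulsar's real classes, and the two threads are simulated by running their steps sequentially.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins; these are not Pulsar's real classes.
final class StaleLedgerRace {

    /** A ledger with nothing but an open/closed state. */
    static final class Ledger {
        private volatile boolean closed;

        boolean isClosed() {
            return closed;
        }
    }

    /** Cache keyed by topic name; open() reuses whatever is already cached. */
    static final class LedgerCache {
        private final Map<String, Ledger> ledgers = new ConcurrentHashMap<>();

        Ledger open(String name) {
            return ledgers.computeIfAbsent(name, n -> new Ledger());
        }

        void closeAndEvict(String name, Ledger ledger) {
            ledger.closed = true;         // close the ledger first...
            ledgers.remove(name, ledger); // ...then drop it from the cache
        }
    }

    /** A topic that keeps a direct reference to its ledger. */
    static final class Topic {
        final String name;
        final Ledger ledger;

        Topic(String name, Ledger ledger) {
            this.name = name;
            this.ledger = ledger;
        }

        void close(LedgerCache cache) {
            cache.closeAndEvict(name, ledger);
        }
    }

    /** Replays the problematic interleaving in a fixed order. */
    static boolean newTopicSeesClosedLedger() {
        LedgerCache cache = new LedgerCache();
        String name = "persistent://public/default/t1"; // illustrative name

        // Thread-1: firstCreate opens the ledger, then the creation times out.
        Topic first = new Topic(name, cache.open(name));

        // Thread-2: secondCreate recreates the topic and gets the cached ledger.
        Topic second = new Topic(name, cache.open(name));

        // Thread-1: the timeout handling closes the first topic and its ledger.
        first.close(cache);

        // The new topic now holds a ledger that is closed and no longer cached.
        return second.ledger.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("new topic references closed ledger: "
                + newTopicSeesClosedLedger());
    }
}
```

In this model `newTopicSeesClosedLedger()` returns `true`, because both topics share the same `Ledger` instance and nothing tells the second topic that its ledger was closed and evicted.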

Modifications

  • When creating a topic, if a topic future already exists and has completed with a timeout exception, do not remove it; instead, return the exception directly.
  • Once the topic closure is complete, remove the topic from the broker's topics map.
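
A minimal sketch of these two rules, assuming a map of topic futures similar in spirit to the broker's `topics` field. The class `TopicRegistrySketch`, its method names, and the `String` placeholder for the topic value are all hypothetical; the eviction of entries that failed for reasons other than a timeout is also an assumption, added only for contrast.

```java
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the topics map; not the actual BrokerService code.
final class TopicRegistrySketch {
    private final Map<String, CompletableFuture<String>> topics = new ConcurrentHashMap<>();

    /** Returns the existing future for a topic, or registers a new pending one. */
    CompletableFuture<String> getOrCreate(String name) {
        CompletableFuture<String> existing = topics.get(name);
        if (existing != null && existing.isCompletedExceptionally()
                && !failedWithTimeout(existing)) {
            // Assumption: other failures are evicted so creation can be retried.
            topics.remove(name, existing);
        }
        // A future that failed with a timeout stays in the map and is returned
        // as-is, so the caller sees the timeout instead of a recreated topic.
        return topics.computeIfAbsent(name, n -> new CompletableFuture<>());
    }

    /** Fails the pending creation and removes it only after the close finishes. */
    void onCreateTimeout(String name, CompletableFuture<Void> closeFuture) {
        CompletableFuture<String> pending = topics.get(name);
        if (pending == null) {
            return;
        }
        pending.completeExceptionally(new TimeoutException("topic creation timed out"));
        closeFuture.whenComplete((ignored, error) -> topics.remove(name, pending));
    }

    static boolean failedWithTimeout(CompletableFuture<?> future) {
        try {
            future.getNow(null);
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (CancellationException e) {
            return false;
        }
    }

    /** Walks through: timeout, concurrent create, close completion, retry. */
    static boolean scenarioHolds() {
        TopicRegistrySketch registry = new TopicRegistrySketch();
        CompletableFuture<Void> closeFuture = new CompletableFuture<>();

        CompletableFuture<String> first = registry.getOrCreate("t1");
        registry.onCreateTimeout("t1", closeFuture);

        // While the close is still pending, a second create sees the timeout.
        CompletableFuture<String> second = registry.getOrCreate("t1");
        boolean sawTimeout = second == first && failedWithTimeout(second);

        // After the close finishes, the entry is gone and a retry starts fresh.
        closeFuture.complete(null);
        CompletableFuture<String> third = registry.getOrCreate("t1");
        return sawTimeout && third != first && !third.isDone();
    }

    public static void main(String[] args) {
        System.out.println("scenario holds: " + scenarioHolds());
    }
}
```

The point of the ordering is that a second creator can never build a new topic on top of a ledger that is about to be closed: until the close completes, it only ever observes the failed future.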

Verifying this change

  • Add testCloseLedgerThatTopicAfterCreateTimeout to cover this case.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/shibd/pulsar/pull/37

shibd · May 17 '24 11:05

Great work!

dao-jun · May 18 '24 05:05

Closing based on this comment: https://github.com/apache/pulsar/pull/22739#discussion_r1606705897

shibd · May 20 '24 13:05