[fix][broker] The topic might reference a closed ledger
Motivation
We observed that a normal topic might reference a closed ledger and never recover automatically, which causes producers and consumers to get stuck.
The root cause is related to topic creation timeouts. Assume we have two concurrent requests to create a topic: firstCreate (Thread-1) and secondCreate (Thread-2).
| Thread-1 | Thread-2 |
|---|---|
| firstCreate topic times out | |
| Remove this old topic and recreate a new topic | |
| Open the ledger from the cache and get the old ledger that was created by the firstCreate topic | |
| Call topic.close | |
| The old ledger is closed and removed from the cache | |
| The old ledger is still referenced by the new topic, but its state is closed | |
- When the `firstCreate` topic times out, it calls `topic.close`, which closes the ledger and removes it from the ledger cache. https://github.com/apache/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1745-L1752
- If the `secondCreate` request successfully creates a topic before the `old ledger` closes, the new topic will hold a reference to the `old ledger`.
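The interleaving above can be replayed deterministically in a small sketch. This is not the broker code: `Ledger`, `ledgerCache`, `openLedger`, and `closeLedger` are hypothetical stand-ins used only to show how a cached ledger can be handed to a new topic and then closed underneath it.

```java
import java.util.HashMap;
import java.util.Map;

class ClosedLedgerRaceSketch {
    /** Hypothetical stand-in for a managed ledger; it only tracks open/closed state. */
    static final class Ledger {
        boolean closed = false;
    }

    /** Hypothetical stand-in for the ledger cache, keyed by topic name. */
    static final Map<String, Ledger> ledgerCache = new HashMap<>();

    /** Returns the cached ledger if one exists, otherwise creates and caches a new one. */
    static Ledger openLedger(String name) {
        return ledgerCache.computeIfAbsent(name, k -> new Ledger());
    }

    /** Marks the ledger closed and removes it from the cache. */
    static void closeLedger(String name, Ledger ledger) {
        ledger.closed = true;
        ledgerCache.remove(name, ledger);
    }

    /** Replays the race step by step; returns true if the new topic holds a closed ledger. */
    static boolean replayRace() {
        ledgerCache.clear();
        String name = "persistent://tenant/ns/topic";
        Ledger first = openLedger(name);   // firstCreate opens the ledger, then times out
        Ledger second = openLedger(name);  // secondCreate gets the same ledger from the cache
        closeLedger(name, first);          // cleanup of firstCreate closes that ledger
        return second.closed;              // the new topic now references a closed ledger
    }

    public static void main(String[] args) {
        System.out.println("new topic references a closed ledger: " + replayRace());
    }
}
```

In this sketch nothing reopens the ledger for the second topic, which mirrors the observation that the topic does not recover on its own.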
Modifications
- When creating a topic, if a topic future already exists and it has completed with a timeout exception, do not remove it; instead, return the exception directly.
- Once the topic has finished closing, remove it from `topics`.
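A minimal sketch of the two modifications, under stated assumptions: `topics`, `getOrCreate`, `failedWithTimeout`, and `onCloseComplete` are hypothetical names, and the topic is modeled as a plain `String`. It only illustrates the idea that a timed-out future stays in the map until the close has completed, so a concurrent create cannot build a new topic on the ledger that is about to be closed.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

class TopicFutureSketch {
    /** Hypothetical stand-in for the broker's topic map: topic name to creation future. */
    static final ConcurrentHashMap<String, CompletableFuture<String>> topics =
            new ConcurrentHashMap<>();

    /** Returns true only if the future completed exceptionally with a TimeoutException. */
    static boolean failedWithTimeout(CompletableFuture<String> future) {
        if (!future.isCompletedExceptionally()) {
            return false;
        }
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (CancellationException e) {
            return false;
        }
    }

    /** Returns the existing future, including a timed-out one; recreates on other failures. */
    static CompletableFuture<String> getOrCreate(String name) {
        CompletableFuture<String> existing = topics.get(name);
        if (existing != null) {
            if (!existing.isCompletedExceptionally() || failedWithTimeout(existing)) {
                // A timed-out future is returned as-is: its topic may still be closing.
                return existing;
            }
            // Other failures: drop the failed future so the topic can be recreated.
            topics.remove(name, existing);
        }
        return topics.computeIfAbsent(name, k -> new CompletableFuture<>());
    }

    /** Assumed hook: called once the timed-out topic has finished closing. */
    static void onCloseComplete(String name, CompletableFuture<String> future) {
        topics.remove(name, future);
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = getOrCreate("t");
        first.completeExceptionally(new TimeoutException("create timed out"));
        System.out.println("same future before close: " + (getOrCreate("t") == first));
        onCloseComplete("t", first);
        System.out.println("same future after close: " + (getOrCreate("t") == first));
    }
}
```

The conditional `remove(name, future)` only removes the entry if it still maps to that exact future, so a later, unrelated creation is not evicted by a stale cleanup.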
Verifying this change
- Add testCloseLedgerThatTopicAfterCreateTimeout to cover this case.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ] `doc`
- [ ] `doc-required`
- [x] `doc-not-needed`
- [ ] `doc-complete`
Matching PR in forked repository
PR in forked repository: https://github.com/shibd/pulsar/pull/37
Great work!
Closing based on this comment: https://github.com/apache/pulsar/pull/22739#discussion_r1606705897