[improve][broker]Ensure namespace deletion doesn't fail
Motivation
Deleting a namespace sometimes fails and leaves znodes around, especially the "bundle owner" nodes /namespace/NAMESPACE/xxxxxx
Modifications
- ensure that all the children of /namespace/NAMESPACE are deleted, especially the ephemeral nodes for ownership
- ensure that on the code path that handles the deletion of the namespace we make progress, the main fact is about not relying on "cache.exists" before using a "delete": the new code is more pessimistic that goes with "deleteIfExists", handling "NotFound" gracefully
- add more logs about the delete namespace execution path
With this patch
- delete namespace seems to consistently work
- there are no more spammy logs about errors around the topic policies event reader (changelog_events...)
- in case we see an error we log at which point the procedure was (because we log all the paths that are deleted from zk)
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ]
doc - [ ]
doc-required - [x]
doc-not-needed - [ ]
doc-complete
Matching PR in forked repository
PR in forked repository: https://github.com/eolivelli/pulsar/pull/27
This script reproduces MANY problems with namespace deletions, just run it in a loop on a Pulsar cluster with a few brokers (like 6) and more than one ZK server (I am using 3 nodes)
reproduce.sh
#! /bin/bash
export PATH=$PATH:/pulsar/bin
set -x
pulsar-admin topics delete-partitioned-topic non-persistent://T1/N1/P1 -f
bin/pulsar zookeeper-shell -server pulsar-zookeeper-ca:2181 ls -R /T1/N1 | grep namespace
pulsar-admin namespaces delete T1/N1
pulsar-admin tenants delete T1
pulsar-admin tenants create T1
pulsar-admin namespaces create T1/N1 -b 64
pulsar-admin topics create-partitioned-topic non-persistent://T1/N1/P1 -p 20
exit 0
Execute it in a loop
while true ; do ./reproduce.sh ; done
@lhotari I think that it is worth to port this fix to all the active branches, maybe up to 3.0. Wdyt?
@lhotari I think that it is worth to port this fix to all the active branches, maybe up to 3.0. Wdyt?
@eolivelli makes sense. Btw. there's a rabbit hole in namespace deletion. Currently concurrency limits are missing and a namespace deletion will attempt to concurrency delete all possible topics in the namespace. This is something that would have to be addressed to improve the reliability of namespace deletion. More details in #22541 comments such as https://github.com/apache/pulsar/pull/22541#issuecomment-2071621213 and https://github.com/apache/pulsar/pull/22541#issuecomment-2071687995 .
Codecov Report
Attention: Patch coverage is 43.13725% with 29 lines in your changes are missing coverage. Please review.
Project coverage is 32.47%. Comparing base (
bbc6224) to head (23c39c1). Report is 261 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #22627 +/- ##
=============================================
- Coverage 73.57% 32.47% -41.10%
+ Complexity 32624 87 -32537
=============================================
Files 1877 1745 -132
Lines 139502 133886 -5616
Branches 15299 14673 -626
=============================================
- Hits 102638 43482 -59156
- Misses 28908 84033 +55125
+ Partials 7956 6371 -1585
| Flag | Coverage Δ | |
|---|---|---|
| inttests | 27.52% <43.13%> (+2.94%) |
:arrow_up: |
| systests | 24.80% <5.88%> (+0.47%) |
:arrow_up: |
| unittests | ? |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files | Coverage Δ | |
|---|---|---|
| .../service/SystemTopicBasedTopicPoliciesService.java | 52.26% <100.00%> (-21.93%) |
:arrow_down: |
| ...he/pulsar/metadata/impl/AbstractMetadataStore.java | 55.31% <100.00%> (-29.25%) |
:arrow_down: |
| ...ulsar/broker/resources/LocalPoliciesResources.java | 44.82% <0.00%> (-31.04%) |
:arrow_down: |
| ...he/pulsar/broker/resources/NamespaceResources.java | 53.90% <25.00%> (-26.25%) |
:arrow_down: |
| ...apache/pulsar/broker/resources/TopicResources.java | 39.58% <33.33%> (-57.03%) |
:arrow_down: |
| .../org/apache/pulsar/metadata/api/MetadataStore.java | 60.00% <50.00%> (-20.00%) |
:arrow_down: |
| .../apache/pulsar/broker/resources/BaseResources.java | 55.85% <50.00%> (-16.88%) |
:arrow_down: |
| ...pache/pulsar/broker/admin/impl/NamespacesBase.java | 15.22% <0.00%> (-57.86%) |
:arrow_down: |