[Bug] Race condition between ZkBookieRackAffinityMapping and RackawareEnsemblePlacementPolicy ZK watch listeners causing incorrect rack resolution
Search before reporting
- [x] I searched in the issues and found nothing similar.
Read release policy
- [x] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
User environment
- master
- java 17
Issue Description
The ZooKeeper WatchTask for writableBookies has two rack-awareness listeners attached to it:
- ZkBookieRackAffinityMapping, which extends BookieRackAffinityMapping and is responsible for maintaining rackInfoMap and racksWithHost. Placement policies read these to get the updated rack info.
- RackawareEnsemblePlacementPolicy, which uses ZkBookieRackAffinityMapping to resolve the network location.
Because the two listeners are invoked in no guaranteed order, there is a real chance that RackawareEnsemblePlacementPolicy resolves the network location (rack path) before updateRacksWithHost, which refreshes the rack information, has been called on the BookieRackAffinityMapping.
When that happens, the placement policy cannot find the network location (rack path) of the bookie and falls back to /default-rack. The value is only corrected when the next watch event is triggered.
At the moment, rackawarePolicy.onBookieRackChange(new ArrayList<>(bookieIdSet)) is called only when the bookie rack info changes in ZooKeeper, not when updateRacksWithHost is called. Updating the code to call onBookieRackChange once updateRacksWithHost has completed should fix this issue.
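To make the ordering problem concrete, here is a minimal, self-contained sketch of the race and of the proposed fix. It is not the Pulsar or BookKeeper code: the Mapping and Policy classes, the notifyAfterUpdate flag, and the bookie and rack names are all hypothetical stand-ins, and the worst-case listener order is forced by hand rather than produced by a real ZooKeeper watch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class RackRaceSketch {
    static final String DEFAULT_RACK = "/default-rack";

    // Hypothetical stand-in for the rack mapping listener.
    static class Mapping {
        final Map<String, String> racksWithHost = new ConcurrentHashMap<>();
        final boolean notifyAfterUpdate;
        Consumer<List<String>> onBookieRackChange = ids -> { };

        Mapping(boolean notifyAfterUpdate) {
            this.notifyAfterUpdate = notifyAfterUpdate;
        }

        // Refreshes the bookie -> rack view; with the proposed fix it also
        // tells the policy to re-resolve once the refresh has completed.
        void updateRacksWithHost(Map<String, String> latest) {
            racksWithHost.clear();
            racksWithHost.putAll(latest);
            if (notifyAfterUpdate) {
                onBookieRackChange.accept(new ArrayList<>(latest.keySet()));
            }
        }

        String resolve(String bookie) {
            return racksWithHost.getOrDefault(bookie, DEFAULT_RACK);
        }
    }

    // Hypothetical stand-in for the placement policy listener.
    static class Policy {
        final Mapping mapping;
        final Map<String, String> knownRacks = new ConcurrentHashMap<>();

        Policy(Mapping mapping) {
            this.mapping = mapping;
        }

        void onBookiesChanged(List<String> bookies) {
            for (String bookie : bookies) {
                knownRacks.put(bookie, mapping.resolve(bookie));
            }
        }
    }

    // Simulates one watch event where the policy listener runs first.
    static String rackSeenByPolicy(boolean notifyAfterUpdate) {
        Mapping mapping = new Mapping(notifyAfterUpdate);
        Policy policy = new Policy(mapping);
        mapping.onBookieRackChange = policy::onBookiesChanged;

        String bookie = "bookie-1:3181";
        policy.onBookiesChanged(List.of(bookie));                // resolves against a stale map
        mapping.updateRacksWithHost(Map.of(bookie, "/rack-a"));  // refresh arrives too late
        return policy.knownRacks.get(bookie);
    }

    public static void main(String[] args) {
        System.out.println(rackSeenByPolicy(false)); // prints /default-rack
        System.out.println(rackSeenByPolicy(true));  // prints /rack-a
    }
}
```

Without the callback, the policy keeps /default-rack until another watch event arrives. With it, the policy re-resolves as soon as the mapping has finished refreshing, regardless of which listener ran first.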
Error messages
org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to resolve network location for XXX.XXX.XXX.XXX, using default rack for it : /default-rack.
Reproducing the issue
Configure RackawareEnsemblePlacementPolicy with the following settings, then bring down a bookie.
ensemblePlacementPolicy=org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy
reppDnsResolverClass=org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping
enforceMinNumRacksPerWriteQuorum=true
minNumRacksPerWriteQuorum=3
bookkeeperMetadataServiceUri=zk+hierarchical://url/ledgers
Additional information
No response
Are you willing to submit a PR?
- [x] I'm willing to submit a PR!