Race condition in ZkClient may leave an untracked watcher on ZK
Describe the bug
```java
public ChildrenSubscribeResult subscribeChildChanges(String path, IZkChildListener listener, boolean skipWatchingNonExistNode) {
  // Steps 1-3: lock the in-memory map, add the listener, unlock.
  synchronized (_childListener) {
    Set<IZkChildListener> listeners = _childListener.get(path);
    if (listeners == null) {
      listeners = new CopyOnWriteArraySet<>();
      _childListener.put(path, listeners);
    }
    listeners.add(listener);
  }
  // Step 4: register the one-time watch on ZK, with no lock held.
  List<String> children = watchForChilds(path, skipWatchingNonExistNode);
  if (children == null && skipWatchingNonExistNode) {
    // Step 5: the watch was not installed, so roll back the listener.
    unsubscribeChildChanges(path, listener);
    LOG.info("zkclient{}, watchForChilds failed to install no-existing watch and add listener. Path: {}", _uid, path);
    return new ChildrenSubscribeResult(children, false);
  }
  return new ChildrenSubscribeResult(children, true);
}

public void unsubscribeChildChanges(String path, IZkChildListener childListener) {
  // Lock the map, remove the listener, unlock. The watch on ZK is not touched.
  synchronized (_childListener) {
    final Set<IZkChildListener> listeners = _childListener.get(path);
    if (listeners != null) {
      listeners.remove(childListener);
    }
  }
}
```
Currently on Helix master (May 24, 2023, tip of tree at commit 07b1bb8), subscribing to child/data changes does the following:

1. Lock the in-memory listener map.
2. Add the new listener to the map.
3. Unlock the map.
4. Register a one-time watcher on ZK.
5. If the registration fails, call unsubscribe child/data change (lock, remove, unlock).

Unsubscribing from child/data changes does the following:

1. Lock the in-memory listener map.
2. Remove the listener from the map.
3. Unlock the map.
When multiple users call subscribe and unsubscribe concurrently, an unsubscribe may run between steps 3 and 4 of a subscribe. As a result, the listener is no longer in the map, yet step 4 still registers the ZkClient as a watcher in ZK and the subscribe call reports success. This leaves an untracked watcher.
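To make the interleaving concrete, here is a minimal, single-threaded Java model that replays it by hand. `UntrackedWatcherDemo`, `addListener`, `installWatch`, `isTracked`, and the `zkWatches` set are hypothetical stand-ins with no real ZooKeeper connection; only the lock/add/unlock and lock/remove/unlock logic mirrors the snippet above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical model of the race: the thread interleaving is replayed by hand.
final class UntrackedWatcherDemo {
  // Stand-in for ZkClient's _childListener map (path -> listeners).
  static final Map<String, Set<String>> childListener = new HashMap<>();
  // Stand-in for the watches ZooKeeper holds for this client (assumption: one per path).
  static final Set<String> zkWatches = new HashSet<>();

  // Subscribe steps 1-3: lock the map, add the listener, unlock.
  static void addListener(String path, String listener) {
    synchronized (childListener) {
      childListener.computeIfAbsent(path, p -> new CopyOnWriteArraySet<>()).add(listener);
    }
  }

  // Subscribe step 4: register the one-time watch with no lock held.
  static boolean installWatch(String path) {
    zkWatches.add(path);
    return true; // the caller is told the subscription succeeded
  }

  // Unsubscribe: lock the map, remove the listener, unlock. ZK is not touched.
  static void unsubscribe(String path, String listener) {
    synchronized (childListener) {
      Set<String> listeners = childListener.get(path);
      if (listeners != null) {
        listeners.remove(listener);
      }
    }
  }

  static boolean isTracked(String path) {
    synchronized (childListener) {
      Set<String> listeners = childListener.get(path);
      return listeners != null && !listeners.isEmpty();
    }
  }

  public static void main(String[] args) {
    String path = "/example/path";
    addListener(path, "A");                  // thread 1: subscribe steps 1-3
    unsubscribe(path, "A");                  // thread 2: unsubscribe slips in here
    boolean reportedOk = installWatch(path); // thread 1: subscribe step 4
    System.out.println("subscribe reported success: " + reportedOk);
    System.out.println("watch registered in ZK: " + zkWatches.contains(path));
    System.out.println("listener tracked in map: " + isTracked(path));
  }
}
```

Running `main` prints that the subscribe reported success and the watch is registered, while the listener is not tracked in the map.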
Expected behavior
When multiple users call subscribe and unsubscribe concurrently, the final outcome (subscribed or not) is nondeterministic. However, every watcher registered on ZK should be tracked in the listener map.
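One possible direction, sketched here as an assumption rather than a confirmed upstream fix: perform the map update and the watch installation under the same lock, and roll back on failure before releasing it, so an unsubscribe can only run entirely before or entirely after them. `LockedSubscribeSketch`, the `nodeExists` predicate, and the `untrackedInstalls` audit counter are hypothetical; `installWatch` only stands in for `watchForChilds`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical model of one possible fix: listener registration and watch
// installation happen under one lock, so unsubscribe cannot run between them.
final class LockedSubscribeSketch {
  final Map<String, Set<String>> childListener = new HashMap<>();
  // Counts watches installed while the subscribing listener was absent from the map.
  final AtomicInteger untrackedInstalls = new AtomicInteger();
  private final Predicate<String> nodeExists; // stand-in for the ZK existence check

  LockedSubscribeSketch(Predicate<String> nodeExists) {
    this.nodeExists = nodeExists;
  }

  // Stand-in for watchForChilds: "installs" a watch and audits that it is tracked.
  private void installWatch(String path, String listener) {
    Set<String> listeners = childListener.get(path);
    if (listeners == null || !listeners.contains(listener)) {
      untrackedInstalls.incrementAndGet();
    }
  }

  boolean subscribe(String path, String listener, boolean skipWatchingNonExistNode) {
    synchronized (childListener) {
      Set<String> listeners = childListener.computeIfAbsent(path, p -> new CopyOnWriteArraySet<>());
      listeners.add(listener);
      if (!nodeExists.test(path) && skipWatchingNonExistNode) {
        listeners.remove(listener); // roll back before releasing the lock
        return false;
      }
      installWatch(path, listener); // listener is guaranteed to be in the map here
      return true;
    }
  }

  void unsubscribe(String path, String listener) {
    synchronized (childListener) {
      Set<String> listeners = childListener.get(path);
      if (listeners != null) {
        listeners.remove(listener);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LockedSubscribeSketch client = new LockedSubscribeSketch(path -> true);
    Thread subscriber = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) client.subscribe("/example/path", "A", true);
    });
    Thread unsubscriber = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) client.unsubscribe("/example/path", "A");
    });
    subscriber.start();
    unsubscriber.start();
    subscriber.join();
    unsubscriber.join();
    System.out.println("untracked installs: " + client.untrackedInstalls.get());
  }
}
```

The likely trade-off is that every subscribe/unsubscribe on the client would wait behind a ZooKeeper round trip while the monitor is held; a finer-grained (for example per-path) lock could narrow that contention.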