[CURATOR-478] LeaderLatch accumulates additional watcher handlers
When the connection is re-established after a disconnect, LeaderLatch calls reset():
Ultimately, this results in another call to getChildren(), which calls checkLeadership(), which registers another getData() watch on the ephemeral leader node that precedes our new latch node. However, the watch registered before reset() remains in place, and if the node it was watching is deleted it will fire and cause yet another watch to be set.
As such, the number of pending watchers (at least on the client side) will continue to increase each time the connection fails over.
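To make the mechanism concrete, here is a minimal Java sketch of the pattern described above (class and method names are hypothetical; this is not the actual LeaderLatch source). Because a brand-new anonymous Watcher is allocated on every pass, ZooKeeper's same-watcher/same-path de-duplication never applies, and each reset() cycle adds one more client-side registration:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.api.BackgroundCallback;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Illustrative sketch only -- not the real LeaderLatch code.
class WatchAccumulationSketch
{
    private final CuratorFramework client;
    private final BackgroundCallback callback;

    WatchAccumulationSketch(CuratorFramework client, BackgroundCallback callback)
    {
        this.client = client;
        this.callback = callback;
    }

    // Called again on every reset() -> getChildren() -> checkLeadership() pass
    void watchPredecessor(String predecessorPath) throws Exception
    {
        Watcher watcher = new Watcher()   // new instance each call => new registration
        {
            @Override
            public void process(WatchedEvent event)
            {
                // re-check leadership when the predecessor node is deleted
            }
        };
        client.getData().usingWatcher(watcher).inBackground(callback).forPath(predecessorPath);
    }
}
```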
Marked as trivial because I think it's unlikely these will accumulate to the point that they become a real problem, but it seems like it should at least be called out.
Originally reported by timcharper, imported from: LeaderLatch accumulates additional watcher handlers
- assignee: randgalt
- status: Open
- priority: Trivial
- resolution: Unresolved
- imported: 2025-01-21
I took a quick look and you're probably correct. A simple solution is to allocate the watcher as a field of the class. ZooKeeper guarantees that the same watcher used for the same path is only registered once.
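A rough sketch of that suggestion, assuming the watcher is held as a single long-lived field and reused for every registration (names are illustrative, not an actual patch). Re-registering the same Watcher object on the same path does not add a second entry, so repeated reset() cycles no longer grow the watcher table:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Illustrative sketch of the proposed fix.
class SingleWatcherSketch
{
    private final CuratorFramework client;

    // one Watcher instance for the lifetime of the latch
    private final Watcher watcher = new Watcher()
    {
        @Override
        public void process(WatchedEvent event)
        {
            // re-check leadership on NodeDeleted, etc.
        }
    };

    SingleWatcherSketch(CuratorFramework client)
    {
        this.client = client;
    }

    void watchPredecessor(String predecessorPath) throws Exception
    {
        // same watcher + same path => only one registration, however often this runs
        client.getData().usingWatcher(watcher).inBackground().forPath(predecessorPath);
    }
}
```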
Hmm, seems like we could also run into an issue with BackgroundCallback.
What would you think if we used an AtomicInteger and incremented it each time, then cancelled if its value didn't match the value at the time we registered the callback?
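A rough sketch of that idea, assuming a per-latch AtomicInteger that is bumped on every reset() and checked inside the BackgroundCallback (class, field, and method names are illustrative only):

```java
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.api.BackgroundCallback;
import org.apache.curator.framework.api.CuratorEvent;

// Illustrative sketch of the generation-counter idea.
class StaleCallbackGuardSketch
{
    private final CuratorFramework client;
    private final AtomicInteger resetCount = new AtomicInteger(0);

    StaleCallbackGuardSketch(CuratorFramework client)
    {
        this.client = client;
    }

    void reset()
    {
        resetCount.incrementAndGet();
        // ... delete/recreate the latch node, call getChildren(), etc. ...
    }

    void getChildrenInBackground(String latchPath) throws Exception
    {
        final int localResetCount = resetCount.get();   // value at registration time
        BackgroundCallback callback = new BackgroundCallback()
        {
            @Override
            public void processResult(CuratorFramework client, CuratorEvent event)
            {
                if ( localResetCount != resetCount.get() )
                {
                    return;     // a newer reset() superseded this request; ignore it
                }
                // ... otherwise process the children and check leadership ...
            }
        };
        client.getChildren().inBackground(callback).forPath(latchPath);
    }
}
```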
I'm honestly not sure how I would add a unit test for this. It would be really hard to reliably create the conditions leading to this issue.