SOLR-11535: Fix race condition in singleton-per-collection StateWatcher creation
ZkStateReader is supposed to register a singleton StateWatcher per collection name, but a race condition can lead to multiple StateWatchers being registered for the same collection. Once registered, the redundant watchers recreate themselves in perpetuity and accumulate indefinitely. We've seen up to ~16x accumulation in real-world deployments, leading to floods of state updates during large-scale cluster operations (e.g. cluster restart). It's difficult to be 100% certain, but we strongly suspect that in the most pathological cases this can actually lead to nodes seizing up on shutdown, as they're forced to handle a flood of incoming messages from other nodes (up/down/etc.) with many thousands of redundant ZK callback threads.
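To illustrate the general shape of the problem, here is a minimal sketch of singleton-per-key watcher creation. It is not the actual ZkStateReader code or the patch itself: `WatcherRegistrySketch`, its nested `StateWatcher` stand-in, and the methods `watch`, `shouldReRegister`, and `createdCount` are all hypothetical names chosen for illustration. The idea is that a check-then-create sequence lets two threads each create a watcher for the same collection, whereas an atomic `computeIfAbsent` creates at most one, and an identity check gives a stray watcher a reason to stop re-registering.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; not the real ZkStateReader implementation.
final class WatcherRegistrySketch {

    // Stand-in for a per-collection ZooKeeper watcher (hypothetical type).
    static final class StateWatcher {
        final String collection;
        StateWatcher(String collection) { this.collection = collection; }
    }

    private final ConcurrentHashMap<String, StateWatcher> watchers = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Atomic alternative to "if (get(name) == null) put(name, new StateWatcher(name))".
    // computeIfAbsent applies the mapping function at most once per absent key,
    // so concurrent callers all receive the same instance.
    StateWatcher watch(String collection) {
        return watchers.computeIfAbsent(collection, name -> {
            created.incrementAndGet();
            return new StateWatcher(name);
        });
    }

    // A fired watcher would re-register only if it is still the registered
    // instance for its collection; any redundant instance drops out instead.
    boolean shouldReRegister(StateWatcher watcher) {
        return watchers.get(watcher.collection) == watcher;
    }

    int createdCount() { return created.get(); }

    public static void main(String[] args) throws InterruptedException {
        WatcherRegistrySketch registry = new WatcherRegistrySketch();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> registry.watch("collection1"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("watchers created: " + registry.createdCount());
    }
}
```

Under these assumptions, eight threads racing to watch the same collection still produce a single watcher instance, and a watcher that is not the registered one has a cheap way to detect that.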
It is again difficult to say with 100% certainty, but several months' worth of circumstantial evidence around cluster restarts suggests that this change has drastically reduced overall latency (up to ~2x), and increased consistency/reliability of cluster restarts. We have seen no problems introduced with clusters running this patch.
Having lived with these changes for a bit, I plan to commit this within the next day or two, pending feedback. Proposed CHANGES.txt entry:
* SOLR-11535: Fix race condition in singleton-per-collection StateWatcher creation (Michael Gibney)
I'm inclined to put this under "Bugs". Even though it's a bug that often doesn't manifest in a visible way, the essence of this issue is that ZkStateReader has been leaking redundant StateWatchers that, once present, re-register themselves in perpetuity as long as the collection is still watched ... which definitely sounds to me like a bug.