SOLR-17453: Leverage waitForState() instead of busy waiting
https://issues.apache.org/jira/browse/SOLR-17453
Description
For some cluster state changes made via the Overseer, the code repeatedly refreshes its view of the cluster state until the change is fully applied. This wastes resources and can be replaced by a ZK watch.
Solution
Replace busy waiting (mostly done with the TimeOut class) with ZooKeeper watches and ZkStateReader.waitForState().
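To illustrate the idea outside of Solr, here is a minimal, self-contained sketch of predicate-based waiting. `WatchedState` and its `update()` method are hypothetical names invented for this example and are not part of Solr or this patch. The sketch only mirrors the general shape of a `waitForState()`-style call (timeout, unit, predicate), assuming a notification wakes the waiter whenever the state changes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Predicate;

/** Hypothetical stand-in for a watched piece of cluster state; not Solr code. */
class WatchedState<T> {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private T value;

  WatchedState(T initial) {
    this.value = initial;
  }

  /** Plays the role of a watch firing: publish the new state and wake all waiters. */
  void update(T newValue) {
    lock.lock();
    try {
      value = newValue;
      changed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Blocks until the predicate matches the current state or the timeout elapses.
   * The waiter sleeps until it is notified; it does not re-fetch state in a sleep loop.
   */
  T waitForState(long wait, TimeUnit unit, Predicate<T> predicate)
      throws InterruptedException, TimeoutException {
    long nanosLeft = unit.toNanos(wait);
    lock.lock();
    try {
      // The loop re-checks the predicate after every wake-up, including spurious ones.
      while (!predicate.test(value)) {
        if (nanosLeft <= 0) {
          throw new TimeoutException("Expected state not reached, last seen: " + value);
        }
        nanosLeft = changed.awaitNanos(nanosLeft);
      }
      return value;
    } finally {
      lock.unlock();
    }
  }
}
```

A caller would write something like `state.waitForState(30, TimeUnit.SECONDS, s -> s.equals("active"))` in place of a loop that refreshes the state, checks it, and sleeps until a TimeOut expires.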
Tests
No new tests added. Some tests updated to also remove busy waiting.
Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [x] I have developed this patch against the main branch.
- [x] I have run ./gradlew check.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide
A line of Javadoc on TimeOut recommending waitForState() would be good, though arguably a distraction on a general utility :shrug:. I don't think we'll backslide much if all or most cluster state waiting is done using the correct API.
One thought: is there a way to enforce the waitForState() pattern via any of our code quality tools?
Not sure how we can automate the decision on whether usages of TimeOut are legitimate. We should use waitForState() instead of busy waiting for changes in ZooKeeper, so that we leverage the registered watchers. There are other cases, mostly Solr-to-Solr requests, where we should keep TimeOut.
Probably just a one-liner left and I'll merge away :-)