SOLR-17453: Leverage waitForState() instead of busy waiting

Open psalagnac opened this issue 1 year ago • 1 comments

https://issues.apache.org/jira/browse/SOLR-17453

Description

For some changes done to the cluster state via overseer, the code actively refreshes the seen cluster state until the change is fully done. This is a waste of resources, and this can be replaced by a ZK watch.

Solution

Replace busy waiting (mostly done with TimeOut class) by Zookeeper watches and ZkStateReader.waitForState().

Tests

No new tests added. Some tests updated to also remove busy waiting.

Checklist

Please review the following and check all that apply:

[x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
[x] I have created a Jira issue and added the issue ID to my pull request title.
[x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
[x] I have developed this patch against the main branch.
[x] I have run ./gradlew check.
[ ] I have added tests for my changes.
[ ] I have added documentation for the Reference Guide

Oct 02 '24 12:10 psalagnac

A line of javadocs on TimeOut recommending waitForState would be good but perhaps arguably a distraction to a general utility -- :shrug: whatever. I don't think we'll backslide much if all/most cluster state waiting is done using the correct API.

Oct 14 '24 18:10 dsmiley

One thought, is there a way to enforce the use of waitForState() pattern via any of our code quality tools?

Not sure how we can automate decision on whether usages of Timeout are legit or not. We should use waitForState() instead of busy waiting for changes in Zookeeper, so we leverage the registered watchers. There are other cases, mostly when doing Solr-to-Solr requests, where we should keep Timeout.

Nov 04 '24 15:11 psalagnac

Probably just a one-liner left and I'll merge away :-)

Nov 04 '24 23:11 dsmiley