Add MinimalSolrTest for scale testing
This test class does a few very basic things, but it does them at several scales. Because the operations it exercises underlie nearly every other test, improvements here will impact the whole test suite. Everything that happens here already has coverage via other tests, but this class is still useful for several scenarios (a rough sketch follows the list):
- Isolating the startup and teardown costs for a cluster
- Capturing the startup time in a unit test rather than @Before methods
- Profiling basic setup operations like cluster start or collection creation
- Easily adjusting the scale of the tests - we rarely touch 100+ nodes in our test suite otherwise
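For illustration, here is a hedged sketch of roughly what such a class can look like. MiniSolrCloudCluster and its Builder are real test-framework APIs (details may vary by version), but the class name, scales, and baseline comments below are illustrative assumptions, not the actual code in this PR:

```java
// A hedged sketch of the shape of this class, not the actual code from this
// PR. MiniSolrCloudCluster and its Builder come from Solr's test framework;
// the class name, scales, and baseline comments are illustrative.
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.solr.cloud.MiniSolrCloudCluster;
import org.junit.Test;

public class MinimalSolrTestSketch {

  // rough baselines from one desktop machine would be recorded here as
  // comments, e.g. "~1s / ~4s / ~40s for 1 / 10 / 100 nodes" (hypothetical)
  @Test
  public void testClusterLifecycleAtSeveralScales() throws Exception {
    for (int numNodes : new int[] {1, 10, 100}) {
      Path baseDir = Files.createTempDirectory("minimal-" + numNodes);
      long start = System.nanoTime();
      MiniSolrCloudCluster cluster =
          new MiniSolrCloudCluster.Builder(numNodes, baseDir).build();
      long started = System.nanoTime();
      cluster.shutdown();
      long done = System.nanoTime();
      // no assertions on purpose: this is a probe for profiling, not a test
      System.out.printf("%d nodes: startup %d ms, teardown %d ms%n", numNodes,
          (started - start) / 1_000_000, (done - started) / 1_000_000);
    }
  }
}
```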
To minimize overhead from unrelated operations, we disable most logging. When troubleshooting a specific area, you will likely find it useful to tweak these settings (see the sketch below).
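As a sketch of what "disable most logging" can look like, here is the test framework's @LogLevel annotation in use; the exact loggers and levels used by this class may differ:

```java
// Sketch of scoping logging down with Solr's test-framework @LogLevel
// annotation; the loggers and levels shown are assumptions, not the
// actual settings in this class.
import org.apache.solr.SolrTestCaseJ4;
import org.apache.solr.util.LogLevel;
import org.junit.Test;

@LogLevel("org.apache.solr=WARN;org.apache.zookeeper=WARN;org.eclipse.jetty=WARN")
public class QuietTestSketch extends SolrTestCaseJ4 {
  @Test
  public void testPlaceholder() {
    // timing methods would go here; the point is the annotation above
  }
}
```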
The commented run times for each method are rough measures from a single desktop computer. They will vary between environments, but the intent is to have some baseline for spotting regressions.
How exactly will we be capturing these times & plotting them?
As a regression test we would need actual timeout thresholds, and perhaps to assume that only a particular CI machine runs them, without other load going on. Or do you have another idea?
BTW I wonder if "ROOT=OFF" would work for the log thing. ROOT is a special name; at least it was in Log4j or something when I looked once.
Yeah, we can't have hard timeouts: CI servers cover a huge range of hardware, and on cloud hosts with noisy neighbors the timings get unreliable too.
I tried ROOT=OFF and that didn't work. Currently we still get logging from jetty-launcher and jetty-closer threads, and I spent a little bit of time trying to find where to control that but eventually decided that it doesn't have enough of an impact.
Actual capture and plotting can come later, and might live outside of the repo. I haven't completely thought that through yet. However, this test is still useful without that in place, because it gives us something easy to profile. In conjunction with Miller's old branch, it's how I found SOLR-14274 (#1299), SOLR-14289 (#1298), SOLR-14223 (#1217).
We could have hard timeouts if they are run on a specific CI machine, perhaps @sarowe's real hardware?
Before this gets committed, we need to ensure it is not run by default yet, because it isn't asserting anything.
Would this be better in test-framework as a stub? My goal here is to always have something I can run against master without recreating this class every time I update my branch, or constantly rebasing a patch, or whatever. I don't think this makes sense as a JMH bench. We could add a trivial assert that the test run times have a total ordering (compare them all in @AfterClass)? A sketch of that idea follows.
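To make the idea concrete, here is a hedged sketch of that @AfterClass ordering assert; every name here is hypothetical:

```java
// Hedged sketch of the "trivial assert" floated above. Each timed method
// records its elapsed time keyed by node count, and @AfterClass checks
// that a larger cluster never came out faster than a smaller one.
import static org.junit.Assert.assertTrue;

import java.util.Map;
import java.util.TreeMap;
import org.junit.AfterClass;

public class TimingOrderSketch {
  // node count -> elapsed millis, iterated in ascending node-count order
  private static final Map<Integer, Long> ELAPSED_MS = new TreeMap<>();

  static void record(int numNodes, long elapsedMs) {
    ELAPSED_MS.put(numNodes, elapsedMs);
  }

  @AfterClass
  public static void assertRunTimesAreOrdered() {
    long previous = -1;
    for (Map.Entry<Integer, Long> e : ELAPSED_MS.entrySet()) {
      assertTrue("scale " + e.getKey() + " ran faster than a smaller scale",
          e.getValue() >= previous);
      previous = e.getValue();
    }
  }
}
```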
Test-framework as a stub makes sense. Essentially, we don't want this pretend-test slowing down whole test runs. I get the sense there is room for improvement in how large projects like ours categorize tests (e.g. via annotations) and choose which ones to run; see the example below.
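One such gate already exists in the Lucene/Solr test framework: @Nightly tests are skipped unless -Dtests.nightly=true is set. A hedged example, with illustrative class and method names:

```java
// Hedged illustration of annotation-based gating: @Nightly keeps a class
// out of default test runs. Class and method names are illustrative.
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.Nightly;
import org.junit.Test;

@Nightly
public class MinimalSolrNightlyGateSketch extends LuceneTestCase {
  @Test
  public void testPlaceholder() {
    // the real timing methods would live here
  }
}
```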
> Test-framework as a stub makes sense
I tried making this work, but it got really messy: I started having to move the configset across modules and rearrange much more of the project than I wanted to.
Part of the problem is that we don't have a solr/benchmark module matching its Lucene counterpart.