lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Improve smoketester JDK versions support

Open rmuir opened this issue 1 year ago • 21 comments

Description

Currently it is difficult for smoketester to "keep up" with supported java versions, which are moving faster these days.

@uschindler ran into it with the release vote for 9.10, and I ran into it on #12753

From Uwe:

JAVA_HOME must run be Java 11 (in 9.x) At moment you can pass "--test-java17

", but this one is also checked to be really java 17 (by parsing strings from its version output), but I'd like to pass "--test-alternative-java " multiple times and it would just run all those as part of smoking, maxbe the version number can be extracted to be printed out.

So I think the smoketester should require a specific version and check it, for JAVA_HOME. This should be the minimal required java version. But we should also allow passing other JDKs too.

This would make it easier to maintain the logic going forwards.

rmuir avatar Feb 15 '24 15:02 rmuir

Thanks @rmuir for opening the issue. I was about to do the same. It would be good to have a quickndirty solution working with the ongoing Lucene 9.10.0 release.

uschindler avatar Feb 15 '24 15:02 uschindler

I think it may still parse the Java version passed as alternative java homes, but just ensure they are >= the base version.

uschindler avatar Feb 15 '24 15:02 uschindler

gradle should fail with a clear error though if the version is not supported.

rmuir avatar Feb 15 '24 15:02 rmuir

gradle should fail with a clear error though if the version is not supported.

Correct.

My problem is in addition: I want to also test java 22, but Gradle refuses to run this one, you need RUNTIME_JAVA_HOME and execute with earlier one. Maybe I run test tests at command line then.

uschindler avatar Feb 15 '24 15:02 uschindler

yeah, i think for the smoketester we should just test only java versions supported by gradle? Since this is different than just running tests but trying to check that much more stuff works correctly.

rmuir avatar Feb 15 '24 15:02 rmuir

There is value in people running smoke tester in various environments and configurations but I think we could also define a github workflow which would run the smoke tester against a matrix of JVMs (and maybe OSs). This is quite trivial to do and the benefit here is that these multiple jobs in the matrix run in parallel.

The only thing I'm not sure of is whether a single check would fit within github's timeout limit (but I believe they should).

dweiss avatar Feb 15 '24 16:02 dweiss

It was a thing we previously did with ant and jenkins, there was a "nightly-smoketest" job at least to try to keep the smoketester working.

rmuir avatar Feb 15 '24 16:02 rmuir

From what I remember, at a basic level, the previous "nightly-smoketest" job would create a release, then run the smoketester on the release. Stuff such as GPG checks might have required special touch...

rmuir avatar Feb 15 '24 16:02 rmuir

You could even disable the running of tests (which might be the slowest part) for such a github workflow job. The real value is in the release checks and the packaging and all that, that isn't otherwise validated by any CI.

rmuir avatar Feb 15 '24 16:02 rmuir

I agree - I've just taken a look and tests consume a lot of time on gh jobs. The rest should be doable.

dweiss avatar Feb 15 '24 17:02 dweiss

There is value in people running smoke tester in various environments and configurations but I think we could also define a github workflow which would run the smoke tester against a matrix of JVMs (and maybe OSs). This is quite trivial to do and the benefit here is that these multiple jobs in the matrix run in parallel.

The only thing I'm not sure of is whether a single check would fit within github's timeout limit (but I believe they should).

I have a Jenkins Job for that. Running it in a matrix is possible, but does not help with the problem described here. The base JDK version (11 in 9.x) must always be provided and the alternative one can only Java 17. So I still would like to have the option to let Jenkins run with various JDKs in a flexible way. If you want to parallelize, you can do that - of course. But simplicity of setting up a Jenkins Job like I did is to run a single script with a suitable command line parameter.

So basically we need smoketester to run with base Java (enforced to ensure we are compatible). The xtra JVM passed to run tests should be configureable.

I'd still like to run all tests with the supplied Javas on the RELEASE.src file not from git checkout.

So let's only fix the static spaghetti in smoker to run base and a number of extras.

Uwe

uschindler avatar Feb 15 '24 17:02 uschindler

I agree - I've just taken a look and tests consume a lot of time on gh jobs. The rest should be doable.

It does not need to be Github (it would consume ASF's credits, which are limited). Jenkins does the matrix testing already.

uschindler avatar Feb 15 '24 17:02 uschindler

well, i think part of what makes changing the smoketester scary is that there's no validation of it in any PR I make. So if the github action would "verify smoketester works" but simply disable the execution of tests, it would make me feel a lot more comfortable. The tests will already be run by the other github action.

rmuir avatar Feb 15 '24 17:02 rmuir

Add a command line parameter to disable tests. The we could test it.

Anyways: If you have a branch I can try it on policeman.

uschindler avatar Feb 15 '24 17:02 uschindler

ok, i'm gonna look at trying to tackle the smoketester.py (for main branch first!) this evening.

rmuir avatar Feb 15 '24 17:02 rmuir

I was looking at this in the background and wrote a quick hack to add --test-alternative-java for branch_9x in #13108. I think it does what @uschindler wanted except print the JDK version numbers. I ran the script with JDK 20 and 21 (with some of the steps disabled) and it seems to work. There's plenty we can do to make the script nicer, but I didn't have more time today.

stefanvodita avatar Feb 15 '24 17:02 stefanvodita

Thanks @stefanvodita. I will give it a try now on Policeman Jenkins! Maybe remove the hard coded java 17. I think instead of printing Java version it could just print the command line / JAVA_HOME.

Can I pass multiple --test-alternative-java (I think this what append is for)?

I think a better solution may save the version number also in the returned tuple, but for now this helps.

uschindler avatar Feb 15 '24 17:02 uschindler

Can I pass multiple --test-alternative-java

You can. For example, I ran:

JAVA_HOME=/usr/lib/jvm/java-11-amazon-corretto python3 -u dev-tools/scripts/smokeTestRelease.py --test-alternative-java /usr/lib/jvm/java-20-amazon-corretto --test-alternative-java /usr/lib/jvm/java-21-amazon-corretto \
https://dist.apache.org/repos/dist/dev/lucene/lucene-9.10.0-RC1-rev-695c0ac84508438302cd346a812cfa2fdc5a10df

stefanvodita avatar Feb 15 '24 17:02 stefanvodita

Setting up the job already on Jenkins using your branch :-)

For Java 22 I will do a separate run of just tests, because it is impossible to run Gradle with Java 22.

uschindler avatar Feb 15 '24 17:02 uschindler

@stefanvodita 's PR branch works on Policeman Jenkins (we had to disable the version check): https://jenkins.thetaphi.de/job/Lucene-Release-Tester-v2/3/console

Once this succeeds I will give my +1 vote to Lucene 9.10.0 release :-) May take several hours.

uschindler avatar Feb 15 '24 18:02 uschindler

So, here's what I managed to do so far. I've set up a simple gh workflow that can be triggered manually and provided with a release candidate to verify - here's what it looks like:

image

the workflow runs a matrix test against an array of JDKs: 11, 17, 21, 22-ea, automatically fetching the latest version of these (temurin distribution). A parameter allows one to turn off the tests (I'm not sure how long they'd take on github - was afraid to check...).

Interestingly, I didn't have to touch the smoke tester at all as it already allows passing parameters to gradle, so I:

  • skipped all the tests by passing a filtering condition pointing at a non-existing test group,
  • compile and test against the matrix's java distribution using the existing runtime.java.home parameter we already have. I didn't modify the smoke tester so it reports that it runs gradle against Java 11, but reading the code I think it does the right thing and runs it against the provided Java version.

Here is the whole workflow definition - https://raw.githubusercontent.com/dweiss/lucene/main/.github/workflows/run-smoketester.yml

And here's a run I made to confirm it's working: https://github.com/dweiss/lucene/actions/runs/7932039311

note each run against a different JVM, including 22-ea, the logs confirming this: image

The execution times are not bad at all - ~10 minutes per job and they're mostly in parallel, so no big deal I think.

From here, I think it's a short walk to making a workflow that builds the distribution first, uploads it as an artifact which is then consumed by a job verifying the smoke tester against all the JVMs in question... unfortunately I won't be able to work on this the following week so if somebody wishes to take over and try, please go ahead.

dweiss avatar Feb 16 '24 14:02 dweiss