Ansible request for ensuring jenkins agents are run with JDK11+
Delete as appropriate from this list:
- Missing install
Details: At the end of June, Jenkins announced in a blog post that Jenkins 2.357 and the forthcoming 2.361.1 LTS would require Java 11 or 17 on the server side (we are now running ours with Temurin 17).
With those new versions, using Java 8 for the jenkins agent systems will NOT BE SUPPORTED.
For this reason, we need to upgrade all the jenkins agents to run java 11 (or later) before we perform the next Jenkins LTS upgrade. We are currently on 2.346.3 (previous LTS) and will need to look at upgrading to 2.361.1. But before that, we need to ensure all the jenkins agents are running a suitable version.
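One way to check an agent machine before the upgrade is to parse the banner that `java -version` prints and compare the major version against 11. This is a minimal sketch, not something taken from the playbooks; the helper names `version_from_banner`, `java_major` and `agent_java_ok` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the playbooks.

# Extract the quoted version string from a `java -version` banner on stdin,
# e.g. 'openjdk version "11.0.17" 2022-10-18 LTS' -> 11.0.17
version_from_banner() {
    sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -1
}

# Map a version string to its major number: 1.8.0_352 -> 8, 11.0.17 -> 11.
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;   # legacy 1.x numbering used up to Java 8
    esac
    # Keep only the leading run of digits.
    echo "${v%%[!0-9]*}"
}

# Succeed only if the version is Java 11 or later.
agent_java_ok() {
    [ "$(java_major "$1")" -ge 11 ]
}
```

On an agent this could be driven with `agent_java_ok "$(java -version 2>&1 | version_from_banner)"`, since the banner is written to stderr.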
Solaris in particular only has Java 8 on it (Temurin does not produce a later one) but there are versions of JDK11 for Solaris available from:
- https://bell-sw.com/pages/downloads/
- https://www.azul.com/downloads/?version=java-11-lts&package=jdk (Explicitly says Solaris 10 is supported)
Or we could try building our own JDK11, but I don't think our machine configurations supported that the last time I tried.
FYA @karianna and I'll tag @steelhead31 too since he likes playing with Solaris!
So far none of the above options are proving to be feasible for Solaris 10 due to the dependency on the Solaris 11 `posix_fallocate` symbol as referenced in this article.
We should aim for JDK17 on all systems (For AIX we can use a nightly build or GA candidate)
> So far none of the above options are proving to be feasible for Solaris 10 due to the dependency on the Solaris 11 `posix_fallocate` symbol as referenced in this article.
Fixed. A 'fake' `posix_fallocate` (not a very common function) will allow a prebuilt java 11 to run if you set `LD_PRELOAD` in the environment to a trivial library created as follows. (It should print a message if the function is called, but the jenkins agent does not appear to trigger it.)
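cat > fallocate.c << 'EOT'
#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Stub for Solaris 10, which lacks posix_fallocate(): log the call and report success. */
int posix_fallocate(int fd, off_t offset, off_t len)
{
    fprintf(stderr, "posix_fallocate() called but stubbed out\n");
    return 0;  /* no space is actually reserved */
}
EOT
cc -G -m64 -o fallocate.so -Kpic fallocate.c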
$ LD_PRELOAD=$PWD/fallocate.so /usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java -version
openjdk version "11.0.17" 2022-10-18 LTS
OpenJDK Runtime Environment Zulu11.60+19-CA (build 11.0.17+8-LTS)
OpenJDK 64-Bit Server VM Zulu11.60+19-CA (build 11.0.17+8-LTS, mixed mode)
bash-3.2$
A couple of extra adjustments were needed to make `git` work properly as it was, by default, getting the `LD_PRELOAD` setting from the java process. Since git is a 32-bit application and was trying to load the library, I created a small wrapper around `git` to squash `LD_PRELOAD`:
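mkdir -p /usr/local/sxabin
cat > /usr/local/sxabin/git << 'EOT'
#!/bin/sh
# Drop the preload inherited from the java process before running the real git
unset LD_PRELOAD
exec /usr/local/bin/git "$@"
EOT
chmod +x /usr/local/sxabin/git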
And set the jenkins configuration for the agent to point `Tool locations` -> `git` to `/usr/local/sxabin/git` and also set `/usr/local/sxabin` first in the `PATH`. This appears to have worked: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin/201/console
This looks like a good solution to me. The only real impact is on the jenkins agent, and it's probably cleaner (and less error-prone) than running via ssh forwarding or similar.
Unfortunately I hadn't noticed that the linked job ran on the old Solaris machine (I thought it had been disabled) so this isn't fully working yet, as the `LD_PRELOAD` is going to sub-processes and since most of them are 32-bit the library causes a failure:
ld.so.1: sh: fatal: /usr/lib/jvm/fallocate-preload.so: wrong ELF class: ELFCLASS64
Killed
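The error above comes from a 32-bit process being asked to load a 64-bit object. As a rough sketch for checking which class a given binary or library is, the ELF header can be read directly: the fifth byte (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit objects. The helper name `elf_class` is hypothetical, and the sketch assumes an `od` that accepts the POSIX `-A`, `-t`, `-j` and `-N` options, which may not hold for every `od` on the machines in question.

```shell
#!/bin/sh
# Sketch only: elf_class is a hypothetical helper name.
# Prints 32 or 64 for an ELF file, "not-elf" for anything else.
elf_class() {
    # Bytes 0-3 of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Running it against both the preload library and a failing tool such as `sh` would show the mismatch that the loader is complaining about.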
Considering not pursuing this and going down another route: build JDK11 on a Solaris 11 system but adjust the code so it doesn't require `posix_fallocate`
@ptribble's 11.0.2 works on Solaris 10/x64: https://pkgs.tribblix.org/openjdk/openjdk11.0.2-s10-x86_64.tar.bz2
sha256sum a5484bd35ed15ea7dc97870cea470aedf0c713ecca8075e57954a70e8b32cd89
Using LD_PRELOAD_64 rather than a bare LD_PRELOAD ought to restrict it to 64-bit processes, if you want to pursue that route.
> Using LD_PRELOAD_64 rather than a bare LD_PRELOAD ought to restrict it to 64-bit processes, if you want to pursue that route.
Thank you! I don't think I've used that before but it's EXACTLY what I needed and https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin/208/ has successfully built with the job running via a JDK11u jenkins agent on the machine.
So, for reference:
- I built the shared library from the earlier comment on the machines and put it in /usr/lib/jvm
- Extracted the X64 or SPARC version of Zulu under /usr/lib/jvm
- Adjust the jenkins agent startup to have a `JavaPath` of `/usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java` and `Prefix Start Agent Command` of `export LD_PRELOAD_64=/usr/lib/jvm/fallocate.so &&`. Or, if using a startup script, use `LD_PRELOAD_64=/usr/lib/jvm/fallocate-preload.so /usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java [...]`
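For the startup-script variant, a wrapper along the following lines might do. It is only a sketch: `start_agent`, `DRY_RUN`, `JAVA_BIN` and `PRELOAD_LIB` are hypothetical names, the default paths are the ones quoted in the list above, and whatever arguments the agent needs are passed through unchanged rather than guessed at here.

```shell
#!/bin/sh
# Sketch only: start_agent, DRY_RUN, JAVA_BIN and PRELOAD_LIB are hypothetical names.
JAVA_BIN=${JAVA_BIN:-/usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java}
PRELOAD_LIB=${PRELOAD_LIB:-/usr/lib/jvm/fallocate.so}

# Launch java with the stub preloaded into 64-bit processes only, so that
# 32-bit children (sh, git, ...) never try to load the 64-bit library.
# All arguments are handed to java unchanged.
start_agent() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "LD_PRELOAD_64=$PRELOAD_LIB $JAVA_BIN $*"
        return 0
    fi
    LD_PRELOAD_64=$PRELOAD_LIB
    export LD_PRELOAD_64
    exec "$JAVA_BIN" "$@"
}
```

Calling `DRY_RUN=1 start_agent -version` prints the command line without running it, which is a cheap way to check the paths before wiring the script into the agent configuration.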
There were a few messages showing in the startup log on SPARC, but that appears to be building ok too.
FYI @speakjava - we might have a solution :-)
We still have over 50 systems running JDK8 as the agent. Thanks to @steelhead31 for collating this list. We can check these off as they are fixed. Also need to ensure that the playbooks deploy new ones with JDK11/17 available (and ideally as the default)
Windows (14)
- [ ] build-alibaba-win2012r2-x64-1
- [x] build-alibaba-win2012r2-x64-2
- [x] build-azure-win2012r2-x64-1
- [x] build-azure-win2012r2-x64-2
- [ ] build-azure-win2012r2-x64-4-sxa
- [x] build-azure-win2016-x64-1
- [x] build-ibmcloud-win2012r2-x64-1
- [x] build-ibmcloud-win2012r2-x64-2
- [x] test-azure-win2012r2-x64-1
- [x] test-azure-win2012r2-x64-3
- [x] test-azure-win2016-x64-1
- [x] test-azure-win2019-x64-1
- [x] test-ibmcloud-win2012r2-x64-1
- [x] test-ibmcloud-win2012r2-x64-2
Linux build + test (22)
- [x] build-digitalocean-centos69-x64-2
- [x] build-osuosl-centos74-ppc64le-1X
- [x] build-osuosl-centos74-ppc64le-2
- [x] docker-osuosl-ubuntu2004-ppc64le-1
- [x] test-aws-rhel76-armv8-1
- [x] ~test-aws-rhel8-x64-1~
- [x] ~test-docker-alpine316-aarch64-1~ - Removed From Jenkins
- [x] ~test-docker-ubuntu1804-armv8l-2~ - Removed From Jenkins
- [x] test-docker-ubuntu1804-armv8l-4
- [x] test-docker-ubuntu2110-armv8l-1
- [x] test-docker-ubuntu2204-armv8l-2
- [x] test-ibmcloud-rhel7-x64-1
- [x] test-ibmcloud-ubuntu1604-x64-1
- [x] test-osuosl-centos74-ppc64le-1
- [x] test-osuosl-centos74-ppc64le-2
- [x] test-osuosl-ubuntu1604-ppc64le-1
- [x] test-osuosl-ubuntu1604-ppc64le-2
- [x] test-osuosl-ubuntu1804-ppc64le-1
- [x] test-osuosl-ubuntu1804-ppc64le-2
- [x] test-osuosl-ubuntu2004-ppc64le-1
- [x] test-scaleway-ubuntu1604-x64-1
- [x] test-skytap-ubuntu2004-ppc64le-1
Others (16)
- [x] build-osuosl-aix71-ppc64-1
- [x] build-osuosl-aix71-ppc64-2
- [ ] ~build-packet_esxi-solaris10u11-x64-1~
- [x] build-siteox-solaris10u11-sparcv9-1
- [x] infra-ibmcloud-vagrant-x64-1.1
- [x] infra-ibmcloud-vagrant-x64-1.2
- [x] infra-ibmcloud-vagrant-x64-1.3
- [x] infra-ibmcloud-vagrant-x64-1.4
- [x] infra-ibmcloud-vagrant-x64-1.5
- [x] test-osuosl-aix715-ppc64-1 p9-aix1-adopt05.osuosl.org
- [x] test-osuosl-aix715-ppc64-2 adopt06
- [x] test-osuosl-aix715-ppc64-3 adopt07
- [x] test-osuosl-aix715-ppc64-4 adopt08
- [ ] test-osuosl-aix72-ppc64-1 adopt03
- [x] test-osuosl-aix72-ppc64-2 adopt04
- [x] test-siteox-solaris10u11-sparcv9-1
Note that from the comments in https://github.com/adoptium/infrastructure/issues/2847 the Solaris 10 machines will require Bellsoft Liberica 11 to be used (Azul's seems to result in high CPU load after a while on Solaris 10), along with the `fallocate` preload mentioned in an earlier comment.
@steelhead31 @Haroon-Khel I've just adjusted the last comment to categorise the machines (Windows/Linux build+test/Other). I reckon at this point we should set an hour aside and do this. Would you be interested in doing it with an open call to discuss any issues? FYI @karianna in case you have someone who has access and would like to take on the windows subset :-)
@steelhead31 Can you easily tell how many of the ones already migrated are using a jenkins config that points to a specific java as opposed to the default version? (I'm not sure if the command lines you were getting would say just `java` in the latter case)
@sxa / @Haroon-Khel sounds like a good idea to me. I'll produce an up-to-date (and complete) list of the current state this afternoon in preparation for doing this. I'm not sure how easy it is to tell default java from specified, as it just shows a path. Let's see if the updated list offers any insight.
I'm somewhat in two minds about whether to change the default or override in jenkins (and if we override we need to bear in mind me raising #2912 recently!) But I think an override for the remaining ones, then possibly look at adjusting the defaults and resetting the override after 2912 goes in is probably my preferred approach ...
+1 from me on overriding. From what I've seen, the default is usually 8, so unless you guys have an objection I see no reason why we shouldn't override it with 17 (on platforms that have 17).
@steelhead31 I guess the table from https://github.com/adoptium/infrastructure/issues/2879#issuecomment-1408677930 would suggest that the ones with an "empty" `Java version` column are using the default on the system.
New audit sheet is available here..
https://docs.google.com/spreadsheets/d/1MS4IcUSyJxOiKeQ14HN-uajstIv8hGM7Xz9hXFTmfnk/edit?usp=sharing
> @steelhead31 I guess the table from #2879 (comment) would suggest that the ones with an "empty" `Java version` column are using the default on the system.
I believe so, for the online ones at least. They will be using the system default, or they may have been connected from the node itself with a different JDK from the default. I'll audit a few of them and see if it's easy to tell those two cases apart.
Looks like there's plenty of precedent and no objections to tweaking the jenkins agent config so I suggest we go for that, using `/usr/lib/jvm/jdk-17` where available and a suitable platform-specific alternative elsewhere.
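The "JDK 17 where available, otherwise a platform-specific alternative" rule can be sketched as a first-match search over candidate JDK homes. This is an illustration only: `pick_java` is a hypothetical helper, and any fallback directory passed to it is an assumption to be replaced with whatever is actually installed on the platform.

```shell
#!/bin/sh
# Sketch only: pick_java is a hypothetical helper name.
# Print the java binary of the first candidate JDK home that contains an
# executable bin/java; fail if none of the candidates qualifies.
pick_java() {
    for home in "$@"; do
        if [ -x "$home/bin/java" ]; then
            echo "$home/bin/java"
            return 0
        fi
    done
    return 1
}
```

For example, `pick_java /usr/lib/jvm/jdk-17 "$FALLBACK_JDK_HOME"` would prefer the JDK 17 location named above and fall back to a per-platform directory, where `FALLBACK_JDK_HOME` is a placeholder rather than a real setting.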
@sxa I'm guessing build-azure-win2012r2-x64-4-sxa is on your home network? I can't get into it
Since I deleted the jenkins service on build-alibaba-win2012r2-x64-1 and I can't get it back up, for now I have the jenkins agent running in a background process with java17.
Steps to change the java path for the jenkins service:
- In the jenkins user folder (C:\Users\jenkins usually) there should be a jenkins-slave.xml file
- Open it with an editor and change the executable path to your desired java path
- Save and close the file
- Restart the jenkins service in Services
- Check the System Information page for the jenkins node to see if the update has taken place
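If the edit needs to be repeated across several machines, the second step could be scripted. The sketch below assumes a POSIX shell is available on the Windows machine (for example via Cygwin) and that the java path sits in a single-line `<executable>` element of `jenkins-slave.xml`; both are assumptions to verify against the real file. `set_executable` is a hypothetical helper that writes the modified config to stdout and leaves the original untouched.

```shell
#!/bin/sh
# Sketch only: set_executable is a hypothetical helper name.
# Usage: set_executable <config.xml> <new java path>
# Writes the config to stdout with the text inside each single-line
# <executable>...</executable> element replaced by the new path.
set_executable() {
    NEW_EXE=$2 awk '
        {
            s = index($0, "<executable>")
            e = index($0, "</executable>")
            if (s > 0 && e > s) {
                # 12 is the length of the opening tag; ENVIRON keeps
                # backslashes in Windows paths intact.
                print substr($0, 1, s + 11) ENVIRON["NEW_EXE"] substr($0, e)
            } else {
                print
            }
        }' "$1"
}
```

The output can be reviewed and then copied over the original before restarting the service, which keeps the remaining steps in the list unchanged.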
> @sxa I'm guessing build-azure-win2012r2-x64-4-sxa is on your home network? I can't get into it

No - the only ones hosted by me are the `test-sxa` ones. I'll check that definition as I think it's one of a set that I used when working with Andrew on reproducible builds. It probably just needs to be deleted.
Actions arising from today's activities:
- Decommission (probably) build-azure-win2012r2-x64-4-sxa as per the comment above
- Diagnose connection problem on build-osuosl-aix71-ppc64-2 https://github.com/adoptium/infrastructure/issues/2922 (Its twin build-osuosl-aix71-ppc64-1 is unaffected)
- Fix Bastillion for DigitalOcean CentOS system
- Check vagrant.yml and dockerhost.yml to ensure they will install a suitable JDK17
- Fix build-alibaba-win2012r2-x64-1 which was unable to pull the remoting.jar down (Its twin build-alibaba-win2012r2-x64-2 appears unaffected)
release
You are trying to get resource http://47.111.84.87:8080/jnlpJars/remoting.jar but it is not in cache and could not be downloaded. Attempting to continue, but you may expect failure
JAR http://47.111.84.87:8080/jnlpJars/remoting.jar not found. Continuing.
JAR http://47.111.84.87:8080/jnlpJars/remoting.jar not found. Continuing.
netx: Initialization Error: Could not initialize application. (Fatal: Initialization Error: Unknown Main-Class. Could not determine the main class for this application.)
Follow-on actions: Change all the definitions once we implement #2912 ;-)
Based on the new column from the plugin in #2950 there are five machines still running JDK8 for the agent:
The final machines have all been upgraded to JDK17 for jenkins agents.