Ansible request for ensuring jenkins agents are run with JDK11+

Open sxa opened this issue 2 years ago • 2 comments

Delete as appropriate from this list:

  • Missing install

Details: At the end of June Jenkins announced in a blog post that Jenkins 2.357 and the forthcoming 2.361.1 LTS would require Java 11 or 17 on the server side (we are now running ours with Temurin 17).

With those new versions, using Java 8 for the jenkins agent systems will NOT BE SUPPORTED.

For this reason, we need to upgrade the jenkins agents to all run Java 11 (or later) before we perform the next Jenkins LTS upgrade. We are currently on 2.346.3 (previous LTS) and will need to look at upgrading to 2.361.1. But before that, we need to ensure all the jenkins agents are running a suitable version.
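One way to check whether an agent's Java is new enough is to look at the major version in the string that java -version prints. The following is only a sketch: java_major and agent_java_ok are hypothetical helper names, and it assumes the usual version formats (1.8.0_352 for Java 8, 11.0.17 or 17-ea for later releases).

```shell
#!/bin/sh
# java_major: hypothetical helper; prints the major version for a Java
# version string such as "1.8.0_352" (8) or "11.0.17" (11).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                        # legacy 1.x scheme
    *)   echo "$1" | cut -d. -f1 | sed 's/[^0-9].*//' ;;   # e.g. 17-ea gives 17
  esac
}

# agent_java_ok: succeeds only if the version string is Java 11 or later.
agent_java_ok() {
  [ "$(java_major "$1")" -ge 11 ]
}
```

The version string itself could be taken from the quoted part of the first line of java -version output on the agent, then passed to agent_java_ok.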

Solaris in particular only has Java 8 on it (Temurin does not produce a later one) but there are versions of JDK11 for Solaris available from:

  • https://bell-sw.com/pages/downloads/
  • https://www.azul.com/downloads/?version=java-11-lts&package=jdk (Explicitly says Solaris 10 is supported)

Or we could try building our own JDK11, but I don't think our machine configurations supported that the last time I tried.

FYA @karianna and I'll tag @steelhead31 too since he likes playing with Solaris!

sxa avatar Oct 04 '22 11:10 sxa

So far none of the above options are proving to be feasible for Solaris 10 due to the dependency on the Solaris 11 posix_fallocate symbol as referenced in this article.

sxa avatar Oct 04 '22 17:10 sxa

We should aim for JDK17 on all systems (For AIX we can use a nightly build or GA candidate)

sxa avatar Oct 13 '22 13:10 sxa

So far none of the above options are proving to be feasible for Solaris 10 due to the dependency on the Solaris 11 posix_fallocate symbol as referenced in this article.

Fixed. A 'fake' posix_fallocate (not a very common function) will allow a prebuilt Java 11 to run if you set LD_PRELOAD in the environment to a trivial library created as follows. It should print a message if the function is called, but the jenkins agent does not appear to trigger it:

cat > fallocate.c << EOT
#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Stub standing in for posix_fallocate(), which Solaris 10 does not provide */
int posix_fallocate(int fd, off_t offset, off_t len)
{
  fprintf(stderr, "posix_fallocate() called but stubbed out\n");
  return 0;  /* report success without allocating anything */
}
EOT
cc -G -m64 -o fallocate.so -Kpic fallocate.c
LD_PRELOAD=$PWD/fallocate.so /usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java -version
openjdk version "11.0.17" 2022-10-18 LTS
OpenJDK Runtime Environment Zulu11.60+19-CA (build 11.0.17+8-LTS)
OpenJDK 64-Bit Server VM Zulu11.60+19-CA (build 11.0.17+8-LTS, mixed mode)

A couple of extra adjustments were needed to make git work properly as it was, by default, inheriting the LD_PRELOAD setting from the java process. Since git is a 32-bit application and was trying to use the symbol, I created a small wrapper around git to squash LD_PRELOAD:

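mkdir -p /usr/local/sxabin
cat > /usr/local/sxabin/git << 'EOT'
#!/bin/sh
# Drop LD_PRELOAD so the 32-bit git does not try to load the 64-bit library
unset LD_PRELOAD
exec /usr/local/bin/git "$@"
EOT
chmod +x /usr/local/sxabin/git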

I then set the jenkins configuration for the agent (Tool locations -> git) to point to /usr/local/sxabin/git and also put /usr/local/sxabin first in the PATH. This appears to have worked: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin/201/console

sxa avatar Dec 05 '22 23:12 sxa

This looks like a good solution to me. The only real impact is on the jenkins agent, and it's probably cleaner (and less error-prone) than running via ssh forwarding or similar.

steelhead31 avatar Dec 06 '22 09:12 steelhead31

Unfortunately I hadn't noticed that the linked job ran on the old Solaris machine (I thought it had been disabled), so this isn't fully working yet. The LD_PRELOAD setting is passed on to sub-processes, and since most of them are 32-bit the library causes a failure:

ld.so.1: sh: fatal: /usr/lib/jvm/fallocate-preload.so: wrong ELF class: ELFCLASS64
Killed

sxa avatar Dec 06 '22 10:12 sxa

Considering not pursuing this and going down another route: build JDK11 on a Solaris 11 system but adjust the code so it doesn't require posix_fallocate.

sxa avatar Dec 06 '22 10:12 sxa

@ptribble's 11.0.2 works on Solaris 10/x64: https://pkgs.tribblix.org/openjdk/openjdk11.0.2-s10-x86_64.tar.bz2 sha256sum a5484bd35ed15ea7dc97870cea470aedf0c713ecca8075e57954a70e8b32cd89
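For anyone downloading that tarball, the checksum can be compared with a small helper. This is a sketch only: verify_sha256 is a hypothetical name, and it assumes GNU sha256sum is on the PATH (on Solaris 10 itself a different tool, such as digest -a sha256, may be needed instead).

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED: hypothetical helper; succeeds only if the
# SHA-256 of FILE equals EXPECTED. Assumes GNU sha256sum is available.
verify_sha256() {
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}
```

For example, verify_sha256 openjdk11.0.2-s10-x86_64.tar.bz2 followed by the sha256sum quoted above should succeed for an intact download.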

sxa avatar Dec 08 '22 18:12 sxa

Using LD_PRELOAD_64 rather than a bare LD_PRELOAD ought to restrict it to 64-bit processes, if you want to pursue that route.

ptribble avatar Dec 08 '22 19:12 ptribble

Using LD_PRELOAD_64 rather than a bare LD_PRELOAD ought to restrict it to 64-bit processes, if you want to pursue that route.

Thank you! I don't think I've used that before but it's EXACTLY what I needed and https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-solaris-x64-temurin/208/ has successfully built with the job running via a JDK11u jenkins agent on the machine.

So, for reference:

  • Built the shared library from the earlier comment on the machines and put it in /usr/lib/jvm
  • Extracted the X64 or SPARC version of Zulu under /usr/lib/jvm
  • Adjusted the jenkins agent startup to have a JavaPath of /usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java and a Prefix Start Agent Command of export LD_PRELOAD_64=/usr/lib/jvm/fallocate.so &&. Or, if using a startup script, use LD_PRELOAD_64=/usr/lib/jvm/fallocate-preload.so /usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java [...]
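The startup-script form in the last bullet could be wrapped up roughly as follows. This is a sketch under assumptions: start_agent is a hypothetical function name, the java and library paths are the ones quoted in the bullet, and the agent arguments are left to the caller since they are not spelled out here.

```shell
#!/bin/sh
# Paths as quoted in the list above; both can be overridden from the environment.
JAVA=${JAVA:-/usr/lib/jvm/zulu11.60.19-ca-jdk11.0.17-solaris/bin/java}
PRELOAD=${PRELOAD:-/usr/lib/jvm/fallocate-preload.so}

# start_agent: hypothetical wrapper that runs java with the preload library.
start_agent() {
  # LD_PRELOAD_64 is honoured only by 64-bit processes, so 32-bit children
  # such as sh and git do not try to load the 64-bit library.
  LD_PRELOAD_64="$PRELOAD" "$JAVA" "$@"
}
```

start_agent would then be called with whatever arguments the agent is normally started with.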

There were a few messages showing in the startup log on SPARC, but that appears to be building ok too.

FYI @speakjava - we might have a solution :-)

sxa avatar Dec 09 '22 11:12 sxa

We still have over 50 systems running JDK8 as the agent. Thanks to @steelhead31 for collating this list. We can check these off as they are fixed. Also need to ensure that the playbooks deploy new ones with JDK11/17 available (and ideally as the default)

Windows (14)

  • [ ] build-alibaba-win2012r2-x64-1
  • [x] build-alibaba-win2012r2-x64-2
  • [x] build-azure-win2012r2-x64-1
  • [x] build-azure-win2012r2-x64-2
  • [ ] build-azure-win2012r2-x64-4-sxa
  • [x] build-azure-win2016-x64-1
  • [x] build-ibmcloud-win2012r2-x64-1
  • [x] build-ibmcloud-win2012r2-x64-2
  • [x] test-azure-win2012r2-x64-1
  • [x] test-azure-win2012r2-x64-3
  • [x] test-azure-win2016-x64-1
  • [x] test-azure-win2019-x64-1
  • [x] test-ibmcloud-win2012r2-x64-1
  • [x] test-ibmcloud-win2012r2-x64-2

Linux build + test (22)

  • [x] build-digitalocean-centos69-x64-2
  • [x] build-osuosl-centos74-ppc64le-1X
  • [x] build-osuosl-centos74-ppc64le-2
  • [x] docker-osuosl-ubuntu2004-ppc64le-1
  • [x] test-aws-rhel76-armv8-1
  • [x] ~test-aws-rhel8-x64-1~
  • [x] ~test-docker-alpine316-aarch64-1~ - Removed From Jenkins
  • [x] ~test-docker-ubuntu1804-armv8l-2~ - Removed From Jenkins
  • [x] test-docker-ubuntu1804-armv8l-4
  • [x] test-docker-ubuntu2110-armv8l-1
  • [x] test-docker-ubuntu2204-armv8l-2
  • [x] test-ibmcloud-rhel7-x64-1
  • [x] test-ibmcloud-ubuntu1604-x64-1
  • [x] test-osuosl-centos74-ppc64le-1
  • [x] test-osuosl-centos74-ppc64le-2
  • [x] test-osuosl-ubuntu1604-ppc64le-1
  • [x] test-osuosl-ubuntu1604-ppc64le-2
  • [x] test-osuosl-ubuntu1804-ppc64le-1
  • [x] test-osuosl-ubuntu1804-ppc64le-2
  • [x] test-osuosl-ubuntu2004-ppc64le-1
  • [x] test-scaleway-ubuntu1604-x64-1
  • [x] test-skytap-ubuntu2004-ppc64le-1

Others (16)

  • [x] build-osuosl-aix71-ppc64-1
  • [x] build-osuosl-aix71-ppc64-2
  • [ ] ~build-packet_esxi-solaris10u11-x64-1~
  • [x] build-siteox-solaris10u11-sparcv9-1
  • [x] infra-ibmcloud-vagrant-x64-1.1
  • [x] infra-ibmcloud-vagrant-x64-1.2
  • [x] infra-ibmcloud-vagrant-x64-1.3
  • [x] infra-ibmcloud-vagrant-x64-1.4
  • [x] infra-ibmcloud-vagrant-x64-1.5
  • [x] test-osuosl-aix715-ppc64-1 p9-aix1-adopt05.osuosl.org
  • [x] test-osuosl-aix715-ppc64-2 adopt06
  • [x] test-osuosl-aix715-ppc64-3 adopt07
  • [x] test-osuosl-aix715-ppc64-4 adopt08
  • [ ] test-osuosl-aix72-ppc64-1 adopt03
  • [x] test-osuosl-aix72-ppc64-2 adopt04
  • [x] test-siteox-solaris10u11-sparcv9-1

Note that, from the comments in https://github.com/adoptium/infrastructure/issues/2847, the Solaris 10 machines will require Bellsoft Liberica 11 to be used (Azul's seems to result in high CPU load after a while on Solaris 10) and will require the fallocate preload mentioned in an earlier comment.

sxa avatar Jan 11 '23 12:01 sxa

@steelhead31 @Haroon-Khel I've just adjusted the last comment to categorise the machines (Windows/Linux build+test/Other). I reckon at this point we should set an hour aside and do this. Would you be interested in doing it with an open call to discuss any issues? FYI @karianna in case you have someone who has access and would like to take on the windows subset :-)

@steelhead31 Can you easily tell how many of the ones already migrated are using a jenkins config that points to a specific java as opposed to the default version? (I'm not sure if the command lines you were getting would say just java in the latter case)

sxa avatar Feb 02 '23 12:02 sxa

@sxa / @Haroon-Khel sounds like a good idea to me. I'll produce an up-to-date (and complete) list of the current state this afternoon in preparation for doing this. I'm not sure how easy it is to tell default java from specified, as it just shows a path. Let's see if the updated list offers any insight.

steelhead31 avatar Feb 02 '23 12:02 steelhead31

I'm somewhat in two minds about whether to change the default or override in jenkins (and if we override we need to bear in mind me raising #2912 recently!) But I think an override for the remaining ones, then possibly look at adjusting the defaults and resetting the override after 2912 goes in is probably my preferred approach ...

sxa avatar Feb 02 '23 12:02 sxa

+1 from me on overriding. From what I've seen, the default is usually 8, so unless you guys have an objection I see no reason why we shouldn't override it with 17 (on platforms that have 17).

Haroon-Khel avatar Feb 02 '23 13:02 Haroon-Khel

@steelhead31 I guess the table from https://github.com/adoptium/infrastructure/issues/2879#issuecomment-1408677930 would suggest that the ones with an "empty" Java version column are using the default on the system.

sxa avatar Feb 02 '23 13:02 sxa

New audit sheet is available here..
https://docs.google.com/spreadsheets/d/1MS4IcUSyJxOiKeQ14HN-uajstIv8hGM7Xz9hXFTmfnk/edit?usp=sharing

steelhead31 avatar Feb 02 '23 13:02 steelhead31

@steelhead31 I guess the table from #2879 (comment) would suggest that the ones with an "empty" Java version column are using the default on the system.

I believe so for the online ones at least. They will be using the system default, or they may have been connected from the node itself with a different JDK from the default. I'll audit a few of them and see if it's easy to tell those two cases apart.

steelhead31 avatar Feb 02 '23 13:02 steelhead31

Looks like there's plenty of precedent and no objections to tweaking the jenkins agent config so I suggest we go for that, using /usr/lib/jvm/jdk-17 where available and a suitable platform-specific alternative elsewhere.
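Picking /usr/lib/jvm/jdk-17 where it exists and falling back to a platform-specific alternative could be scripted along these lines. This is only a sketch: pick_java is a hypothetical helper, and any fallback directories passed to it are assumptions rather than paths taken from the machines.

```shell
#!/bin/sh
# pick_java: hypothetical helper; prints the first candidate JDK home that
# contains an executable bin/java, and fails if none of them does.
pick_java() {
  for home in "$@"; do
    if [ -x "$home/bin/java" ]; then
      echo "$home/bin/java"
      return 0
    fi
  done
  return 1
}
```

For example, pick_java /usr/lib/jvm/jdk-17 followed by one or more fallback JDK homes prints the java binary to put in the agent's JavaPath.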

sxa avatar Feb 02 '23 13:02 sxa

@sxa I'm guessing build-azure-win2012r2-x64-4-sxa is on your home network? I can't get into it

Since I deleted the jenkins service on build-alibaba-win2012r2-x64-1 and I can't get it back up, for now I have the jenkins agent running in a background process with java17.

Steps to change the java path for the jenkins service:

  • In the jenkins user folder (C:\Users\jenkins usually) there should be a jenkins-slave.xml file
  • Open it with an editor and change the executable path to your desired java path
  • Save and close the file
  • Restart the jenkins service in Services
  • Check the System Information page for the jenkins node to see if the update has taken place
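The edit in the second step can also be scripted, for instance from a Cygwin or Git Bash shell on the machine. The sketch below rests on assumptions: set_agent_java is a hypothetical helper, and it assumes the java path lives in a single-line <executable> element of jenkins-slave.xml. It prints the modified XML rather than changing the file in place, so the result can be reviewed first.

```shell
#!/bin/sh
# set_agent_java XML JAVA: hypothetical helper; prints XML with the contents
# of its single-line <executable> element replaced by JAVA.
set_agent_java() {
  # Escape backslashes, '&' and '|' so Windows paths survive the sed replacement.
  esc=$(printf '%s' "$2" | sed 's/[\\&|]/\\&/g')
  sed "s|<executable>.*</executable>|<executable>$esc</executable>|" "$1"
}
```

The output can be redirected to a new file and swapped in before restarting the jenkins service.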

Haroon-Khel avatar Feb 03 '23 17:02 Haroon-Khel

@sxa I'm guessing build-azure-win2012r2-x64-4-sxa is on your home network? I can't get into it

No - the only ones hosted by me are the test-sxa ones. I'll check that definition as I think it's one of a set that I used when working with Andrew on reproducible builds. It probably just needs to be deleted.

sxa avatar Feb 03 '23 18:02 sxa

Actions arising from today's activities:

release
You are trying to get resource http://47.111.84.87:8080/jnlpJars/remoting.jar but it is not in cache and could not be downloaded. Attempting to continue, but you may expect failure
JAR http://47.111.84.87:8080/jnlpJars/remoting.jar not found. Continuing.
JAR http://47.111.84.87:8080/jnlpJars/remoting.jar not found. Continuing.
netx: Initialization Error: Could not initialize application. (Fatal: Initialization Error: Unknown Main-Class. Could not determine the main class for this application.)

Follow-on actions: Change all the definitions once we implement #2912 ;-)

sxa avatar Feb 03 '23 18:02 sxa

Based on the new column from the plugin in #2950 there are five machines still running JDK8 for the agent:

sxa avatar Feb 23 '23 13:02 sxa

The final machines have all been upgraded to JDK17 for jenkins agents.

steelhead31 avatar Mar 01 '23 12:03 steelhead31