Support and use docker builds on Linux/s390x
Features that impact the whole project (e.g. adding a new OpenJDK distribution) are made over at the adoptium.
Otherwise, please describe what enhancement you would like to see in the build scripts:
In order to improve isolation of build environments, I'd like to try moving the s390x builds into docker containers. We now have a good range of machines that would be suitable for this as part of the work done in https://github.com/adoptium/infrastructure/issues/2673 and therefore this is now more feasible. This will need:
- [ ] Verification whether we can acquire a suitable base image (may be a blocker for s390x)
- [ ] https://ci.adoptopenjdk.net/job/centos7_docker_image_updater/ adjusted to generate a suitable image
- [ ] Verify that the image can produce a suitable build (ideally verified with the reproducible build options)
- [ ] Ensure that we have appropriate machine tags/capacity to perform the builds
- [ ] Adjust the pipelines to use these systems (e.g. changing pipeline configs)
The base image is currently problematic: UBI is not a practical alternative because prereqs such as the X11 packages are missing from its default repositories, and CentOS is not available on s390x. It may be more feasible when https://github.com/adoptium/infrastructure/issues/2008 is finalised (allowing us to run RHEL in the container), or if we use an alternate sysroot for the build.
Our ROSI subscription allows me to create a suitable image, however we would likely not be able to push a RHEL image up to dockerhub. A suitable dockerfile for creating the image is in https://github.com/adoptium/infrastructure/pull/2926 and requires suitable Red Hat subscription credentials to be passed to it. The initial failures with UBI7 mentioned above are due to missing prereqs, including flex and some of the X11 development packages, such as:
- cups-devel
- elfutils-libelf-devel
- flex
- gmp-devel
- libXext-devel
- libXi-devel
- libXrandr-devel
- libXrender-devel
- libXt-devel
- libXtst-devel
- mesa-libGL-devel
- mpfr-devel
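One way to check which of these prereqs a given set of repositories can actually supply is to query each package in turn. The sketch below is a rough illustration only: `missing_packages` and the `PKG_QUERY` override are hypothetical names, and it assumes that `yum -q list <package>` exits non-zero when no enabled repository provides the package.

```shell
#!/bin/sh
# Hypothetical helper (not part of the adoptium scripts): print each package
# that the query command cannot find. PKG_QUERY is an illustrative override
# so that the check can be exercised without yum.
PKG_QUERY="${PKG_QUERY:-yum -q list}"

missing_packages() {
  for pkg in "$@"; do
    # Intentionally unquoted so "yum -q list" splits into command + arguments.
    $PKG_QUERY "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Example, to be run inside the candidate base image:
# missing_packages cups-devel flex libXext-devel mesa-libGL-devel
```

Any package name printed by the function is one the image's repositories could not provide, which gives a quick way to compare UBI against a subscribed RHEL image.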
Currently the build process cannot use the image that is on the two build machines: it expects to be able to perform a docker pull on the image, which is not possible here because the image has not been pushed off the build machine. We may need to look at using a credential to pull from a private repository, or having a checkbox to tell the processes not to perform the pull step and use an image already on the machine.
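The "skip the pull" option could behave roughly as in the sketch below. This is only an illustration of the idea, not the pipeline's implementation: `prepare_image`, its `skip_pull` argument and the `DOCKER` override are hypothetical names. The only docker commands it relies on are `docker image inspect` and `docker pull`.

```shell
#!/bin/sh
# Hypothetical helper: use a local image when asked to skip the pull,
# otherwise pull as usual. DOCKER is an illustrative override for testing.
DOCKER="${DOCKER:-docker}"

prepare_image() {
  local image="$1" skip_pull="${2:-false}"
  if [ "$skip_pull" = "true" ]; then
    # Only the local copy is acceptable; fail early if it is not present.
    if "$DOCKER" image inspect "$image" >/dev/null 2>&1; then
      echo "using local image $image"
      return 0
    fi
    echo "local image $image not found" >&2
    return 1
  fi
  # Default behaviour: fetch the image from the registry.
  "$DOCKER" pull "$image"
}

# Example: prepare_image rhel7_build_image true
```

Failing when the local image is absent, rather than silently falling back to a pull, keeps the behaviour predictable for images that cannot be published.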
@andrew-m-leonard - I may need some expert assistance on this. I've tried using my branch that should skip the pull at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/178/console with my own repository (based on your new wiki page saying what to edit!) and the docker image set to rhel7_build_image which exists on the build machines, but it doesn't appear to be skipping the pull - and there's no indication that my new values of my fork/branch from this PR have been honoured. Any advice?
@sxa Your pipeline has useAdoptBashScripts set to True, set this to False and it will generate the required USER_REMOTE_CONFIG correctly.
Thanks - I thought I'd picked up on that and done a re-run but apparently not 👍🏻
https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/181/console is the job that shows the problem I was describing where it seems to be skipping the pull despite (I think) being run with the correct options to pull in my rhel_nopull branch, but I'm tired today so if I'm still missing something obvious LMK.
It's still using AdoptBuildScripts=true, I've kicked off a new pipeline: https://ci.adoptium.net/job/build-scripts/job/openjdk17-pipeline/635/
Looks good:
12:40:39 [CHECKOUT] Checking out User Pipelines https://github.com/sxa/ci-jenkins-pipelines.git : rhel_nopull
[Pipeline] checkout
Thanks - I'm getting true and false the wrong way round because I'm too tired to read properly apparently :-)
I'm re-running the build job at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/183/console with the docker image option to test the functionality I'm looking for.
@andrew-m-leonard That job seems to have failed - could you take a look and see if there's an obvious reason for it as the error message is somewhat opaque.
@sxa yeah, particularly opaque! I've started a new one to see if repeated: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/186/
@sxa Looks like something is hanging in the container, can you add some debug? or can you try running it manually and see what is hanging?
I previously tested this on the machines with the container without problems - it's only started being a problem when run from jenkins.
Log from doing it again manually:
[jenkins@build-marist-rhel79-s390x-1 ~]$ docker run -it rhel7_build_image bash
[root@d5daa60ed90c /]# id jenkins
uid=1000(jenkins) gid=1000(jenkins) groups=1000(jenkins)
[root@d5daa60ed90c /]# su - ^C
[root@d5daa60ed90c /]# exit
[jenkins@build-marist-rhel79-s390x-1 ~]$ docker run -it rhel7_build_image bash
[root@421d903ac358 /]# su - jenkins
[jenkins@421d903ac358 ~]$ git clone https://github.com/adoptium/temurin-build
Cloning into 'temurin-build'...
remote: Enumerating objects: 12483, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 12483 (delta 8), reused 14 (delta 4), pack-reused 12458
Receiving objects: 100% (12483/12483), 5.08 MiB | 25.02 MiB/s, done.
Resolving deltas: 100% (8250/8250), done.
[jenkins@421d903ac358 ~]$ cd temurin-build
[jenkins@421d903ac358 temurin-build]$ cd build-farm/
[jenkins@421d903ac358 build-farm]$ ./make-adopt-build-farm.sh
ARCHITECTURE not defined - assuming s390x
TARGET_OS not defined - assuming you want Linux
JAVA_TO_BUILD not defined - defaulting to jdk11u
VARIANT not defined - assuming hotspot
FILENAME not defined - assuming jdk11u-hotspot.tar.gz
[...]
Hmmm may be related to https://github.com/adoptium/infrastructure/issues/2834 which was an issue where the UID of jenkins in the container doesn't match that of the host.
16:28:06 process apparently never started in /home/jenkins/workspace/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-s390x-temurin@tmp/durable-3dab79cc
OK yeah this is a UID mismatch issue between the host and docker image - @Haroon-Khel do we have an issue to look at the options for resolving this permanently?
I've run it with a patched docker image at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-s390x-temurin/188/console and it seems to be running through as expected.
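A UID mismatch of the kind described above could be caught before a build with a check along these lines. It is a minimal sketch: `compare_uids` is a hypothetical helper, and the commented usage assumes the image has no entrypoint that would intercept the `id` command.

```shell
#!/bin/sh
# Hypothetical helper: compare the UID of a user on the host with the UID
# of the same user inside the image.
compare_uids() {
  # $1 = UID on the host, $2 = UID inside the image
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: host=$1 container=$2"
    return 1
  fi
}

# Example (needs docker and a jenkins user on the host):
# compare_uids "$(id -u jenkins)" "$(docker run --rm rhel7_build_image id -u jenkins)"
```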