Update Test pipeline code to support Azure Dynamic agents
We had previously done a prototype to support Azure dynamic agents. There is now an updated Azure Jenkins plugin, and a desire to also have a set of different containers to run on (mimicking what is currently available via the static docker containers, see https://github.com/adoptium/infrastructure/tree/master/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/DockerStatic/Dockerfiles), so the test pipeline code needs updating to match.
These can be put into https://github.com/orgs/adoptium/packages/container/package/test-containers and referenced in test pipeline code. See https://github.com/adoptium/aqa-tests/pull/5683 for initial change.
Next steps can be to ensure we have a diversity of containers on which to run tests.
Commenting so I get updates ;-)
[EDIT: Some earlier "fake news" compressed into this twisty below - all of the Grinders were actually still using the default ubuntu2204 image as `DOCKERIMAGE_TAG` doesn't do what I thought it did]
I have published new Ubuntu 24.04 and UBI10 images that can be pulled down from ghcr.io and have run some Grinders with the patches to use them. The Grinders were just running `jdk_math_0` and haven't been run with anything more complex than that for the purposes of testing. There were initially some quite long delays when provisioning but that seems to have stabilised a bit:
- Grinder14311 (ubuntu2204 image, machine 68efb0)
- Grinder14312 (cent7build image, machine 68efb0)
- Grinder14314 (cent7build image, machine 68efb0)
- Grinder14319 (ubuntu2204 image, machine 68efb0)
- Grinder14320 (ubuntu2404 image, machine 68efb0)
- Grinder14321 (ubi10 image, machine 68efb0)
- Grinder14322 (ubuntu2404 image, machine 68efb0)
Noting that for the first two there was a bit of a delay, as one of the machines didn't seem to provision properly. The machine 68efb0 ended up being used instead of the dc7f70 that was probably in place before, and the job spent about 16 minutes in the queue waiting for a system to become available:
11:46:23 ‘[test-linux-x64-dc7f70](https://ci.adoptium.net/computer/test%2Dlinux%2Dx64%2Ddc7f70/)’ is offline
12:04:36 Running on [test-linux-x64-68efb0](https://ci.adoptium.net/computer/test%2Dlinux%2Dx64%2D68efb0/) in /home/adoptopenjdk/workspace/Grinder
Note: I had to adjust the existing test_linux_x64 to change the Network Security Group Name from the default (nothing selected) to jenkins-dynamic-NSG, otherwise the provisioning was failing with the previous setup.
The cloud configuration for Azure is at https://ci.adoptium.net/manage/cloud/Azure/
We have two Azure/Linux dynamic machine definitions in Jenkins at the moment, defined in the agent templates panel, and the current ones are visible in the Build executor status panel on the first of those links (links only accessible to Jenkins admins):
- `build-linux-x64`: used by the build machines and will provision a `D4s_v3` system with Ubuntu 22.04 and a 60 minute retention time and labels `build x86_64 x64 linux docker alpine-linux dockerBuild dynamicAzure` and an agent workspace of `/home/adoptopenjdk`
- `test-linux-x64`: used by the test jobs that we're enabling here and will provision a `D2s_v3` system, with Ubuntu 22.04 and a retention time of zero with labels `hw.arch.x86 sw.os.linux ubuntu ubuntu2204 x64 x86-64 ci.agent.dynamic` and an agent workspace of `/home/adoptopenjdk`
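To make the label sets above a little more concrete, here is a minimal Python sketch of how a job's required labels could be checked against the two templates. It only models a plain "all labels must be present" match (Jenkins label expressions are richer than that), and the `matching_templates` helper is hypothetical, not something from the pipeline code.

```python
# Labels copied from the two agent templates described above.
TEMPLATES = {
    "build-linux-x64": "build x86_64 x64 linux docker alpine-linux dockerBuild dynamicAzure".split(),
    "test-linux-x64": "hw.arch.x86 sw.os.linux ubuntu ubuntu2204 x64 x86-64 ci.agent.dynamic".split(),
}


def matching_templates(required_labels):
    """Return the names of templates that carry every required label.

    Hypothetical helper: models only an AND of labels, not full
    Jenkins label-expression syntax.
    """
    required = set(required_labels)
    return sorted(
        name for name, labels in TEMPLATES.items() if required <= set(labels)
    )


# A test job asking for a dynamic Linux x86 agent only matches the test template.
print(matching_templates(["ci.agent.dynamic", "sw.os.linux", "hw.arch.x86"]))
# A bare "x64" request would match both templates.
print(matching_templates(["x64"]))
```

This shows why the test template carries `ci.agent.dynamic`: it gives jobs a label that the build template does not have.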
Usage
The above Grinders were kicked off with the following parameters (ubi10 is shown in this example but ubuntu2204 and ubuntu2404 also work, as does the URL of a build image such as adoptium/centos7_build_image or ghcr.io/adoptium/adoptium_build_image:centos7). The ubuntu2404 and ubi10 ones are currently being created by the [test_image_updater](https://ci.adoptium.net/job/test_image_updater) job, for x64 only at present:
- ADOPTOPENJDK_REPO: https://github.com/sxa/aqa-tests.git
- ADOPTOPENJDK_BRANCH: azure_dynamic_containers
- DOCKERIMAGE_TAG: ghcr.io/adoptium/test_containers:ubi10
- CLOUD_PROVIDER: azure
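For anyone who would rather script this than fill in the form, the parameters above could be encoded roughly as follows. This is a sketch only: it builds the URL for Jenkins' standard `buildWithParameters` endpoint but does not send anything, and a real trigger would need an authenticated POST. The `grinder_trigger_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Grinder job URL as it appears in the links in this thread.
JENKINS_JOB_URL = "https://ci.adoptium.net/job/Grinder"

# Parameters exactly as listed above.
GRINDER_PARAMS = {
    "ADOPTOPENJDK_REPO": "https://github.com/sxa/aqa-tests.git",
    "ADOPTOPENJDK_BRANCH": "azure_dynamic_containers",
    "DOCKERIMAGE_TAG": "ghcr.io/adoptium/test_containers:ubi10",
    "CLOUD_PROVIDER": "azure",
}


def grinder_trigger_url(params):
    """Build (but do not call) a buildWithParameters URL for the Grinder job."""
    return f"{JENKINS_JOB_URL}/buildWithParameters?{urlencode(params)}"


print(grinder_trigger_url(GRINDER_PARAMS))
```

Swapping `DOCKERIMAGE_TAG` for one of the other images mentioned above is the only change needed to target a different container.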
FYI @AdamBrousseau in case you want to try out these test container images (based on the corresponding DockerStatic docker files for each distribution)
Also FYI @Haroon-Khel who has done a lot of the maintenance for provisioning machines with these dockerfiles https://
Noting that this change in the aqa-tests branch defaults the DOCKERIMAGE_TAG to ghcr.io/adoptium/test-containers:ubuntu2204 if not specified (I'm tempted to switch that over to UBI10)
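The defaulting behaviour amounts to something like the following Python sketch. The real change lives in the pipeline code on the branch; `resolve_docker_image` is a hypothetical name used purely for illustration, and treating a whitespace-only value as "not specified" is my assumption.

```python
# Default image named in the comment above.
DEFAULT_IMAGE = "ghcr.io/adoptium/test-containers:ubuntu2204"


def resolve_docker_image(dockerimage_tag):
    """Return the image to run in, falling back to the default when unset.

    Hypothetical helper: None, empty, and whitespace-only values are all
    treated as "not specified" (an assumption, not confirmed behaviour).
    """
    tag = (dockerimage_tag or "").strip()
    return tag if tag else DEFAULT_IMAGE


print(resolve_docker_image(None))
print(resolve_docker_image("ghcr.io/adoptium/test_containers:ubi10"))
```

Switching the default over to UBI10 would then be a one-line change to `DEFAULT_IMAGE`.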
Slight update based on the fact that the container to use is hard coded: the azure_dynamic_containers branch described above is now using the ghcr.io/adoptium/test_containers:ubi10 image. I've also put in a change related to https://github.com/adoptium/aqa-tests/pull/6541/files which will automatically set XDG_RUNTIME_DIR to something within the workspace directory when running with weston (EL10+). This keeps things a bit tidier and works without having to have this hard coded in the agent configuration (/home/jenkins/.xdg-runtime on the adopt systems).
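Roughly, the XDG_RUNTIME_DIR handling looks like the Python sketch below. This is not the actual pipeline change: the `weston_env` helper and the `.xdg-runtime` subdirectory name (borrowed from the hard-coded agent path) are assumptions. The 0700 mode is what the XDG Base Directory specification requires for this directory.

```python
import os
import tempfile


def weston_env(workspace, base_env=None):
    """Return a copy of the environment with XDG_RUNTIME_DIR inside the workspace.

    Hypothetical helper: the ".xdg-runtime" subdirectory name is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    runtime_dir = os.path.join(workspace, ".xdg-runtime")
    os.makedirs(runtime_dir, exist_ok=True)
    # The XDG spec requires the runtime dir to be accessible by its owner only.
    os.chmod(runtime_dir, 0o700)
    env["XDG_RUNTIME_DIR"] = runtime_dir
    return env


if __name__ == "__main__":
    # Demonstrate against a throwaway directory standing in for the workspace.
    with tempfile.TemporaryDirectory() as ws:
        print(weston_env(ws, {})["XDG_RUNTIME_DIR"])
```

Because the directory sits under the workspace, it follows the job rather than the agent, so nothing needs to be set in the agent template.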
Note that these runs do not fall foul of https://github.com/adoptium/infrastructure/issues/4046 which is an Azure+IPv6 specific issue but is not seen on these dynamic environments for some reason. This has been verified at https://ci.adoptium.net/job/Grinder/14352
Thanks @sxa - and yes to the dup out of DOCKERIMAGE_TAG being used for external test group only, not with dynamic agents, though we should be able to co-opt it for such a purpose.
Adding a reminder since this has progressed, that we should check with EF infra and Temurin compliance as to whether they could also benefit from this (as in configure azure plugin on the tc Jenkins server, for running sanity and extended jobs, possibly not suitable for dev and special jobs, but the remote trigger divides them anyway).
I will close this issue as complete, as we have enabled support for x64 Linux Azure dynamic agents.
Separate issues will be created to enable aarch64 Linux, x64 Windows, and any other variant where possible.