Unexpected tar error while untarring jdk17 binary in ppc64le and arm32 ubuntu 2404 docker image

Open Haroon-Khel opened this issue 1 year ago • 6 comments

ref https://github.com/adoptium/infrastructure/issues/3501#issuecomment-2091101160

Hitting a tar error while building arm32 and ppc64le ubuntu 24.04 docker static containers

 > [ 7/25] RUN mkdir -p /usr/lib/jvm/jdk17 && tar -xpzf /tmp/jdk17.tar.gz -C /usr/lib/jvm/jdk17 --strip-components=1:                                                                                               
0.295 tar: conf/security/policy/unlimited: Cannot change mode to rwxr-xr-x: Operation not permitted                                                                                                                 
0.295 tar: conf/security/policy/limited: Cannot change mode to rwxr-xr-x: Operation not permitted                                                                                                                   
0.295 tar: conf/security/policy: Cannot change mode to rwxr-xr-x: Operation not permitted                                                                                                                           
0.295 tar: conf/security: Cannot change mode to rwxr-xr-x: Operation not permitted                                                                                                                                  
0.295 tar: conf/sdp: Cannot change mode to rwxr-xr-x: Operation not permitted
0.296 tar: conf/management: Cannot change mode to rwxr-xr-x: Operation not permitted
0.296 tar: conf: Cannot change mode to rwxr-xr-x: Operation not permitted
0.305 tar: legal/java.base: Cannot change mode to rwxr-xr-x: Operation not permitted
1.052 tar: jmods: Cannot change mode to rwxr-xr-x: Operation not permitted

Haroon-Khel avatar May 02 '24 17:05 Haroon-Khel

The binaries untar without error on my local machine

Haroon-Khel avatar May 02 '24 17:05 Haroon-Khel

Interesting ... yeah I can replicate that on one of my arm32 systems. That's really odd ... It's not specific to our tar file, but seems to be affecting directories extracted by tar. Sounds like a bug in the new ubuntu unless it's related to the kernel on the host. Like you I couldn't replicate with an aarch64 container with either Ubuntu 20.04 or 22.04 as the host machine. tar is at the latest version. As an interim measure I would propose doing chmod -R a+rX /usr/lib/jvm/jdk-17-* afterwards which seems to work without problems but I'm nervous about whether this means we'll see issues elsewhere in our testing ...
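
A minimal sketch of that interim measure, for anyone who wants to try it. The function name `extract_jdk` and the `bin/java` sanity check are my own assumptions for illustration, not something from the existing Dockerfiles. It also assumes tar still writes the files when it reports the mode errors, which matches what I saw but is worth verifying:

```shell
#!/bin/sh
# Hypothetical helper: extract a JDK tarball, then normalise permissions.
# Example: extract_jdk /tmp/jdk17.tar.gz /usr/lib/jvm/jdk17
extract_jdk() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1

    # On affected containers tar reports "Cannot change mode" errors, so
    # remember its exit status instead of aborting at this point.
    tar -xpzf "$tarball" -C "$dest" --strip-components=1
    tar_status=$?

    # Interim workaround: make everything readable, and make directories
    # (and files that are already executable) executable for all users.
    chmod -R a+rX "$dest" || return 1

    if [ "$tar_status" -ne 0 ]; then
        echo "warning: tar exited with status $tar_status; modes reset with chmod" >&2
    fi

    # Assumption: a usable JDK has an executable bin/java.
    [ -x "$dest/bin/java" ]
}
```

Keeping the tar status and only warning on it means a genuinely broken archive is still caught by the final `bin/java` check rather than silently accepted.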

sxa avatar May 03 '24 10:05 sxa

I thought I'd already added this comment (Edit: yes, I did, but at https://github.com/adoptium/infrastructure/issues/3501#issuecomment-2093759682), but running an emulated ppc64le container on another 24.04 host system did not show a problem, which would suggest there isn't a fundamental problem with the base container and that it is potentially related to the kernel being used.

sxa avatar May 07 '24 09:05 sxa

This is a situation where having QPC updated with latest images would help.

sxa avatar May 07 '24 09:05 sxa

JDK11 Special.openjdk, Extended.system, and Special.functional all appear to have a similar-looking issue unpacking the build with a tar command:

22:43:12 Uncompressing file: OpenJDK11U-jdk_ppc64le_linux_hotspot_11.0.24_7-ea.tar.gz ...
22:43:14 tar: jdk-11.0.24+7/man/ja: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.jartool/LICENSE: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.jartool/ADDITIONAL_LICENSE_INFO: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.jartool/ASSEMBLY_EXCEPTION: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.internal.jvmstat/LICENSE: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.internal.jvmstat/ADDITIONAL_LICENSE_INFO: Cannot change mode to rwxrwxr-x: Operation not permitted
22:43:20 tar: jdk-11.0.24+7/legal/jdk.internal.jvmstat/ASSEMBLY_EXCEPTION: Cannot change mode to rwxrwxr-x: Operation not permitted
etc etc

All three happen on a 24.04 docker host.

adamfarley avatar Jul 08 '24 13:07 adamfarley

At the moment the 2 problem machines are test-docker-ubuntu2404-armv7-2 and test-docker-ubuntu2404-ppc64le-1. Another ubuntu2404 arm32 node is test-docker-ubuntu2404-armv7-1 and this problem does not occur on this machine. Same tar versions

jenkins@299a170b9f8f:~$ tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

Haroon-Khel avatar Jul 31 '24 14:07 Haroon-Khel

This happened again on test-docker-ubuntu2404-armv7-2 on 2024/11/20 https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-arm-temurin_SmokeTests/259/console

adamfarley avatar Nov 25 '24 15:11 adamfarley

Interesting - that's an Ubuntu 24 container on an Ubuntu 22 host.

sxa avatar Nov 25 '24 18:11 sxa

I can replicate this. All of the offending files appear to be symbolic links to files in legal/java.base. It only causes a problem with tar; running chmod on the files afterwards is OK.

jenkins@dockerhost-equinix-ubuntu2204-armv8-1:~$ docker run -it aqa_u2404_arm32 bash
root@33aee01c5a84:/# wget https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jdk_arm_linux_hotspot_17.0.13_11.tar.gz
root@33aee01c5a84:/# tar xfz OpenJDK17U-jdk_arm_linux_hotspot_17.0.13_11.tar.gz

agent                            host                                   result
test-docker-ubuntu2004-armv7l-3  dockerhost-equinix-ubuntu2404-armv8-1  link
test-docker-ubuntu2404-armv7-2   dockerhost-equinix-ubuntu2204-armv8-1  link
test-docker-ubuntu2004-armv7l-6  dockerhost-equinix-ubuntu2204-armv8-1  link

https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-linux-arm-temurin_SmokeTests/265/

Based on the above this is almost certainly due to running a later version of Ubuntu (which has glibc/kernel interdependencies that are too new) on an older kernel. The other possibility (which I shall aim to check tomorrow) is whether restarting docker resolves it (potentially if it has been upgraded) and also whether there are any pending docker package updates on the host that we might apply which might affect this.

sxa avatar Nov 25 '24 18:11 sxa

Restarting docker made no difference. Restarting the machine made no difference. An aarch64 Ubuntu 24.04 container on the host works ok, so this is specific to arm32 Ubuntu 24.04 containers on Ubuntu 22.04 host.

Solution here is to deactivate test-docker-ubuntu2404-armv7-2, which I've done.

sxa avatar Nov 26 '24 11:11 sxa

FYI @Haroon-Khel we probably want to just decommission this particular machine now.

sxa avatar Nov 29 '24 14:11 sxa

@Haroon-Khel If you're happy with the analysis above can you add this to your list for this iteration please?

sxa avatar Dec 16 '24 11:12 sxa

Did a bit more digging on this. Looks like it's a bug in ubuntu2404 or docker which prevents the container from running the fchmodat2 system call: https://github.com/docker/docker-ce-packaging/pull/1007#issuecomment-2064332262

If I run the ubuntu 2404 docker container with --security-opt seccomp=unconfined, I can untar files without error. I believe this option lets the container make unrestricted system calls, so I am not sure it is a good idea security-wise.
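
To compare a default container against an unconfined one without downloading a JDK each time, something like the following could be run inside both. This is only a sketch: `tar_mode_check` is a made-up name, and it assumes that any small archive with directories and a symlink triggers the same error as the JDK tarball, which I have not confirmed on every affected host.

```shell
#!/bin/sh
# Hypothetical self-check: build a tiny tarball containing directories, a
# file and a symlink, then extract it with -p like the failing step does.
# Prints "ok" if tar extracts cleanly, "affected" if tar reports errors.
tar_mode_check() {
    work=$(mktemp -d) || return 2
    mkdir -p "$work/src/conf/security" "$work/out"
    echo demo > "$work/src/conf/security/LICENSE"
    # Relative symlink inside conf/, pointing at the file created above.
    ln -s security/LICENSE "$work/src/conf/LICENSE"
    chmod 755 "$work/src/conf" "$work/src/conf/security"
    tar -czf "$work/demo.tar.gz" -C "$work/src" conf

    if tar -xpzf "$work/demo.tar.gz" -C "$work/out" 2>"$work/err.log"; then
        result=ok
    else
        result=affected
        # Show tar's own messages, e.g. "Cannot change mode to ...".
        cat "$work/err.log" >&2
    fi

    rm -rf "$work"
    echo "$result"
    [ "$result" = ok ]
}
```

Saved as a script that ends by calling `tar_mode_check`, it could be mounted into the aqa_u2404_arm32 image and run once with the default profile and once with `--security-opt seccomp=unconfined`. If only the default run prints "affected", that points at the seccomp filtering rather than at the image.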

https://github.com/ocaml/infrastructure/issues/121#issuecomment-2128856617 suggests Docker >= 25.0.3 and libseccomp2 >= 2.5.5 solve this, so it's a matter of upgrading those packages on the problem dockerhosts.

root@dockerhost-osuosl-ubuntu2404-ppc64le-1:~# docker version
Client:
 Version:           26.1.3
 API version:       1.43 (downgraded from 1.45)
 Go version:        go1.22.2
 Git commit:        26.1.3-0ubuntu1~24.04.1
 Built:             Mon Oct 14 14:29:26 2024
 OS/Arch:           linux/ppc64le
 Context:           default

Server:
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.22.2
  Git commit:       24.0.7-0ubuntu4.1
  Built:            Fri Aug  9 02:33:20 2024
  OS/Arch:          linux/ppc64le
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        
 runc:
  Version:          1.1.12-0ubuntu3.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        
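
Note that in the output above the client is 26.1.3 but the server engine is still 24.0.7, and the engine is the part below the suggested minimum. A rough sketch for checking a dockerhost against both thresholds follows. The function names are made up for illustration, it assumes a Debian/Ubuntu host with `dpkg-query`, and it compares version strings only, so a package carrying a backported fix under a lower version number would still be flagged:

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on version sort (`sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Thresholds quoted from the linked ocaml/infrastructure comment.
MIN_DOCKER=25.0.3
MIN_LIBSECCOMP=2.5.5

# Hypothetical host check: reports which package is below the threshold.
check_host() {
    # The daemon (Server) version is what matters, not the client.
    docker_ver=$(docker version --format '{{.Server.Version}}') || return 2
    seccomp_ver=$(dpkg-query -W -f='${Version}' libseccomp2) || return 2

    docker_ver=${docker_ver#v}        # some builds report e.g. "v27.4.1"
    seccomp_up=${seccomp_ver%%-*}     # drop the Debian revision suffix

    status=0
    if ! version_ge "$docker_ver" "$MIN_DOCKER"; then
        echo "docker engine $docker_ver is older than $MIN_DOCKER"
        status=1
    fi
    if ! version_ge "$seccomp_up" "$MIN_LIBSECCOMP"; then
        echo "libseccomp2 $seccomp_ver is older than $MIN_LIBSECCOMP"
        status=1
    fi
    return $status
}
```

Running `check_host` on each dockerhost should then show which of the two packages needs attention, with the caveat about backports in mind.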

Haroon-Khel avatar Feb 10 '25 19:02 Haroon-Khel

To do:

  • Upgrade docker on both ppc64le dockerhosts to >= 25.0.3
  • Upgrade libseccomp2 on the ubuntu 2204 arm64 dockerhost and ubuntu 2004 ppc64le dockerhost to >= 2.5.5
    • I suspect ubuntu 2204 and 2004 won't allow this, so we may have to upgrade the OS to ubuntu 2404

Haroon-Khel avatar Feb 10 '25 19:02 Haroon-Khel

* Upgrade libseccomp2 on the ubuntu 2204 arm64 dockerhost to >= 2.5.5
  * I suspect ubuntu 2204 won't allow this, so we may have to upgrade the OS to ubuntu 2404

I would test with the latest available if it doesn't have 2.5.5 in the repositories. Ubuntu (and other LTS distribution providers) will often backport important patches so even if they're showing something earlier than 2.5.5 it may be ok.

sxa avatar Feb 10 '25 20:02 sxa

Upgraded docker on dockerhost-osuosl-ubuntu2404-ppc64le-1 to v27

Server: Docker Engine - Community
 Engine:
  Version:          v27.4.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.10
  Git commit:       c710b88
  Built:            Mon Dec 23 11:56:44 2024
  OS/Arch:          linux/ppc64le
  Experimental:     false

I was able to untar a JDK binary in an ubuntu 2404 container on it without the permissions error. Looks good.

Haroon-Khel avatar Feb 11 '25 11:02 Haroon-Khel

On the arm64 dockerhosts, it looks like the tar error on arm32 ubuntu 2404 containers cleared itself up? Docker or libseccomp2 may have been upgraded during an automated patch. I can't seem to recreate the tar error in an arm32 ubuntu2404 container on any of the arm64 docker nodes.

Haroon-Khel avatar Feb 11 '25 12:02 Haroon-Khel

dockerhost-skytap-ubuntu2004-ppc64le-1 is the remaining problem machine. As per https://github.com/adoptium/infrastructure/issues/3588 I am going to upgrade it to Ubuntu 2404, so I will upgrade docker on the machine after the OS upgrade (if I still need to).

Haroon-Khel avatar Feb 11 '25 15:02 Haroon-Khel

I am starting the dockerhost-skytap-ubuntu2004-ppc64le-1 OS upgrade right now

Haroon-Khel avatar Feb 17 '25 16:02 Haroon-Khel

The upgrade terminated midway due to a lack of disk space, presumably on /

Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/ubu1604p8--vg-root   38G   28G  8.2G  77% /

Haroon-Khel avatar Feb 17 '25 17:02 Haroon-Khel

Reiterating https://github.com/adoptium/infrastructure/issues/3588#issuecomment-2665831738

I've upgraded dockerhost-skytap-ubuntu2004-ppc64le-1 to ubuntu 22.04. Disk space issues on /boot are preventing an upgrade to ubuntu 24.04, but I am comfortable keeping it on 22 since it's still supported. Its ubuntu2404 container, which has long been offline due to the untarring issue, is now able to run grinders because Docker on the host system has been upgraded to v26: https://ci.adoptium.net/job/Grinder/12705/console

Haroon-Khel avatar Feb 18 '25 14:02 Haroon-Khel