
Container import from directory with sockets fails

Open BeHom opened this issue 2 years ago • 14 comments

Version of Warewulf

What version of Warewulf are you using? Run

wwctl version:   4.4.1-1.git_d6f6fed
rpc version: apiPrefix:"rc1" apiVersion:"1" warewulfVersion:"4.4.1-1.git_d6f6fed"

Expected behavior

While developing a container definition file, I need to import, delete, and re-import a container built with Apptainer. It works the first time but fails the second time. The expected behavior is that the import works at any time.

Actual behavior

Sequence of work

  • Release the container under test from the current configuration:
    wwctl profile set --yes --container rocky-8 "default"
  • Build the new container sandbox from the modified definition file:
    apptainer build --sandbox /tmp/rocky-8-def ./rocky-8-def.def
  • Delete the old container:
    wwctl container delete rocky-def
  • Import the new container from the sandbox (this is where the problem occurs the second time):
    wwctl container import /tmp/rocky-8-def rocky-def
  • Activate the new container for some nodes:
    wwctl profile set --yes --container rocky-def "default"
  • Test the running node
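This loop is repeated many times during development, so it may help to script it. The following is a minimal POSIX sh sketch of the steps above. It assumes the names and paths from this report. The run and rebuild_and_import functions and the DRY_RUN switch are hypothetical helpers, not part of Warewulf or Apptainer. With DRY_RUN=1 the script only prints the commands, and it stops at the first failing step.

```shell
#!/bin/sh
# Hypothetical helper: print the command if DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Sketch of the rebuild / re-import loop described in this report.
# Positional arguments (defaults are the names used in the report):
#   $1 fallback container   $2 container under test   $3 sandbox directory
#   $4 definition file      $5 profile
# The body runs in a subshell, so "exit" only leaves this function.
rebuild_and_import() (
    fallback=${1:-rocky-8}
    name=${2:-rocky-def}
    sandbox=${3:-/tmp/rocky-8-def}
    def=${4:-./rocky-8-def.def}
    profile=${5:-default}

    # 1. Release the container under test by pointing the profile elsewhere.
    run wwctl profile set --yes --container "$fallback" "$profile" || exit
    # 2. Build a fresh sandbox from the modified definition file.
    run apptainer build --sandbox "$sandbox" "$def" || exit
    # 3. Delete the old container.
    run wwctl container delete "$name" || exit
    # 4. Import the new sandbox (the step that fails on the second iteration).
    run wwctl container import "$sandbox" "$name" || exit
    # 5. Switch the profile to the freshly imported container.
    run wwctl profile set --yes --container "$name" "$profile"
)
```

For example, DRY_RUN=1 rebuild_and_import prints the five commands, and rebuild_and_import rocky-8 rocky-def /tmp/rocky-8-def ./rocky-8-def.def default runs them.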

During the import step, the following error occurred:

wwctl container import /tmp/rocky-8-def rocky-def
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory

Restarting or reconfiguring Warewulf does not help; the problem remains. Only rebooting the Warewulf master server fixes it.

It looks like artifacts from the old container I deleted are preventing a new import.

In this error state, importing under a new container name still succeeded. Only importing into a previously used container name (in my case, rocky-def) fails.

Steps to reproduce this behavior

See above.

How can others reproduce this issue/problem?

What OS/distro are you running

$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

How did you install Warewulf

dnf install ./warewulf-4.4.1-1.rpm

BeHom, Aug 15 '23

The related command below also fails: the container exists before it runs and is gone afterwards.

wwctl container import --force /tmp/rocky-8-def rocky-def
Overwriting existing VNFS
Updating the container's /etc/resolv.conf
ERROR : Could not create destination file /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: no such file or directory
WARN : Could not copy /etc/resolv.conf into container: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: no such file or directory
ERROR : error in user sync, fix error and run 'syncuser' manually: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/passwd: no such file or directory
Building container: rocky-def
ERROR : could not build container rocky-def: Container does not exist: rocky-def
ERROR: could not build container rocky-def: Container does not exist: rocky-def

BeHom, Aug 16 '23

@BeHom thanks for reporting this. This is a known issue, and one I'm hoping to get fixed soon.

In my experience, if you try to import --force twice it works the second time, as a work-around.

anderbubble, Aug 16 '23

Thanks @anderbubble for getting back to me. The workaround does not fix it:

wwctl container import --force /tmp/rocky-8-def rocky-def
Overwriting existing VNFS
Updating the container's /etc/resolv.conf
ERROR : Could not create destination file /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: no such file or directory
WARN : Could not copy /etc/resolv.conf into container: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/resolv.conf: no such file or directory
ERROR : error in user sync, fix error and run 'syncuser' manually: open /var/lib/warewulf/chroots/rocky-def/rootfs/etc/passwd: no such file or directory
Building container: rocky-def
ERROR : could not build container rocky-def: Container does not exist: rocky-def
ERROR: could not build container rocky-def: Container does not exist: rocky-def

[root@martin tmp]# wwctl container import --force /tmp/rocky-8-def rocky-def
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
[root@martin tmp]# wwctl container import --force /tmp/rocky-8-def rocky-def
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-def/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory

Importing under a new name also fails:

[root@martin tmp]# wwctl container import --force /tmp/rocky-8-def rocky-karl
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-karl/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-karl/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory

Importing a newly built container sandbox also fails:

INFO: Build complete: /tmp/rocky-8-def2
[root@martin tmp]# wwctl container import /tmp/rocky-8-def2 rocky-karl
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-karl/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-karl/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory

Importing with --update does not update the container's chroot either: wwctl container import --update /tmp/rocky-8-def rocky-def makes no changes, so its intended usage is unclear to me.

BeHom, Aug 17 '23

This PR should help fix this issue: https://github.com/hpcng/warewulf/pull/1015

JasonYangShadow, Dec 15 '23

@anderbubble

I tested against the main branch, and it looks like this issue is fixed:

[vagrant@localhost test]$ apptainer build --sandbox ./rockylinux-8/ docker://ghcr.io/hpcng/warewulf-rockylinux:8
INFO:    Starting build...
Getting image source signatures
Copying blob d4cdbc20b5d6 done  
Copying blob 8b7880b32c88 done  
Copying blob a49f4b3e1553 done  
Copying blob 49a072db3168 done  
Copying blob c43766916271 done  
Copying config de197a6d39 done  
Writing manifest to image destination
Storing signatures
2024/01/30 22:16:41  info unpack layer: sha256:a49f4b3e1553c4468c366b42fd1cde2a27729bd7ab13162ad061af2bd1ef9268
2024/01/30 22:16:43  info unpack layer: sha256:8b7880b32c88b97f7738d59c6d76a1f31624007c645be620a1c9720d766b6608
2024/01/30 22:16:47  warn rootless{usr/libexec/openssh/ssh-keysign} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
2024/01/30 22:16:47  info unpack layer: sha256:49a072db31682b9f8e8ee50c7bb6f55901d60af154d7f63bed68565f3321f1f5
2024/01/30 22:16:47  info unpack layer: sha256:c43766916271959ec4cc6da5a0455c2c4f1784a7e13081a80461737a20e470e5
2024/01/30 22:16:47  info unpack layer: sha256:d4cdbc20b5d6a3211177be994a6946e43920c8fa826ea0fca0707b23c6179ddc
WARNING: The sandbox contain files/dirs that cannot be removed with 'rm'.
WARNING: Use 'chmod -R u+rwX' to set permissions that allow removal.
WARNING: Use the '--fix-perms' option to 'apptainer build' to modify permissions at build time.
INFO:    Creating sandbox directory...
INFO:    Build complete: ./rockylinux-8/
[vagrant@localhost test]$ sudo su
[root@localhost test]# wwctl container import ./rockylinux-8/ rockylinux-8
uid/gid not synced: run `wwctl container syncuser --write rockylinux-8`
[root@localhost test]# wwctl container syncuser --write rockylinux-8
uid/gid synced for container rockylinux-8
[root@localhost test]# wwctl container list
  CONTAINER NAME  NODES  KERNEL VERSION                 CREATION TIME        MODIFICATION TIME    SIZE     
  rockylinux-8    0      4.18.0-513.9.1.el8_9.aarch64   30 Jan 24 22:18 MST  31 Dec 69 17:00 MST  1.2 GiB  
  rockylinux-8.7  0      4.18.0-425.19.2.el8_7.aarch64  30 Jan 24 21:59 MST  31 Dec 69 17:00 MST  1.1 GiB  
[root@localhost test]# apptainer version
1.2.5-1.el9
[root@localhost test]# cat /etc/os-release 
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
[root@localhost test]# wwctl container import --force ./rockylinux-8/ rockylinux-8
Overwriting existing VNFS
uid/gid not synced: run `wwctl container syncuser --write rockylinux-8`
[root@localhost test]# wwctl container syncuser --write rockylinux-8
uid/gid synced for container rockylinux-8
[root@localhost test]# wwctl container list
  CONTAINER NAME  NODES  KERNEL VERSION                 CREATION TIME        MODIFICATION TIME    SIZE     
  rockylinux-8    0      4.18.0-513.9.1.el8_9.aarch64   30 Jan 24 22:19 MST  31 Dec 69 17:00 MST  1.2 GiB  
  rockylinux-8.7  0      4.18.0-425.19.2.el8_7.aarch64  30 Jan 24 21:59 MST  31 Dec 69 17:00 MST  1.1 GiB  
[root@localhost test]# wwctl version
wwctl version:   4.5.x-1.git_8b7586d2
rpc version: apiPrefix:"rc1" apiVersion:"1" warewulfVersion:"4.5.x-1.git_8b7586d2"

JasonYangShadow, Jan 31 '24

Thanks for the verification, @JasonYangShadow!

anderbubble, Feb 06 '24

Sorry, I have to reopen the issue; the problem is still there. Current version:

wwctl version
wwctl version: 4.5.x-1
rpc version: apiPrefix:"rc1" apiVersion:"1" warewulfVersion:"4.5.x-1"

I only ran “dnf install” for the new packages (keeping the existing Warewulf setup).

I built a container from a definition file:

INFO: Adding labels
INFO: Creating sandbox directory...
INFO: Build complete: /tmp/rocky-8-base-container
Mon Feb 12 13:00:46 CET 2024

Try to import the container:

wwctl container import /tmp/rocky-8-base-container rocky-8-def-05
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-8-def-05/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-8-def-05/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory

After a restart, the import works again.

BeHom, Feb 12 '24

@BeHom thanks for the report. We'll investigate.

anderbubble, Feb 12 '24

@BeHom I think I see now that I was misunderstanding your report. We had previously also had a general issue with force-importing; but it appears to be working in a simple case now:

[janderson@rocky main]$ sudo wwctl container import /var/lib/warewulf/chroots/alpine/rootfs alpine2
WARN   : id(14) collision: host(ftp) container(postmaster)
WARN   : add postmaster to host to resolve conflict
ERROR  : error in user sync, fix error and run 'syncuser' manually: id(14) collision: host(ftp) container(postmaster)
ERROR: error in user sync, fix error and run 'syncuser' manually: id(14) collision: host(ftp) container(postmaster)

[janderson@rocky main]$ sudo wwctl container import /var/lib/warewulf/chroots/alpine/rootfs alpine2
ERROR  : VNFS Name exists, specify --force, --update, or choose a different name: alpine2
ERROR: VNFS Name exists, specify --force, --update, or choose a different name: alpine2

[janderson@rocky main]$ sudo wwctl container import --force /var/lib/warewulf/chroots/alpine/rootfs alpine2
Overwriting existing VNFS
WARN   : id(14) collision: host(ftp) container(postmaster)
WARN   : add postmaster to host to resolve conflict
ERROR  : error in user sync, fix error and run 'syncuser' manually: id(14) collision: host(ftp) container(postmaster)
ERROR: error in user sync, fix error and run 'syncuser' manually: id(14) collision: host(ftp) container(postmaster)

It also worked with a Rocky container.

[janderson@rocky main]$ sudo wwctl container import /var/lib/warewulf/chroots/rocky-8/rootfs/ rocky-8-reimport
uid/gid not synced: run `wwctl container syncuser --write rocky-8-reimport`
[janderson@rocky main]$ sudo wwctl container import /var/lib/warewulf/chroots/rocky-8/rootfs/ rocky-8-reimport
ERROR  : VNFS Name exists, specify --force, --update, or choose a different name: rocky-8-reimport
ERROR: VNFS Name exists, specify --force, --update, or choose a different name: rocky-8-reimport
[janderson@rocky main]$ sudo wwctl container import --force /var/lib/warewulf/chroots/rocky-8/rootfs/ rocky-8-reimport
Overwriting existing VNFS
uid/gid not synced: run `wwctl container syncuser --write rocky-8-reimport`

Can you share the specific .def you're using to create the sandbox?

anderbubble, Feb 17 '24

@anderbubble After your question about the def file, I dug a little deeper into container creation. I found the difference between containers that can be imported and those that cannot be imported without a system reboot: /run/user/0 is apparently not cleaned up inside the container, even after a successful build.

Processes for the sandbox container are still running after the build. For example, these sockets remain in the sandbox:

./rocky-8-NVIDIA-container/run/user/0/gnupg/d.mu5z3ywgt671eanykasyz8xb/S.gpg-agent
./rocky-8-NVIDIA-container/run/user/0/gnupg/d.mu5z3ywgt671eanykasyz8xb/S.gpg-agent.extra
./rocky-8-NVIDIA-container/run/user/0/gnupg/d.mu5z3ywgt671eanykasyz8xb/S.gpg-agent.browser
./rocky-8-NVIDIA-container/run/user/0/gnupg/d.mu5z3ywgt671eanykasyz8xb/S.gpg-agent.ssh
…

“ps -ef” on the host shows:

root 23999 1 0 12:17 ? 00:00:00 gpg-agent --homedir /var/cache/dnf/vscode-8194d3505cd295f0/pubring --use-standard-socket --daemon

The cause is two repositories that use these gpgkey entries:

gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
gpgkey=https://packages.microsoft.com/keys/microsoft.asc

These processes disappear after a reboot, which releases the /run/user/0 directories; that is why a reboot makes the import work. Running “pkill gpg-agent” also helps, but that seems odd to me.
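A blanket pkill gpg-agent can be replaced by a narrower filter that selects only the leftover agents whose --homedir lies under /var/cache/dnf, as in the ps output above. This sketch makes several assumptions. It expects pgrep -a (procps-ng) to print lines of the form PID COMMAND. The function names and the DNF_CACHE_PREFIX variable are hypothetical. The sketch only stops processes; the observation above that killing the agents unblocks the import is the only evidence that this is sufficient.

```shell
#!/bin/sh
# Hypothetical filter: read "PID COMMAND..." lines (as printed by
# `pgrep -a gpg-agent`) on stdin and print the PIDs of agents whose
# --homedir lies under DNF_CACHE_PREFIX (default /var/cache/dnf/).
dnf_agent_pids() (
    prefix=${DNF_CACHE_PREFIX:-/var/cache/dnf/}
    while read -r pid cmd; do
        case $cmd in
            # The quoted part of the pattern is matched literally.
            *" --homedir $prefix"*) printf '%s\n' "$pid" ;;
        esac
    done
)

# Hypothetical action: stop only those agents, printing the PIDs first.
stop_dnf_agents() (
    pids=$(pgrep -a gpg-agent | dnf_agent_pids)
    if [ -z "$pids" ]; then
        echo "no gpg-agent with a dnf cache homedir found"
        exit 0
    fi
    echo "stopping gpg-agent PID(s):" $pids
    # $pids is intentionally unquoted so each PID becomes its own argument.
    kill $pids
)
```

Running dnf_agent_pids on its own (pgrep -a gpg-agent | dnf_agent_pids) shows what would be stopped before stop_dnf_agents is used.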

My thought was that once apptainer build --sandbox finished, all related processes would be stopped. I used the command “apptainer build --force --fix-perms --sandbox /tmp/” with apptainer version 1.2.5-1.el8.

The error message from Warewulf is somewhat misleading:

wwctl container import /tmp/rocky-8-minimal rocky-8.8-t2
ERROR : could not import image: lchown /var/lib/warewulf/chroots/rocky-8.8-t2/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
ERROR: could not import image: lchown /var/lib/warewulf/chroots/rocky-8.8-t2/rootfs/run/user/0/gnupg/d.hkt3xk3ifea4rs471snc7nsb/S.gpg-agent: no such file or directory
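The lchown error does not say that a socket is the cause. A pre-flight check can report the problem explicitly before wwctl container import runs. This is a minimal POSIX sh sketch. check_sandbox is a hypothetical helper that relies only on find's -type s (socket) and -xdev tests.

```shell
#!/bin/sh
# Hypothetical pre-import check: refuse a sandbox that still contains
# Unix domain sockets and name them explicitly.
# Exit status: 0 clean, 1 sockets found, 2 argument is not a directory.
check_sandbox() (
    sandbox=$1
    if [ ! -d "$sandbox" ]; then
        echo "error: $sandbox is not a directory" >&2
        exit 2
    fi
    # -xdev keeps find on the sandbox's own filesystem; -type s matches sockets.
    sockets=$(find "$sandbox" -xdev -type s -print)
    if [ -n "$sockets" ]; then
        echo "error: $sandbox contains socket files; remove them before 'wwctl container import':" >&2
        printf '%s\n' "$sockets" | sed 's/^/  /' >&2
        exit 1
    fi
    exit 0
)
```

Used as check_sandbox /tmp/rocky-8-minimal && wwctl container import /tmp/rocky-8-minimal rocky-8.8-t2, the import only runs when no sockets are present.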

Attached is a definition file (I did not check whether this one is bootable) that shows the underlying problem:

####################################################################################
# Definition file for apptainer to build a bootable rocky container
# the container can be imported as image for warewulf boots of nodes
####################################################################################
BootStrap: docker
From: rockylinux:8.8

%files
    /etc/yum.repos.d/Intel-One-API.repo
    /etc/yum.repos.d/vscode.repo

####################################################################
# basic installation process for a bootable image
####################################################################
%post
    dnf update -y ;
    dnf install -y --allowerasing coreutils \
        cpio \
        dbus \
        dhclient \
        e2fsprogs \
        ethtool \
        findutils \
        initscripts \
        ipmitool \
        iproute \
        kernel-core \
        libbpf \
        net-tools \
        network-scripts \
        nfs-utils \
        openssh-clients \
        openssh-server \
        pciutils \
        psmisc \
        rsyslog \
        which

    ##########################################################################
    # final setup
    ##########################################################################
    dnf clean all

    sed -i -e '/^account.*pam_unix.so\s*$/s/\s*$/\ broken_shadow/' /etc/pam.d/system-auth
    sed -i -e '/^account.*pam_unix.so\s*$/s/\s*$/\ broken_shadow/' /etc/pam.d/password-auth

    rm -f /etc/sysconfig/network-scripts/ifcfg-e*

    systemctl unmask console-getty.service dev-hugepages.mount getty.target sys-fs-fuse-connections.mount systemd-logind.service systemd-remount-fs.service
    systemctl enable network

    touch /etc/sysconfig/disable-deprecation-warnings

    mkdir -p /etc/warewulf
    touch /etc/warewulf/excludes
    touch /etc/warewulf/container_exit.sh
    chmod +x /etc/warewulf/container_exit.sh

    echo "#!/bin/sh" > /etc/warewulf/container_exit.sh
    echo "set -x" >> /etc/warewulf/container_exit.sh
    echo "LANG=C" >> /etc/warewulf/container_exit.sh
    echo "LC_CTYPE=C" >> /etc/warewulf/container_exit.sh
    echo "export LANG LC_CTYPE" >> /etc/warewulf/container_exit.sh
    echo "dnf clean all" >> /etc/warewulf/container_exit.sh
    echo "/boot/" > /etc/warewulf/excludes
    echo "/usr/share/GeoIP" >> /etc/warewulf/excludes

%labels
    Author Bernhard
    Version 0.1.01
    Description Rocky 8 Warewulf Container definition for HPC Cluster

BeHom, Feb 18 '24

Thanks for all the new info, @BeHom! I'll try to replicate.

anderbubble, Feb 19 '24

@BeHom there are a few different things happening here at the same time.

  • Apptainer, when building into a sandbox, is allowing sockets in the sandbox to persist.
  • Warewulf, when encountering a socket in the sandbox, is failing, because the socket doesn't exist at the target, so it can't copy permissions from the source socket to the dest.

Ultimately, I think this is a bug in github.com/containers/storage/drivers/copy.DirCopy, which is what we use to copy the container directory.

I tried updating to the latest version of github.com/containers/storage, but the behavior persists; so to really resolve this, we'd either need to move to a different library or submit a fix upstream.

For now, I suggest the following workaround to remove sockets from a sandbox before import:

$ find image.sandbox -xdev -type s -exec rm {} +
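The find one-liner above can be combined with the import itself. This sketch assumes the sandbox is disposable, because the sockets are deleted in place. strip_sockets and import_without_sockets are hypothetical names. Setting WWCTL=echo previews the wwctl call instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: delete socket files from a sandbox, listing each one.
strip_sockets() {
    # -print lists each socket; -exec rm removes them in batches.
    find "$1" -xdev -type s -print -exec rm -f {} +
}

# Hypothetical wrapper.
# Usage: import_without_sockets SANDBOX NAME [extra wwctl import flags...]
import_without_sockets() (
    sandbox=$1
    name=$2
    shift 2
    # Stop if find fails (for example, the sandbox path does not exist).
    strip_sockets "$sandbox" || exit
    # Extra flags such as --force are passed through before the positionals.
    "${WWCTL:-wwctl}" container import "$@" "$sandbox" "$name"
)
```

For example: import_without_sockets /tmp/rocky-8-def rocky-def --force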

anderbubble, Mar 17 '24

This was also reported more recently during wwctl container copy, when a previous wwctl container shell session had left sockets in the chroot:

Copying sources...
ERROR  : could not duplicate image: lchown /var/lib/warewulf/chroots/gpuslurm/rootfs/run/user/0/gnupg/d.kg8ijih5tq41ixoeag4p1qup/S.gpg-agent: no such file or directory
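In the wwctl container copy case, the sockets are already inside an imported chroot, so the cleanup has to target the chroot itself. This sketch assumes the layout seen in the error messages in this thread (/var/lib/warewulf/chroots/NAME/rootfs). chroot_sockets and the CHROOTS override are hypothetical. Note that -delete is a GNU/BSD find extension rather than POSIX. Without --delete, the helper only lists the sockets.

```shell
#!/bin/sh
# Hypothetical helper: list (or, with --delete, remove) socket files in a
# Warewulf chroot. CHROOTS defaults to /var/lib/warewulf/chroots.
# Exit status 2 means the chroot directory was not found.
chroot_sockets() (
    name=$1
    rootfs="${CHROOTS:-/var/lib/warewulf/chroots}/$name/rootfs"
    if [ ! -d "$rootfs" ]; then
        echo "no such chroot: $rootfs" >&2
        exit 2
    fi
    if [ "${2:-}" = "--delete" ]; then
        # Print each socket, then unlink it.
        find "$rootfs" -xdev -type s -print -delete
    else
        find "$rootfs" -xdev -type s -print
    fi
)
```

For example, run chroot_sockets gpuslurm to inspect the chroot, then chroot_sockets gpuslurm --delete before retrying the copy.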

anderbubble, Jul 15 '24