`bootc install to-existing-root` failure tracker
On an OpenStack VM booted in package mode, running `podman run --rm --tls-verify=false --privileged --pid=host quay.io/redhat_emp1/bootc-workflow-test:bhpq bootc install to-existing-root` fails.
Error:
fatal: [guest]: FAILED! => changed=true
cmd:
- podman
- run
- --rm
- --tls-verify=false
- --privileged
- --pid=host
- quay.io/redhat_emp1/bootc-workflow-test:bhpq
- bootc
- install
- to-existing-root
delta: '0:01:07.093168'
end: '2025-01-23 03:00:04.342900'
msg: non-zero return code
rc: 1
start: '2025-01-23 02:58:57.249732'
stderr: |-
----------------------------
WARNING: This operation will OVERWRITE THE BOOTED HOST ROOT FILESYSTEM and is NOT REVERSIBLE.
Waiting 20s to continue; interrupt (Control-C) to cancel.
----------------------------
ERROR Installing to filesystem: Creating ostree deployment: Pulling: Importing: Parsing layer blob sha256:017dc5c1ff3b66e4764e3e88f212c903ed7ef26a19454c358ac8717b077b63df: error: ostree-tar: Processing deferred hardlink var/cache/dnf/rhel-appstream-c101d4db5fbc3a4f/repodata/527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Failed to find object: No such file or directory: 527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Processing tar: Failed to commit tar: ExitStatus(unix_wait_status(256))
stderr_lines: <omitted>
stdout: |-
Installing image: docker://quay.io/redhat_emp1/bootc-workflow-test:bhpq
Digest: sha256:8fb3136d5706463daaeed7557614eb46cb860877d94f76fbe900a8dcafd333eb
Initializing ostree layout
layers already present: 0; layers needed: 73 (755.2 MB)
stdout_lines: <omitted>
The same test passed on an AWS EC2 instance (both x86_64 and aarch64).
Can you link to a log file for this job with more information? For example: the versions of the host system and bootc, what's in the base image, etc.?
var/cache/dnf/rhel-appstream-c101d4db5fbc3a4f
Looks like this image is missing a dnf clean all?
Still, this should obviously work; and I don't think this could really be platform-specific, so it must have something to do with how the container image is built.
Is this reproducible? Can you push the quay.io/redhat_emp1/bootc-workflow-test:bhpq image somewhere persistent?
Yeah, I was working on this issue yesterday and tried different platforms to see what differs between them. I think I need to collect more information for debugging.
All of these tests run on the same machine (a Fedora 41 VM), and the test comes from the https://gitlab.com/fedora/bootc/tests/bootc-workflow-test/-/blob/main/os-replace.sh?ref_type=heads script.
The base image is registry.stage.redhat.io/rhel10/rhel-bootc:10.0 with bootc version 1.1.4; registry.stage.redhat.io/rhel9/rhel-bootc:9.6 with bootc version 1.1.4 has the same issue.
NOTE: quay.io/fedora/fedora-bootc:42 with bootc version 1.1.4 does not have this issue on Azure.
The test workflow is: deploy a RHEL 10 (package mode) VM -> run bootc install.
- AWS: Passed
FROM registry.stage.redhat.io/rhel10/rhel-bootc:10.0
COPY rhel.repo /etc/yum.repos.d/rhel.repo
RUN dnf install -y rhc
RUN dnf -y install cloud-init && \
ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants && \
rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}
COPY usr/ /usr/
RUN dnf -y clean all
COPY auth.json /etc/ostree/auth.json
RUN sed -i "s/name: cloud-user/name: ec2-user/g" /etc/cloud/cloud.cfg
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda3 xfs 20G 1.8G 19G 9% /
devtmpfs devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs tmpfs 1.8G 0 1.8G 0% /dev/shm
tmpfs tmpfs 731M 8.6M 722M 2% /run
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/systemd-journald.service
/dev/xvda2 vfat 200M 8.4M 192M 5% /boot/efi
tmpfs tmpfs 366M 4.0K 366M 1% /run/user/1000
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
changed: [guest] => changed=true
cmd:
- podman
- run
- --rm
- --tls-verify=false
- --privileged
- --pid=host
- quay.io/redhat_emp1/bootc-workflow-test:k71w
- bootc
- install
- to-existing-root
delta: '0:01:30.825373'
end: '2025-01-24 03:15:41.846088'
msg: ''
rc: 0
start: '2025-01-24 03:14:11.020715'
stderr: |-
----------------------------
WARNING: This operation will OVERWRITE THE BOOTED HOST ROOT FILESYSTEM and is NOT REVERSIBLE.
Waiting 20s to continue; interrupt (Control-C) to cancel.
----------------------------
stderr_lines: <omitted>
stdout: |-
Installing image: docker://quay.io/redhat_emp1/bootc-workflow-test:k71w
Digest: sha256:3a1132c05390a5a334c04b7353ec0f8135ca3a0824e320e3ab157de027871c32
Initializing ostree layout
layers already present: 0; layers needed: 74 (772.0 MB)
Deploying container image...done (14 seconds)
Running bootupctl to install bootloader
> bootupctl backend install --write-uuid --update-firmware --auto --device /dev/xvda /target
Installed: grub.cfg
Installation complete!
- Azure: Failed
FROM registry.stage.redhat.io/rhel10/rhel-bootc:10.0
COPY rhel.repo /etc/yum.repos.d/rhel.repo
RUN dnf install -y rhc
COPY etc/ /etc/
# install required packages and enable services
RUN dnf -y install \
WALinuxAgent \
cloud-init \
cloud-utils-growpart \
hyperv-daemons && \
dnf clean all && \
systemctl enable NetworkManager.service && \
systemctl enable waagent.service && \
systemctl enable cloud-init.service && \
echo 'ClientAliveInterval 180' >> /etc/ssh/sshd_config
# configure waagent for cloud-init to handle provisioning
RUN sed -i 's/Provisioning.Agent=auto/Provisioning.Agent=cloud-init/g' /etc/waagent.conf && \
sed -i 's/ResourceDisk.Format=y/ResourceDisk.Format=n/g' /etc/waagent.conf && \
sed -i 's/ResourceDisk.EnableSwap=y/ResourceDisk.EnableSwap=n/g' /etc/waagent.conf
RUN dnf -y clean all
COPY auth.json /etc/ostree/auth.json
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda3 xfs 20G 2.1G 18G 11% /
devtmpfs devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs tmpfs 3.8G 0 3.8G 0% /dev/shm
efivarfs efivarfs 128M 9.9K 128M 1% /sys/firmware/efi/efivars
tmpfs tmpfs 1.5G 17M 1.5G 2% /run
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/systemd-journald.service
/dev/sda2 vfat 200M 8.4M 192M 5% /boot/efi
/dev/sdb1 ext4 74G 24K 70G 1% /mnt
tmpfs tmpfs 768M 4.0K 768M 1% /run/user/1000
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
fatal: [guest]: FAILED! => changed=true
cmd:
- podman
- run
- --rm
- --tls-verify=false
- --privileged
- --pid=host
- quay.io/redhat_emp1/bootc-workflow-test:j75a
- bootc
- install
- to-existing-root
delta: '0:00:56.992913'
end: '2025-01-24 03:07:39.014482'
msg: non-zero return code
rc: 1
start: '2025-01-24 03:06:42.021569'
stderr: |-
----------------------------
WARNING: This operation will OVERWRITE THE BOOTED HOST ROOT FILESYSTEM and is NOT REVERSIBLE.
Waiting 20s to continue; interrupt (Control-C) to cancel.
----------------------------
ERROR Installing to filesystem: Creating ostree deployment: Pulling: Importing: Parsing layer blob sha256:8a6a121be27996f4b6f746e353e1dd34cd40b315c0e3b81e6b874fc97fa03054: error: ostree-tar: Processing deferred hardlink var/cache/dnf/rhel-appstream-c101d4db5fbc3a4f/repodata/527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Failed to find object: No such file or directory: 527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Processing tar: Failed to commit tar: ExitStatus(unix_wait_status(256))
stderr_lines: <omitted>
stdout: |-
Installing image: docker://quay.io/redhat_emp1/bootc-workflow-test:j75a
Digest: sha256:ecaeebb45182c17021d182d08a1881d81ae1fd65a5d07ed9a0ee6087fef7d9d7
Initializing ostree layout
layers already present: 0; layers needed: 74 (774.6 MB)
- openstack: Failed
FROM registry.stage.redhat.io/rhel10/rhel-bootc:10.0
COPY rhel.repo /etc/yum.repos.d/rhel.repo
RUN dnf install -y rhc
# Enable passwordless sudo for users in the wheel group
COPY wheel-nopasswd /etc/sudoers.d
ARG sshpubkey
# We don't yet ship a one-invocation CLI command to add a user with a SSH key unfortunately
RUN if test -z "$sshpubkey"; then echo "must provide sshpubkey"; exit 1; fi; \
useradd -G wheel cloud-user && \
mkdir -m 0700 -p /home/cloud-user/.ssh && \
echo $sshpubkey > /home/cloud-user/.ssh/authorized_keys && \
chmod 0600 /home/cloud-user/.ssh/authorized_keys && \
chown -R cloud-user: /home/cloud-user
RUN dnf -y clean all
COPY auth.json /etc/ostree/auth.json
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 30G 2.1G 28G 8% /
devtmpfs devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs tmpfs 885M 0 885M 0% /dev/shm
tmpfs tmpfs 354M 5.2M 349M 2% /run
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/systemd-journald.service
/dev/vda2 vfat 200M 8.4M 192M 5% /boot/efi
tmpfs tmpfs 177M 4.0K 177M 1% /run/user/1000
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
tmpfs tmpfs 1.0M 0 1.0M 0% /run/credentials/[email protected]
fatal: [guest]: FAILED! => changed=true
cmd:
- podman
- run
- --rm
- --tls-verify=false
- --privileged
- --pid=host
- quay.io/redhat_emp1/bootc-workflow-test:6sl3
- bootc
- install
- to-existing-root
delta: '0:01:19.804500'
end: '2025-01-23 23:05:34.267542'
msg: non-zero return code
rc: 1
start: '2025-01-23 23:04:14.463042'
stderr: |-
----------------------------
WARNING: This operation will OVERWRITE THE BOOTED HOST ROOT FILESYSTEM and is NOT REVERSIBLE.
Waiting 20s to continue; interrupt (Control-C) to cancel.
----------------------------
ERROR Installing to filesystem: Creating ostree deployment: Pulling: Importing: Parsing layer blob sha256:51bc788965574e1789dc733a2f5a5034a71886aad34928edec57c80ea46fac2f: error: ostree-tar: Processing deferred hardlink var/cache/dnf/rhel-appstream-c101d4db5fbc3a4f/repodata/527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Failed to find object: No such file or directory: 527fa9e5c8c45b22a7bbc2821c96540817984e837c219acb5141a462a08d45f6-primary.xml.gz: Processing tar: Failed to commit tar: ExitStatus(unix_wait_status(256))
stderr_lines: <omitted>
stdout: |-
Installing image: docker://quay.io/redhat_emp1/bootc-workflow-test:6sl3
Digest: sha256:8438cf4f83d77a92719b98adca8bd842b72389ad3908f5eee91aa347ca538808
Initializing ostree layout
layers already present: 0; layers needed: 73 (755.2 MB)
FROM registry.stage.redhat.io/rhel10/rhel-bootc:10.0
COPY rhel.repo /etc/yum.repos.d/rhel.repo
RUN dnf install -y rhc
RUN dnf -y install cloud-init && \
ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants && \
rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}
COPY usr/ /usr/
RUN dnf -y clean all
COPY auth.json /etc/ostree/auth.json
RUN sed -i "s/name: cloud-user/name: ec2-user/g" /etc/cloud/cloud.cfg
Note that unless you're using --squash for this build, the first RUN dnf install -y rhc is going to leak all of the dnf caches into that layer. The later RUN dnf -y clean all will only remove them from the top; they still get shipped in the intermediate layers.
We should definitely track down this bug, because what we're doing here should work. That said, this will look cleaner using heredocs, and it may work around the issue:
FROM registry.stage.redhat.io/rhel10/rhel-bootc:10.0
COPY rhel.repo /etc/yum.repos.d/rhel.repo
COPY usr/ /usr/
COPY auth.json /etc/ostree/auth.json
RUN <<EORUN
set -xeuo pipefail
dnf install -y rhc
dnf -y install cloud-init
ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants
sed -i "s/name: cloud-user/name: ec2-user/g" /etc/cloud/cloud.cfg
dnf -y clean all
rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}
EORUN
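Incidentally, the "deferred hardlink" failure mode in the error above can be demonstrated with plain GNU tar, independent of ostree (a sketch; assumes GNU tar with --delete support, and all paths here are illustrative):

```shell
# Minimal reproduction of a hardlink member whose target is missing
# from the archive (assumption: GNU tar).
workdir=$(mktemp -d)
cd "$workdir"
echo data > target
ln target link                       # link and target share one inode
tar -cf archive.tar target link      # second member is recorded as a hardlink to "target"
tar --delete -f archive.tar target   # drop the hardlink's target from the archive
mkdir extract
# Extraction now fails because the remaining member is a hardlink whose
# target no longer exists in the archive -- analogous to ostree-tar's
# "Failed to find object: No such file or directory" above.
if tar -C extract -xf archive.tar 2>/dev/null; then
  result=unexpected-success
else
  result=dangling-hardlink-error
fi
echo "$result"
```

This is presumably the same situation ostree-tar hits when a layer carries a hardlink into /var/cache/dnf content whose target object was never committed.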
I know we should be updating some of our examples to use heredocs. One thing that has bitten me is that the default podman in GitHub Actions is too old for it, which is super annoying (ref https://github.com/containers/podman/discussions/17362 )
Anyway, OK, I couldn't reproduce this in a quick test... have you reproduced it in an interactive run?
Oh hmm...I notice we may have qemu emulation going on in some builds? That might be related.
Note also that this issue should be independent of the host version: because we're running podman run <image> bootc install, all of the relevant code is the ostree/bootc code inside the target image.
That said, this type of failure is also likely to occur when doing e.g. a bootc switch to that target image.
Right. sed -i "s/dnf clean all/dnf clean all \&\& rm -rf \/var\/{cache,log} \/var\/lib\/{dnf,rhsm}/g" "$INSTALL_CONTAINERFILE" fixed this issue. But persistent logging does not work in this case, since /var/log is removed.
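For reference, after that sed the cleanup step in the generated Containerfile ends up roughly as follows (an illustrative reconstruction, not copied from the actual job):

```dockerfile
# Clean dnf metadata, then drop the cache/log/state directories entirely,
# so no /var/cache/dnf content survives into the final layer.
RUN dnf -y clean all && rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}
```

Removing /var/log here is what breaks persistent logging, so a narrower rm that keeps /var/log may be preferable when persistent logs are required.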
CS10 with bootc 1.1.3 on libvirt fails with the error: ERROR Installing to filesystem: Creating ostree deployment: Pulling: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver
Log: https://artifacts.osci.redhat.com/testing-farm/ce9e7a5b-9a74-4c2d-a090-539ee208b936/
RHEL 9.6 with bootc 1.1.4 fails on all platforms with the error: ERROR Installing to filesystem: Creating ostree deployment: Pulling: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reference "[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]quay.io/redhat_emp1/hidden:23tl@sha256:b9110b81b62013e65b36927db140f45d71da0bd49bdb2d2d0ce95b2f09749ce4" does not resolve to an image ID: identifier is not an image
Log: https://artifacts.osci.redhat.com/testing-farm/06574185-504b-43d7-a8b3-d65ce35d582e/
The fedora-bootc:41 and :42 tests passed, and the centos-bootc:stream9 test passed.
Installing to filesystem: Creating ostree deployment: Pulling: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver
That's...bizarre. How could it only be broken in that way on c10s but not other streams? I have no idea what's going on there.
RHEL 9.6, bootc 1.1.4 all platforms has error ERROR Installing to filesystem: Creating ostree deployment: Pulling: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reference "[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]quay.io/redhat_emp1/hidden:23tl@sha256:b9110b81b62013e65b36927db140f45d71da0bd49bdb2d2d0ce95b2f09749ce4" does not resolve to an image ID: identifier is not an image
If stream9 works but 9.6 is failing, then in theory there is some skew between the two, which should otherwise be the same, so we'll need to chase this down. I know others have hit this, but again, I have no idea why this specific bit could fail in just one stream but not the others.
The following error happened twice today in the AWS bootc install to-existing-root test; the third attempt passed.
ERROR: Installing to filesystem: Creating ostree deployment: Cannot redeploy over extant stateroot default
- https://artifacts.osci.redhat.com/testing-farm/e65ec608-9e70-41f2-947c-220128a970a7/
- https://artifacts.osci.redhat.com/testing-farm/8de21d8b-5933-4bf3-b500-308e9e35d7f3/
The following scenarios only fail with podman run --rm --tls-verify=false --privileged --pid=host 192.168.100.1:5000/hidden:kayh bootc install to-existing-root but pass with podman run --rm --tls-verify=false --privileged --pid=host -v /:/target -v /dev:/dev -v /var/lib/containers:/var/lib/containers --security-opt label=type:unconfined_t quay.io/redhat_emp1/hidden:mrmd bootc install to-existing-root:
- rhel-bootc:9.6 image: https://artifacts.osci.redhat.com/testing-farm/61bf5c6f-cea8-47e6-82c0-95080392370c/
- centos-bootc:stream9 image: https://artifacts.osci.redhat.com/testing-farm/b53db925-7749-45a3-9b94-6a34da5a5b2e/
Error log:
ERROR: Installing to filesystem: Creating ostree deployment: Pulling: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reference "[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]quay.io/bootc-test/hidden:q63d@sha256:cc647a9b755f20211a5023654aeafddea5574b1d1a5771134dfb268f38d12d5e" does not resolve to an image ID: identifier is not an image
Note:
rhel-bootc:10.0, centos-bootc:stream10, and fedora-bootc:41/42/43 passed the podman run --rm --tls-verify=false --privileged --pid=host 192.168.100.1:5000/hidden:kayh bootc install to-existing-root test. Ref: https://gitlab.com/fedora/bootc/tests/bootc-workflow-test/-/jobs/9148630303
The following error happened twice today in the AWS bootc install to-existing-root test; the third attempt passed. ERROR: Installing to filesystem: Creating ostree deployment: Cannot redeploy over extant stateroot default
This will happen if the install is retried after a failure, which we don't currently support (but should!). If it's not a retry scenario, then it will need debugging.
The following scenarios only fail with
OK, so our automatic bind mounts aren't working here... but are we sure we have an updated bootc in the images?