ERRO[0025] unlinkat /var/tmp/buildah2410054376/mounts3022885724/bind626918239: device or resource busy
When building inside a rootless container using buildah's vfs storage driver and chroot isolation (as is very often done to build images in CI environments), specifying read/write bind volumes from other stages results in an error. This behavior does not reproduce with buildah 1.37 or earlier. I also verified the same behavior using a vanilla registry.fedoraproject.org/fedora-minimal image plus dnf5 install buildah; that is to say, I think it's a buildah problem, not a buildah image problem.
Reproduction (host) environment:
- Fedora 40
- podman 5.3.1
- Running as a regular user w/ default podman settings
- The quay.io/buildah/upstream:latest container image (buildah version 1.40.0-dev (image-spec 1.1.0, runtime-spec 1.2.0))
- The quay.io/buildah/stable:v1.38 container image
- The quay.io/buildah/stable:v1.37 container image
Steps to reproduce:
- Create the following Containerfile somewhere in the user's home directory:

FROM registry.fedoraproject.org/fedora-minimal:latest as test
RUN mkdir -p /var/tmp/test
ADD ./Containerfile /var/tmp/test/

FROM test as final
RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw \
    set -x && \
    date > /var/tmp/test/Containerfile && \
    cat /var/tmp/test/Containerfile

- Run podman run -it --rm -v ./Containerfile:/root/Containerfile:ro,Z quay.io/buildah/stable:v1.38 buildah --storage-driver=vfs build --isolation=chroot /root
- Run the exact same command, but with quay.io/buildah/stable:v1.37 (or any other earlier version)
Unexpected results:
[1/2] STEP 1/3: FROM registry.fedoraproject.org/fedora-minimal:latest AS test
Trying to pull registry.fedoraproject.org/fedora-minimal:latest...
Getting image source signatures
Copying blob 169491f3e4f7 done |
Copying config e6917e6306 done |
Writing manifest to image destination
[1/2] STEP 2/3: RUN mkdir -p /var/tmp/test
[1/2] STEP 3/3: ADD ./Containerfile /var/tmp/test/
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob cec21250b843 done |
Copying config 9f9e432f21 done |
Writing manifest to image destination
--> 9f9e432f21cb
[2/2] STEP 1/2: FROM 9f9e432f21cbb67c928b93d87af3878f3b903cbc2030cc12594f9368829ccc8c AS final
[2/2] STEP 2/2: RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw set -x && date > /var/tmp/test/Containerfile && cat /var/tmp/test/Containerfile
ERRO[0025] unlinkat /var/tmp/buildah1274147250/mounts4133407440/bind3931917386: device or resource busy
Error: building at STEP "RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw set -x && date > /var/tmp/test/Containerfile && cat /var/tmp/test/Containerfile": resolving mountpoints for container "bb08d8062b4c17b75108492838e53d3236abce647447c8f5bec72cebfcb8ca1b": setting up overlay of "/var/tmp/buildah1274147250/mounts4133407440/bind3931917386": mount overlay:/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/merge, data: lowerdir=/var/tmp/buildah1274147250/mounts4133407440/bind3931917386,upperdir=/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/upper,workdir=/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/work,userxattr: invalid argument
Expected results (from v1.37):
[1/2] STEP 1/3: FROM registry.fedoraproject.org/fedora-minimal:latest AS test
Trying to pull registry.fedoraproject.org/fedora-minimal:latest...
Getting image source signatures
Copying blob 169491f3e4f7 done |
Copying config e6917e6306 done |
Writing manifest to image destination
[1/2] STEP 2/3: RUN mkdir -p /var/tmp/test
[1/2] STEP 3/3: ADD ./Containerfile /var/tmp/test/
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob b50f8aabd929 done |
Copying config 71ea00d65f done |
Writing manifest to image destination
--> 71ea00d65f89
[2/2] STEP 1/2: FROM 71ea00d65f8949486c4441a13b231fd4992b2be2c4170e97a0b9baae11244f71 AS final
[2/2] STEP 2/2: RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw set -x && date > /var/tmp/test/Containerfile && cat /var/tmp/test/Containerfile
WARN[0000] couldn't find "/var/lib/containers/storage/vfs/dir/7d684fe50918fe44941621b1721c8ee345f7884e2887f8cae36608bacb38e0e8/tmp/test" on host to bind mount into container
+ date
+ cat /var/tmp/test/Containerfile
Wed Feb 12 18:17:34 UTC 2025
[2/2] COMMIT
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob b50f8aabd929 skipped: already exists
Copying blob 11db3e39f474 done |
Copying config 83de1e9298 done |
Writing manifest to image destination
--> 83de1e9298fe
83de1e9298feac0ce7e01e89b840e42ecd3901a4a67d1b998b3bdbe176fd3a69
Debug output from v1.38 is below (v1.40.0-dev output is substantially similar):
Note: Also attempted with the following Containerfile with similar results:
FROM registry.fedoraproject.org/fedora-minimal:latest as test
ADD ./Containerfile /test/
RUN chmod 777 /test/Containerfile
#####
FROM test as final
RUN --mount=type=bind,from=test,src=/test,dst=/test,rw \
set -x && \
date > /test/Containerfile && \
cat /test/Containerfile
Poking through the debug log and the code, I suspect this problem stems from within containers/storage, based on convertToOverlay() getting an error back from overlay.MountWithOptions(). I didn't dig too deeply into the storage code, but the ,userxattr suffix at the end of the debug messages made my ears perk up: "Why would that be present, or even relevant, for a VFS "bind" mount?"
time="2025-02-12T18:19:46Z" level=debug msg="Error building at step
{Env:[container=oci ...cut...: resolving mountpoints for container
...cut...: setting up overlay of \"/var/tmp/buildah3627628243/mounts2014160263/bind3820943893\":
mount overlay:
...cut...,
workdir=/var/tmp/buildah3627628243/mounts2014160263/overlay/1907194961/work,userxattr: invalid argument"
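For orientation, the options string in those messages corresponds to an overlay mount roughly like the sketch below (placeholder paths, not buildah's actual code); it appears to be this mount(2) call that comes back with EINVAL ("invalid argument").

// Rough sketch of the overlay mount described by the error's options string.
// Paths are placeholders; this is not buildah's implementation.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	lower := "/var/tmp/buildahXXX/mountsXXX/bindXXX"           // bind source being converted
	upper := "/var/tmp/buildahXXX/mountsXXX/overlay/XXX/upper" // writes land here
	work := "/var/tmp/buildahXXX/mountsXXX/overlay/XXX/work"
	merge := "/var/tmp/buildahXXX/mountsXXX/overlay/XXX/merge" // what the RUN step sees

	// "userxattr" is the option used for rootless overlay mounts.
	data := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s,userxattr", lower, upper, work)
	if err := unix.Mount("overlay", merge, "overlay", 0, data); err != nil {
		fmt.Println("mount overlay:", err) // EINVAL in the failing builds above
	}
}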
I stumbled across what appears to be the same issue in a build (also using the VFS storage driver); to me it seems the problem starts to appear with buildah version 1.37.6:
time="2025-02-21T09:00:59Z" level=error msg="unlinkat /var/tmp/buildah1222469549/mounts3222934611/bind1342232015: device or resource busy"
Error: building at STEP "RUN --mount=type=bind,source=requirements.txt,target=/tmp/pip-tmp/requirements.txt [...]": resolving mountpoints for container "8a8dd1c7104a71218d2e85f1b657facd2a45051f9c0ccf56a267ed85046d6d06": setting up overlay of "/var/tmp/buildah1222469549/mounts3222934611/bind1342232015": mount overlay:/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/merge, data: lowerdir=/var/tmp/buildah1222469549/mounts3222934611/bind1342232015,upperdir=/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/upper,workdir=/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/work,userxattr: invalid argument
This is with buildah version 1.37.6 (image-spec 1.1.0, runtime-spec 1.2.0), running via registry.redhat.io/rhel9/buildah:9.5-1738643435.
Everything works as expected with buildah version 1.37.5 (image-spec 1.1.0, runtime-spec 1.2.0) via registry.redhat.io/rhel9/buildah:9.5-1737479141.
Interesting, and thanks for providing details. Knowing this behavior crept in via a patch release is actually really helpful. I just checked, and it was 1.37.5 that fixed the issue for me, which makes sense based on your experience.
Checking the git history, there are only 17 commits between 1.37.5 and 1.37.6. Of these, almost half are merge or changelog update commits. So that narrows things down quite a bit!
Based on the string "setting up overlay of" in the message, I believe the problem is somewhere in or around convertToOverlay(), which first appeared in 2c7003508a (between .5 and .6). Curiously, as near as I can tell, the containers/storage module was last updated in 1.37.5, so that's probably not the root cause.
There are several conditionals that would all emit a similar message, but I think this is coming from the 4th one, dealing with a failure from overlay.MountWithOptions(). However, it's also possible this error is a red herring, and the problem is really coming from GetBindMount(), where convertToOverlay() shouldn't even be used for a VFS mount (clearly we're not reproducing with a mountedImage):
func GetBindMount(...cut...
	...cut...
	overlayDir := ""
	if mountedImage != "" || mountIsReadWrite(newMount) {
		if newMount, overlayDir, err = convertToOverlay(newMount, store, mountLabel, tmpDir, 0, 0); err != nil {
			return newMount, "", "", "", err
		}
	}
	succeeded = true
	return newMount, mountedImage, intermediateMount, overlayDir, nil
}
I didn't get to look at the details of the commit, but it sounds very plausible to me. At least I can confirm that removing the rw option makes the mount itself succeed in my case; with the default read-only bind mount it works. That further hints towards these changes around read-write mounts.
I also noticed that I may have shortened the output in my earlier comment a bit too much, so in case it helps, the apparently problematic line in my build is --mount=type=bind,source=third_party/,target=/tmp/pip-tmp/third_party/,rw (so in my case it's mounted from the host, not from an earlier build stage). As noted above, removing the rw makes the mount work: --mount=type=bind,source=third_party/,target=/tmp/pip-tmp/third_party/ succeeds.
All good data points, thanks again for sharing. For VFS I don't think it matters whether the source is another stage or within the context dir; both should just resolve to directories on the "host" side. SELinux could be to blame, but the way I was reproducing it, nested within quay.io/buildah/stable, rules that out.
Something interesting one of my colleagues noticed:
If you run sudo dmesg -HW on the host and then run the reproducer, there's an overlay error from the kernel at the exact moment buildah attempts the volume mount during the build. This is significant because with --storage-driver=vfs, the expectation is that overlay shouldn't be involved at all.
From my reading of internal/volumes/volumes.go so far, in the VFS case either GetBindMount() should never call convertToOverlay(), or that function shouldn't be calling overlay.MountWithOptions() (which is overlay-specific).
@nalind I think we need your expert eyes on this; it also affects main, IIRC. @dashea and I have exhausted our brick-wall collision quota trying to understand it and figure out the correct fix. Significantly, this issue afflicts most (if not all) cases of using buildah in a CI environment to produce an image. So potentially konflux is impacted, as are GitLab CI and similar container-based automation environments.
Read-write bind mounts get converted into overlays to match the expectation that writes to them get discarded. This was part of the patch set that we backported to multiple branches for CVE-2024-11218. It looks like the kernel is pointing out that the upper directory we attempt to use there, since we're in a container, is also on an overlay filesystem, which it doesn't allow. Forcing the storage driver to be vfs instead of the overlay-with-fuse-overlayfs default we have in storage.conf in the image discards the bit of configuration that would have caused fuse-overlayfs to be used, and that would have allowed it to succeed here.
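A minimal sketch of the constraint described above, assuming golang.org/x/sys/unix's Statfs and OVERLAYFS_SUPER_MAGIC are available: it checks whether a prospective upperdir already sits on overlayfs, which the kernel refuses for the upper/work directories of a new overlay mount and which appears to be where the "invalid argument" comes from when the build runs inside an overlay-backed container.

// Sketch only: report whether a directory lives on overlayfs, e.g. the
// upperdir a nested build would try to hand to a new overlay mount.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	dir := os.Args[1] // e.g. /var/tmp/buildahXXX/mountsXXX/overlay/XXX/upper
	var st unix.Statfs_t
	if err := unix.Statfs(dir, &st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if st.Type == unix.OVERLAYFS_SUPER_MAGIC {
		fmt.Printf("%s is on overlayfs; the kernel will not accept it as upperdir/workdir\n", dir)
	} else {
		fmt.Printf("%s is not on overlayfs (fs type 0x%x)\n", dir, st.Type)
	}
}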
Thanks for taking a look at this, Nalin, I appreciate it. So it is as I/we feared: overlay is being forced. My understanding is that this would also be reproducible if the VFS driver were configured in storage.conf rather than on the command line. There are certainly CI environments (like GitLab) where fuse-overlayfs isn't supported.
As a fix, is it possible to detect if VFS is being used in convertToOverlay()? If so, would it be correct for that function to create yet another temporary directory, copy the "lower" content, then arrange for it to be thrown away? Or is there a better way to handle this?
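For illustration of that question only, a very rough sketch (not a proposed patch): it assumes containers/storage's Store.GraphDriverName() for detecting vfs, and uses os.CopyFS (Go 1.23+) as a stand-in for a real copier, which would not preserve ownership, xattrs, or symlinks the way buildah's own copy helpers do. The caller would bind mount the scratch directory read/write and delete it afterwards, so writes are still discarded.

// Sketch only, not buildah code: when the graph driver is vfs, stage the
// "lower" content in a throwaway directory instead of converting to overlay.
package sketch

import (
	"os"

	"github.com/containers/storage"
)

// stageRWBindSource returns a scratch copy of lowerDir for a read/write bind
// mount, or ok=false to indicate the existing overlay conversion should run.
func stageRWBindSource(store storage.Store, lowerDir, tmpDir string) (scratch string, ok bool, err error) {
	if store.GraphDriverName() != "vfs" {
		return "", false, nil // keep the current overlay-based behavior
	}
	scratch, err = os.MkdirTemp(tmpDir, "vfs-rw-bind-")
	if err != nil {
		return "", false, err
	}
	// os.CopyFS is only a stand-in; a real implementation would need to
	// preserve ownership, xattrs, and symlinks.
	if err = os.CopyFS(scratch, os.DirFS(lowerDir)); err != nil {
		os.RemoveAll(scratch)
		return "", false, err
	}
	return scratch, true, nil // caller bind mounts scratch rw and removes it later
}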
Following this, as we are running into this issue as well:
"/home/build/.local/share/containers/storage/vfs/dir/56ea6acb37115a17b8188bc2601914e9fe64077ed0a70d0a4deb5554c56ad23b": mount overlay:/var/tmp/buildah3838130017/mounts3684384049/overlay/2916328100/merge, data: lowerdir=/home/build/.local/share/containers/storage/vfs/dir/56ea6acb37115a17b8188bc2601914e9fe64077ed0a70d0a4deb5554c56ad23b,upperdir=/var/tmp/buildah3838130017/mounts3684384049/overlay/2916328100/upper,workdir=/var/tmp/buildah3838130017/mounts3684384049/overlay/2916328100/work,userxattr: invalid argument
So potentially konflux is impacted,
Just for the record, I discovered Konflux has a fork in https://github.com/konflux-ci/buildah-container/; notice the buildah submodule there is ~4 months old at this time.
It looks like the kernel is pointing out that the upper directory we attempt to use there, since we're in a container, is also on an overlay filesystem, which it doesn't allow.
Don't we want to encourage people to provide an "emptydir" (in kube terms), i.e. a transient non-overlayfs volume? Or, honestly, use podman run --read-only-tmpfs so we get /tmp and /var/tmp as non-overlayfs by default. Then c/storage (?) could detect this case and automatically use /var/tmp for the uppers.
Don't we want to encourage people to provide an "emptydir" (in kube terms)
This is perfectly valid and I agree this is probably a better way to run nested builds. However, two things:
- This worked previously, so it's a regression. It impacts several RHEL release branches as well 😭
- "Encouragement" is best provided in new major versions, or otherwise in the form of blogs and documentation 😉
Yes I agree (though I'm not the one writing the patches for this so it's easy to do 😄 )
One observation I would have is that I think few of us have nested builds near the top of mind; it's certainly not in my day-to-day usage. But probably one thing that would make sense (tying together with my comment above) is to do "reverse dependency testing" by running the Konflux buildah task against proposed updates to buildah. The Konflux buildah task is a beast, but it is how many things get built for production, so we certainly need it to continue to work.
But probably one thing that would make sense (tying together with my comment above) is to do "reverse dependency testing" by running the Konflux buildah task against proposed updates to buildah.
This is a really good suggestion. While konflux may be the eventual destination, there's no reason why the current test suite couldn't have caught this. It runs chroot tests and it runs VFS tests; it must simply be missing a test that tries a read/write mount from a previous stage.
Edit: There's possibly a secondary avenue as well: containers/image_build actually produces the quay/buildah images but doesn't test them very well. CI in that repo builds every day and fires off e-mails on failure. I'll see if I can find a half-hour to add these tests.
A friendly reminder that this issue had no activity for 30 days.
X-ref: https://github.com/containers/buildah/pull/6126 (so it's more obvious)
What's the status of this issue and the associated PR? It was approved at some point and then seemingly abandoned.
We run into the same problem in our CI and are interested in a solution. Are there any plans to get this PR back on track in the near future?