Bazel in Buildah in container cannot resolve DNS due to empty resolv.conf
Issue Description
My Containerfile includes bazel build commands. I build the Containerfile using Buildah in a containerized environment (in OpenShift, but the problem is also reproducible with Podman).
The first RUN bazel build instruction sees a valid /etc/resolv.conf and can resolve DNS just fine (see the reproducer below for how this output is produced):
INFO: From Executing genrule //:hello_1:
--- /etc/resolv.conf ---
search <redacted>
nameserver <redacted>
nameserver <redacted>
nameserver <redacted>
nameserver <redacted>
------------------------
File: /etc/resolv.conf
Size: 159 Blocks: 8 IO Block: 4096 regular file
Device: 0,112 Inode: 21604992 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-10-09 15:41:59.399448880 +0000
Modify: 2025-10-09 15:41:57.908353740 +0000
Change: 2025-10-09 15:41:57.908353740 +0000
Birth: 2025-10-09 15:41:57.908353740 +0000
The second RUN bazel build gets an empty /etc/resolv.conf (mode 0000, with its mtime conspicuously set to the Unix epoch) and cannot resolve DNS:
curl: (6) Could not resolve host: github.com
--- /etc/resolv.conf ---
------------------------
File: /etc/resolv.conf
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 0,43 Inode: 21606079 Links: 1
Access: (0000/----------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-10-09 15:42:03.967380374 +0000
Modify: 1970-01-01 00:00:00.000000000 +0000
Change: 2025-10-09 15:42:03.675536870 +0000
Birth: 2025-10-09 15:42:03.675536870 +0000
Target //:hello_2 failed to build
If I merge both commands into one RUN instruction, both succeed. That is an acceptable workaround for simple Containerfiles, but it isn't always an option.
Using git bisect with a reproducer script, I identified this as the first broken commit: a3bea818b85b8b0e07c7b99583d2664bfd76d84f
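The reproducer script used for the bisect is not shown in this report. As a rough sketch of how such a script can be structured, the helper below maps a build step and a reproducer step onto the exit codes that git bisect run understands (0 for good, 125 for "skip this commit", any other code from 1 to 127 for bad). The function name bisect_step and both command arguments are hypothetical placeholders, not the commands actually used.

```shell
# Sketch only: translate "build" and "reproduce" results into the exit
# codes understood by `git bisect run`.
# Both arguments are hypothetical shell command strings.
bisect_step() {
  build_cmd="$1"
  repro_cmd="$2"

  # A commit that cannot be built tells us nothing, so ask bisect to skip it.
  sh -c "$build_cmd" || return 125

  # The reproducer fails on broken commits, which marks the commit as bad.
  sh -c "$repro_cmd" || return 1

  # Build and reproducer both succeeded: the commit is good.
  return 0
}
```

A script that ends with a call to bisect_step could then be passed to git bisect run after git bisect start with one known-bad and one known-good revision.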
Steps to reproduce the issue
Minimal Containerfile:
FROM registry.fedoraproject.org/fedora-minimal:42@sha256:ff3a56f47ba6d32c40091b396ca1d33546a36134b8ae973e2a129c02b4cbb054
RUN dnf -y install which unzip
# Set up Bazel
ARG BAZEL_VERSION=7.6.2
RUN <<EOF
set -euo pipefail
bazel_installer="https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh"
curl -fsSL "$bazel_installer" -o /tmp/bazel-install.sh
sh /tmp/bazel-install.sh
rm /tmp/bazel-install.sh
EOF
# Prepare Bazel workspace
WORKDIR /workspace
COPY <<BUILD <<WORKSPACE /workspace/
genrule(
name = "hello_1",
outs = ["hello_1.txt"],
cmd = """
echo "--- /etc/resolv.conf ---"
cat /etc/resolv.conf
echo "------------------------"
stat /etc/resolv.conf
curl -I -fsS https://github.com >/dev/null
echo 'hi' > "$(location hello_1.txt)"
""",
)
genrule(
name = "hello_2",
outs = ["hello_2.txt"],
cmd = """
echo "--- /etc/resolv.conf ---"
cat /etc/resolv.conf
echo "------------------------"
stat /etc/resolv.conf
curl -I -fsS https://github.com >/dev/null
echo 'hi' > "$(location hello_2.txt)"
""",
)
BUILD
# empty
WORKSPACE
RUN bazel build //:hello_1
RUN bazel build //:hello_2
Run buildah in a container (e.g. using podman) and build this Containerfile:
podman run --rm -ti -v "$PWD:$PWD:z" -w "$PWD" -e STORAGE_DRIVER=vfs \
quay.io/containers/buildah:v1.41.4 \
buildah build .
Note: STORAGE_DRIVER=vfs isn't necessary for the reproducer; it only avoids the "fuse-overlayfs: cannot mount" error.
The problem also reproduces with --privileged and without -e STORAGE_DRIVER=vfs.
Describe the results you received
The second RUN bazel build gets an empty /etc/resolv.conf and fails to resolve DNS.
Describe the results you expected
RUN bazel gets a valid /etc/resolv.conf every time
buildah version output
The one from `quay.io/containers/buildah:v1.41.4`
buildah info output
# with STORAGE_DRIVER=vfs:
{
"host": {
"CgroupVersion": "v2",
"Distribution": {
"distribution": "fedora",
"version": "42"
},
"MemFree": 3247489024,
"MemTotal": 33056284672,
"OCIRuntime": "crun",
"SwapFree": 8589930496,
"SwapTotal": 8589930496,
"arch": "amd64",
"cpus": 14,
"hostname": "225ec1d1a383",
"kernel": "6.16.8-200.fc42.x86_64",
"os": "linux",
"rootless": true,
"uptime": "9h 1m 56.21s (Approximately 0.38 days)",
"variant": ""
},
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "vfs",
"GraphOptions": [
"vfs.imagestore=/var/lib/shared",
"vfs.imagestore=/usr/lib/containers/storage"
],
"GraphRoot": "/var/lib/containers/storage",
"GraphStatus": {},
"ImageStore": {
"number": 0
},
"RunRoot": "/run/containers/storage"
}
}
# With --privileged:
{
"host": {
"CgroupVersion": "v2",
"Distribution": {
"distribution": "fedora",
"version": "42"
},
"MemFree": 3238711296,
"MemTotal": 33056284672,
"OCIRuntime": "crun",
"SwapFree": 8589930496,
"SwapTotal": 8589930496,
"arch": "amd64",
"cpus": 14,
"hostname": "b3db8ff4a24e",
"kernel": "6.16.8-200.fc42.x86_64",
"os": "linux",
"rootless": true,
"uptime": "9h 2m 13.13s (Approximately 0.38 days)",
"variant": ""
},
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "overlay",
"GraphOptions": [
"overlay.imagestore=/var/lib/shared",
"overlay.imagestore=/usr/lib/containers/storage",
"overlay.mount_program=/usr/bin/fuse-overlayfs",
"overlay.mountopt=nodev,fsync=0"
],
"GraphRoot": "/var/lib/containers/storage",
"GraphStatus": {
"Backing Filesystem": "btrfs",
"Native Overlay Diff": "false",
"Supports d_type": "true",
"Supports shifting": "true",
"Supports volatile": "true",
"Using metacopy": "false"
},
"ImageStore": {
"number": 0
},
"RunRoot": "/run/containers/storage"
}
}
Provide your storage.conf
The one from `quay.io/containers/buildah:v1.41.4`
Upstream Latest Release
Yes
Additional environment details
It works when I run buildah build directly on my machine and breaks only in containers.
This is unique to Bazel: outside of the bazel build environment, the second RUN instruction has a valid resolv.conf.
Additional information
No response
It looks like bazel starts a daemon in the first RUN instruction, and the chroot isolation isn't able to ensure that the daemon is terminated. After the instruction finishes, the temporary resolv.conf that was bind mounted over the one in the working container's rootfs is unmounted. If a dummy file had to be created so that the temporary file could be bind mounted over it, that dummy file would have the characteristics bazel is reporting.
Handling resolv.conf this way makes it easy to notice if the bind mount was unmounted and the underlying dummy file modified while the RUN instruction was executing, so that the dummy file's contents can be preserved for those specific cases. Otherwise, having a resolv.conf show up in the image that wasn't intended to be added during the build is a bug.
The bazel command in the second RUN instruction contacts the daemon that was started in the first one's mount namespace, where the resolv.conf file that was provided for it has already been cleaned up.
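For debugging, the symptom described above (an empty file whose mtime is the Unix epoch) can be checked from inside a RUN instruction. The following is a minimal sketch, assuming GNU stat as shipped in the Fedora image. The function name is hypothetical, and the check is a heuristic based on the stat output in this report rather than on documented Buildah behavior.

```shell
# Sketch only: report success if the given file looks like the dummy
# placeholder seen in this issue (it exists, is empty, and has mtime 0).
# Assumes GNU coreutils `stat` (the -c option).
is_placeholder_resolv_conf() {
  file="${1:-/etc/resolv.conf}"

  [ -e "$file" ] || return 1      # missing file: not the placeholder
  [ ! -s "$file" ] || return 1    # non-empty file: a real resolv.conf

  # %Y prints the modification time as seconds since the Unix epoch.
  [ "$(stat -c %Y "$file")" -eq 0 ]
}
```

Calling it at the start of a genrule or RUN body, for example as `is_placeholder_resolv_conf && echo "stale mount namespace?" >&2`, would flag the situation before the DNS failure shows up.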
When running the container with --privileged and building with --isolation oci, which is better able to ensure that processes started during one RUN are killed off, this doesn't seem to happen. Other than using bazel's "shutdown" command to make the daemon quit in the same RUN instruction, I'm not sure what we can do here if we want to keep being able to run in an unprivileged container.
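One way to apply the "shutdown in the same RUN instruction" idea is a small wrapper that always stops the Bazel server after the build, whether or not the build succeeded. This is a minimal sketch, assuming the RUN body is a shell script; the helper name bazel_once is hypothetical, and only the bazel build and bazel shutdown commands come from this thread.

```shell
# Sketch only: run one Bazel command, then stop the Bazel server so that
# it does not outlive the current RUN instruction.
bazel_once() {
  status=0
  bazel "$@" || status=$?

  # Shut the server down even if the build failed. A failed shutdown is
  # reported but does not mask the exit status of the build itself.
  bazel shutdown || echo "warning: bazel shutdown failed" >&2

  return "$status"
}
```

In the reproducer, each RUN body would then call `bazel_once build //:hello_1` (or `//:hello_2`) instead of invoking bazel build directly.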
Thanks for the detailed root cause analysis!
I think bazel shutdown could be an acceptable solution. While testing it, I ran into https://github.com/bazelbuild/bazel/issues/13823 - shutdown fails if the container doesn't have an init process that reaps zombies. For the Buildah-in-Podman case, podman run --init solves that easily. I'll see if I can solve it for our primary use case as well (Buildah in a Tekton Task).
It looks like replacing buildah build ... in the Task code with tini -s -- buildah build ... will achieve what we need (i.e. make bazel shutdown work) 🎉
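As a sketch of that change, the wrapper below runs a command under tini when it is available. It assumes tini is installed in the image that runs Buildah; the function name run_with_init and the fallback branch are hypothetical choices for illustration, not part of the actual Task code.

```shell
# Sketch only: run a command under tini so that zombie processes get
# reaped, which `bazel shutdown` needs according to the linked Bazel issue.
run_with_init() {
  if command -v tini >/dev/null 2>&1; then
    # -s registers tini as a child subreaper, so it can reap zombies
    # even when it is not PID 1 in the container.
    tini -s -- "$@"
  else
    # Assumption: falling back is preferable to failing outright.
    echo "warning: tini not found; running without a zombie reaper" >&2
    "$@"
  fi
}
```

The Task would then call `run_with_init buildah build .` in place of the plain buildah build invocation.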