buildkit's rootless mode doesn't work on Bottlerocket
Image I'm using: aws-k8s-1.21 1.6.0
What I expected to happen: I deployed "rootless" buildkit and then ran these steps after enabling user namespaces on the node.
kubectl exec -it statefulset/buildkitd -- /bin/sh
cd /tmp
cat <<EOF >Dockerfile
FROM amazonlinux:2
RUN echo hi
EOF
buildctl --addr unix:///run/user/1000/buildkit/buildkitd.sock build --frontend=dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=test-image,push=false
I expected the build to work correctly.
What actually happened: The build failed with an error like this:
failed to solve: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount313966645: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 Options:[rbind ro]}]: operation not permitted.
How to reproduce the problem: See above.
Digging around with bpftrace, I was able to rule out LSM checks - SELinux, capabilities - and narrowed the EPERM result down to path_mount.
There are actually two mount calls that happen. The first is a bind mount from /home/user/.local/share/buildkit/... to /home/user/.local/tmp/buildkit-mount/.... This call succeeds, which can be confirmed by inspecting /proc/<pid>/mountinfo for the buildkitd process.
The second call is a remount call which fails the can_change_locked_flags check, resulting in the "operation not permitted" error.
This happens because:
- buildkit defines an image volume for /home/user/.local/share/buildkit
- containerd creates a directory on the host's /local for this volume, and bind mounts it into the container
- the host /local filesystem is mounted with nosuid,nodev
- attempting a remount without passing those options is treated as an attempt to change locked mount flags, which fails
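The mount flags involved can be checked from mountinfo. The following is a minimal sketch, not part of the original investigation: the helper names `mount_opts` and `locked_flags` are hypothetical, and the list of flags checked (ro, nosuid, nodev, noexec) is an assumed subset of what the kernel can lock. Both helpers read text in /proc/<pid>/mountinfo format, where field 5 is the mount point and field 6 holds the per-mount options.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting mountinfo; names are illustrative only.

# Print the per-mount options (field 6) for the mount point given as $1.
# Reads mountinfo-formatted text on stdin. If several mounts share the
# mount point, the last one listed wins.
mount_opts() {
    awk -v mp="$1" '$5 == mp { opts = $6 } END { if (opts != "") print opts }'
}

# From a comma-separated option list in $1, print the flags that a mount
# inherited into a user namespace would be expected to keep on remount.
# Assumption: only ro, nosuid, nodev and noexec are checked here.
locked_flags() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | grep -E '^(ro|nosuid|nodev|noexec)$' \
        | paste -sd, -
}
```

On the host this could be run as `locked_flags "$(mount_opts /local < /proc/self/mountinfo)"`; inside the container, the mount point to query would be /home/user/.local/share/buildkit instead.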
I confirmed that "rootless" buildkit works on a custom build of Bottlerocket that drops these flags from /local.
I tried a manual workaround: defining an EBS volume for /home/user/.local/share/buildkit instead, and configuring containerd to ignore image defined volumes. This didn't work; the application failed to start because the directory on the host wasn't relabeled correctly - which feels like a separate bug.
One upstream fix might be for buildkitd to add nosuid,nodev options to the remount call if they're present in the mount options, but it's not immediately clear whether this would break something later in the process.
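The option-merging idea can be sketched as follows. This is not buildkit code (buildkit is written in Go) and not a tested fix; `remount_opts` is a hypothetical helper that only illustrates carrying existing flags over into the remount request. It assumes nosuid, nodev and noexec are the flags worth preserving.

```shell
#!/bin/sh
# Hypothetical sketch: build the option string for a remount by taking the
# requested options ($2) and appending any of nosuid, nodev, noexec that are
# set on the existing mount ($1) but missing from the request.
remount_opts() {
    current=$1
    result=$2
    for flag in nosuid nodev noexec; do
        case ",$current," in
            *",$flag,"*)
                case ",$result," in
                    *",$flag,"*) ;;                            # already requested
                    *) result="${result:+$result,}$flag" ;;    # carry it over
                esac
                ;;
        esac
    done
    printf '%s\n' "$result"
}
```

For example, `remount_opts "rw,nosuid,nodev" "remount,bind,ro"` yields an option string that includes nosuid and nodev, so the remount would no longer look like an attempt to clear them. Whether passing these options affects later build steps is the open question raised above.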
This looks to be essentially the same issue as https://github.com/moby/buildkit/issues/879, except it's not specific to the ChromiumOS LSM patches.
May be addressed by https://github.com/moby/buildkit/pull/3697 ?
I tried this, but now there is a different error:
------
> exporting to image:
------
error: failed to solve: failed to mount /run/user/1000/containerd-mount3624810005: operation not permitted
Have you found a fix or workaround? I get the same error, but only when building Go images.
@koushyk I suppose it is still not working. @bcressey @stmcginnis I have opened this ticket with the buildkit team about running it in rootless mode without privileged mode, as it is still failing. Buildkit worked as expected on the latest AMIs of both the EKS-optimized AL2 and Ubuntu images. The experiment was done on Bottlerocket with the following setup:
- Arch: arm64
- buildkit image: moby/buildkit:rootless (currently v0.12.5)
- Bottlerocket OS 1.19.1 (aws-k8s-1.28)
- k8s version: v1.28.5-eks-5e0fdde
Bottlerocket userdata has:
[settings.kernel.sysctl]
"user.max_user_namespaces" = "63359"
and the pod used for testing is:
apiVersion: v1
kind: Pod
metadata:
  name: buildkitd
  annotations:
    container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
spec:
  nodeSelector:
    workload: runners
  containers:
    - name: buildkitd
      image: moby/buildkit:rootless
      args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        - --debug
      securityContext:
        seccompProfile:
          type: Unconfined
        runAsUser: 1000
        runAsGroup: 1000
      volumeMounts:
        - mountPath: /home/user/.local/share/buildkit
          name: buildkitd
    - name: runner
      image: moby/buildkit:rootless
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      env:
        - name: BUILDKIT_HOST
          value: tcp://localhost:1234
  volumes:
    - name: buildkitd
      emptyDir: {}
and we ended up with the following error:
time="2024-02-19T12:34:34Z" level=warning msg="failed to compute blob by overlay differ (ok=false): failed to write compressed diff: mount callback failed on /run/user/1000/containerd-mount387897202: mount callback failed on /run/user/1000/containerd-mount1737412574: failed to record upperdir changes (close error: failed to close tar writer: context canceled): context canceled"
time="2024-02-19T12:34:34Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to mount /run/user/1000/containerd-mount3852074643: operation not permitted"
The successful experiment was done using the following:
Ubuntu:
AL2:
Can you please check whether any configuration is missing on the Bottlerocket hosts or in the pod security context?
Thank you very much.
As a workaround, you can back the buildkit data directory with other storage, such as EBS volumes or instance store volumes, instead of the Bottlerocket OS local mount. For an example, please refer to this comment, where I used an EBS volume as the volume mount for my rootless buildkit pod and was able to build images successfully.