buildkit's rootless mode doesn't work on Bottlerocket

Open bcressey opened this issue 3 years ago • 6 comments

Image I'm using: aws-k8s-1.21 1.6.0

What I expected to happen: I deployed "rootless" buildkit and then ran these steps after enabling user namespaces on the node.

kubectl exec -it statefulset/buildkitd -- /bin/sh
cd /tmp
cat <<EOF >Dockerfile
FROM amazonlinux:2
RUN echo hi
EOF
buildctl --addr unix:///run/user/1000/buildkit/buildkitd.sock build --frontend=dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=test-image,push=false

I expected the build to work correctly.

What actually happened: The build failed with an error like this:

failed to solve: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount313966645: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 Options:[rbind ro]}]: operation not permitted.

How to reproduce the problem: See above.

bcressey avatar Jan 31 '22 22:01 bcressey

Digging around with bpftrace, I was able to rule out LSM checks - SELinux, capabilities - and narrowed the EPERM result down to path_mount.

There are actually two mount calls. The first is a bind mount from /home/user/.local/share/buildkit/... to /home/user/.local/tmp/buildkit-mount/.... This call succeeds, which can be confirmed by inspecting /proc/<pid>/mountinfo for the buildkitd process.

The second call is a remount call which fails the can_change_locked_flags check, resulting in the "operation not permitted" error.

This happens because:

  • buildkit defines an image volume for /home/user/.local/share/buildkit
  • containerd creates a directory on the host's /local for this volume, and bind mounts it into the container
  • the host /local filesystem is mounted with nosuid,nodev
  • attempting a remount without passing those options is treated as an attempt to change locked mount flags, which fails
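To check which restrictive flags a path inherits, the mountinfo output mentioned above can be parsed directly. The following is a minimal sketch, not buildkit or kernel code: the function names and the sample mountinfo text are hypothetical, and it only looks at ro, nosuid, nodev and noexec (the kernel also locks the atime flags, which are left out here for brevity).

```python
# Per-mount flags that can become locked when a mount is inherited into a
# less privileged mount namespace. The atime flags are omitted for brevity.
LOCKABLE = ("ro", "nosuid", "nodev", "noexec")

# Hypothetical sample in /proc/<pid>/mountinfo format, for illustration only.
SAMPLE_MOUNTINFO = """\
1 0 8:1 / / rw,relatime - ext4 /dev/root rw
2 1 8:2 /x /home/user/.local/share/buildkit rw,nosuid,nodev,noatime - ext4 /dev/nvme1n1p1 rw
"""


def mount_options(mountinfo_text, path):
    """Return the per-mount options of the deepest mount point containing path."""
    best_point, best_opts = "", []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a valid mountinfo line
        # Field 5 is the mount point, field 6 the per-mount options.
        point, opts = fields[4], fields[5].split(",")
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # ">=" lets a later mount stacked on the same point win.
            if len(point) >= len(best_point):
                best_point, best_opts = point, opts
    return best_opts


def locked_candidates(mountinfo_text, path):
    """Return the options on path's mount that a remount would have to repeat."""
    return [o for o in mount_options(mountinfo_text, path) if o in LOCKABLE]
```

On a real node, the text would come from reading /proc/<pid>/mountinfo for the buildkitd process, and the path would be the snapshot directory from the error message.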

I confirmed that "rootless" buildkit works on a custom build of Bottlerocket that drops these flags from /local.

I tried a manual workaround: defining an EBS volume for /home/user/.local/share/buildkit instead, and configuring containerd to ignore image-defined volumes. This didn't work; the application failed to start because the directory on the host wasn't relabeled correctly, which feels like a separate bug.

One upstream fix might be for buildkitd to add nosuid,nodev options to the remount call if they're present in the mount options, but it's not immediately clear whether this would break something later in the process.
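As a rough illustration of that idea, the remount flags could be computed as the union of the requested options and the restrictive options already present on the mount. This is only a sketch under assumptions: remount_flags is a hypothetical helper, not buildkit's actual code, and the numeric values are the Linux MS_* constants from <sys/mount.h>.

```python
# Linux mount(2) flag values from <sys/mount.h>.
MS_RDONLY, MS_NOSUID, MS_NODEV, MS_NOEXEC = 1, 2, 4, 8
MS_REMOUNT, MS_BIND = 32, 4096

OPTION_FLAGS = {
    "ro": MS_RDONLY,
    "nosuid": MS_NOSUID,
    "nodev": MS_NODEV,
    "noexec": MS_NOEXEC,
}


def remount_flags(requested, existing):
    """Build flags for a bind remount that keeps existing restrictive options.

    requested: options the caller asked for, e.g. ["rbind", "ro"]
    existing:  per-mount options currently on the mount, e.g. from mountinfo
    """
    flags = MS_REMOUNT | MS_BIND
    # Options without an entry in OPTION_FLAGS (rbind, rw, ...) add nothing.
    for opt in list(requested) + list(existing):
        flags |= OPTION_FLAGS.get(opt, 0)
    return flags
```

With requested options ["rbind", "ro"] and existing options ["rw", "nosuid", "nodev"], the result includes MS_NOSUID and MS_NODEV, so the remount would not try to clear them. Restrictive flags are only ever added, never dropped; whether that is safe for later steps in the build is the open question noted above.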

This looks to be essentially the same issue as https://github.com/moby/buildkit/issues/879, except it's not specific to the ChromiumOS LSM patches.

bcressey avatar Jan 31 '22 23:01 bcressey

This may be addressed by https://github.com/moby/buildkit/pull/3697?

stmcginnis avatar Mar 09 '23 21:03 stmcginnis

I tried this, but now there is a different error:

------
 > exporting to image:
------
error: failed to solve: failed to mount /run/user/1000/containerd-mount3624810005: operation not permitted

arnaldo2792 avatar Sep 08 '23 18:09 arnaldo2792

I tried this, but now there is a different error:

------
 > exporting to image:
------
error: failed to solve: failed to mount /run/user/1000/containerd-mount3624810005: operation not permitted

Have you found a fix or workaround? I get the same error, but only when building Go images.

koushyk avatar Sep 20 '23 08:09 koushyk

@koushyk I suppose it is still not working. @bcressey @stmcginnis I have opened this ticket with the buildkit team about running it rootless without privileged mode, as it is still failing. Buildkit worked as expected on the latest AMIs of both the EKS optimized AL2 and Ubuntu images. The failing experiment was done on Bottlerocket with the following setup:

  1. Arch: arm64
  2. buildkit image: moby/buildkit:rootless (currently v0.12.5)
  3. Bottlerocket OS 1.19.1 (aws-k8s-1.28)
  4. k8s version: v1.28.5-eks-5e0fdde

Bottlerocket userdata has:

[settings.kernel.sysctl]
"user.max_user_namespaces" = "63359"

and the pod used for testing is:

apiVersion: v1
kind: Pod
metadata:
  name: buildkitd
  annotations:
    container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
spec:
  nodeSelector:
    workload: runners
  containers:
    - name: buildkitd
      image: moby/buildkit:rootless
      args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        - --debug
      securityContext:
        seccompProfile:
          type: Unconfined
        runAsUser: 1000
        runAsGroup: 1000
      volumeMounts:
        - mountPath: /home/user/.local/share/buildkit
          name: buildkitd
    - name: runner
      image: moby/buildkit:rootless
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      env:
        - name: BUILDKIT_HOST
          value: tcp://localhost:1234
  volumes:
    - name: buildkitd
      emptyDir: {}

and we ended up with the following error:

time="2024-02-19T12:34:34Z" level=warning msg="failed to compute blob by overlay differ (ok=false): failed to write compressed diff: mount callback failed on /run/user/1000/containerd-mount387897202: mount callback failed on /run/user/1000/containerd-mount1737412574: failed to record upperdir changes (close error: failed to close tar writer: context canceled): context canceled"
time="2024-02-19T12:34:34Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to mount /run/user/1000/containerd-mount3852074643: operation not permitted"

The successful experiments were done using the following:

Ubuntu: ubuntu
AL2: al2

Can you please check whether any configuration is missing on the Bottlerocket hosts or in the pod security context? Thank you very much.

AhmadMS1988 avatar Feb 20 '24 10:02 AhmadMS1988

As a workaround, you can use other storage, such as EBS volumes or instance store volumes, instead of the Bottlerocket OS local mount. For example, please refer to this comment, where I used an EBS volume as the volume mount for my rootless buildkit pod and was able to build images successfully.

vtgspk avatar May 09 '24 04:05 vtgspk