buildkit
Rootless mode doesn't work on Google Container-Optimized OS kernel (CONFIG_SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS?)
~ $ cat Dockerfile
FROM alpine
~ $ export BUILDKIT_HOST=tcp://127.0.0.1:1234
~ $ buildctl b --frontend dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.0s (2/2) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 49B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
error: failed to solve: rpc error: code = Unknown desc = failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount290620720: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 Options:[rbind ro]}]: operation not permitted
But unshare -rm mount works 🤔
~ $ unshare -mr
buildkitd-649b4db5d4-jskbq:/home/user# mount --rbind -o ro /home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 /home/user/.local/tmp/buildkit-mount710693070
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
***************************************** Ready <none> 19m v1.12.5-gke.5 ************** Container-Optimized OS from Google 4.14.89+ docker://17.3.2
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
      annotations:
        container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
        container.seccomp.security.alpha.kubernetes.io/buildkitd: unconfined
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        name: buildkitd
        ports:
        - containerPort: 1234
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  ports:
  - port: 1234
    protocol: TCP
  selector:
    app: buildkitd
Note: the same step (w/ --oci-worker-snapshotter=native) succeeds with the following envs:
- Docker for Mac 2.0.3.0 (Build 31778, Kube 1.13.0, Docker 18.09.3)
- minikube v0.35.0
- AKS v1.12.6 (kernel 4.15.0-1037-azure, Ubuntu 16.04.5, MS-Moby 3.0.4)
I wonder whether this might be related to the ChromiumOS LSM, but I'm not sure: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/HEAD/security/chromiumos
@AkihiroSuda just to be clear, it does not work without setting securityContext in GKE?
No, even privileged: true does not work with the rootless image.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        name: buildkitd
        ports:
        - containerPort: 1234
        securityContext:
          privileged: true
With the rootful image, it works (tested both overlay and native for rootful).
@AkihiroSuda So is this a regression in v0.4 ?
No, even v0.3.0-rootless w/ securityContext: privileged does not work now.
This is more likely to be a regression in GKE, although I don't have any evidence that v0.3.0-rootless had ever been working on GKE.
v0.4.0-rootless (both overlay and native; both w/ and w/o privileged) works with GKE Ubuntu nodes (kernel 4.15.0-1026-gcp #27-Ubuntu, kube v1.11.7-gke.4, Ubuntu 18.04.1, docker://17.3.2).
Seems an issue on Google COS.
strace:
buildkit (fails) (https://github.com/containerd/containerd/pull/1373)
[pid 15561] mkdirat(AT_FDCWD, "/home/user/.local/tmp/buildkit-mount226977687", 0700) = 0
[pid 15561] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f2848, MS_RDONLY|MS_BIND|MS_REC, NULL) = 0
[pid 15561] mount("", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f284e, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = -1 EPERM (Operation not permitted)
mount -o rbind,ro (succeeds)
[pid 17658] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", NULL, MS_RDONLY|MS_BIND|MS_REC|MS_SILENT, NULL) = 0
likely to be related to SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/Kconfig#21
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/lsm.c#133
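To make the difference between the two straces easier to see, here is a minimal sketch of the flag logic that produces them. It does not call mount(2); it only computes the flags of each call. The function name mountCalls is hypothetical, and the flag values are copied from linux/mount.h so the sketch compiles anywhere. The read-only bind case yields two calls, and the second one (with MS_REMOUNT) is the one that returns EPERM on COS.

```go
package main

import "fmt"

// Mount flag values as defined in Linux's <linux/mount.h>.
// They are redeclared here so the sketch has no platform dependency.
const (
	msRdonly  uintptr = 0x1
	msRemount uintptr = 0x20
	msBind    uintptr = 0x1000
	msRec     uintptr = 0x4000
)

// mountCalls (hypothetical helper) returns the flags of every mount(2) call
// that a "bind, then remount read-only" strategy would issue for oflags.
func mountCalls(oflags uintptr) []uintptr {
	calls := []uintptr{oflags} // the initial bind mount
	const broflags = msBind | msRdonly
	if oflags&broflags == broflags {
		// A read-only bind mount is followed by a remount to apply MS_RDONLY.
		calls = append(calls, oflags|msRemount)
	}
	return calls
}

func main() {
	// Same flags as the failing BuildKit strace: MS_RDONLY|MS_BIND|MS_REC.
	for i, f := range mountCalls(msBind | msRec | msRdonly) {
		fmt.Printf("call %d: flags=%#x remount=%v\n", i+1, f, f&msRemount != 0)
	}
}
```

The mount(8) strace above stops after the first call, which is why it never hits the failing remount.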
I just tried with the COS nodes of 1.15.4-gke.18 and the regression seems to be still there :(
Any updates on this issue?
Needs help from Google
So can anything be done?
Maybe https://github.com/AkihiroSuda/containerd-fuse-overlayfs can be a solution, but blocked due to go mod hell #1297
Can I do anything to help?
Another way is to replace the failing mount flags with the ones that the "unshare -rm mount" example in the top comment of this issue uses.
This needs more investigation and help is appreciated, thanks.
So you want to change the error? (sorry I'm new)
"unshare -rm mount" example doesn't produce any error, and we want to avoid BuildKit error by using the same mount flags
I assumed fuse-overlayfs snapshotter may work, but seems not :cry:
$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 0.2s (2/2) FINISHED
=> [internal] load build definition from Dockerfile 0.2s
=> => transferring dockerfile: 109B 0.2s
=> [internal] load .dockerignore 0.2s
=> => transferring context: 2B 0.2s
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount998042514: [{Type:bind Source:/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/1/fs Options:[rbind ro]}]: operation not permitted
It is not only an issue in the snapshotter:
$ git diff
diff --git a/vendor/github.com/containerd/containerd/mount/mount_linux.go b/vendor/github.com/containerd/containerd/mount/mount_linux.go
index a7edd455..526640be 100644
--- a/vendor/github.com/containerd/containerd/mount/mount_linux.go
+++ b/vendor/github.com/containerd/containerd/mount/mount_linux.go
@@ -93,7 +93,10 @@ func (m *Mount) Mount(target string) error {
 	const broflags = unix.MS_BIND | unix.MS_RDONLY
 	if oflags&broflags == broflags {
 		// Remount the bind to apply read only.
-		return unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+		unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+		// DO-NOT-MERGE:
+		// ignore err here to avoid hitting https://github.com/moby/buildkit/issues/879#issuecomment-473396544
+		// How can we ensure target to be read-only?
 	}
 	return nil
 }
$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 6.1s (5/6)
=> [internal] load build definition from Dockerfile 0.2s
=> => transferring dockerfile: 109B 0.2s
=> [internal] load .dockerignore 0.2s
=> => transferring context: 2B 0.1s
=> [internal] load metadata for docker.io/library/alpine:latest 3.3s
=> [1/3] FROM docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 2.1s
=> => resolve docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 0.0s
=> => sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 1.64kB / 1.64kB 0.0s
=> => sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45 528B / 528B 0.0s
=> => sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9 2.80MB / 2.80MB 1.2s
=> => sha256:e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a 1.51kB / 1.51kB 0.0s
=> => unpacking docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 0.1s
=> ERROR [2/3] RUN apk add --no-cache figlet 0.1s
------
> [2/3] RUN apk add --no-cache figlet:
#5 0.084 container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/home/user/.local/share/buildkit/runc-native/executor/resolv.conf\\\" to rootfs \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs\\\" at \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs/etc/resolv.conf\\\" caused \\\"operation not permitted\\\"\""
------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c apk add --no-cache figlet]: buildkit-runc did not terminate successfully
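On the open question in the DO-NOT-MERGE patch ("How can we ensure target to be read-only?"), one possible approach is to verify the result after the fact instead of trusting the remount's return value. The sketch below is an assumption, not something BuildKit or containerd is known to do: it checks the per-mount options (the sixth field) of a /proc/self/mountinfo line for "ro". The function name and the sample line are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// mountIsReadOnly (hypothetical helper) reports whether one line of
// /proc/self/mountinfo describes a read-only mount. Per proc(5), the sixth
// whitespace-separated field holds the per-mount options, e.g. "ro,relatime".
func mountIsReadOnly(line string) (bool, error) {
	fields := strings.Fields(line)
	if len(fields) < 6 {
		return false, fmt.Errorf("malformed mountinfo line: %q", line)
	}
	for _, opt := range strings.Split(fields[5], ",") {
		if opt == "ro" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Made-up sample line; a real caller would scan /proc/self/mountinfo
	// and pick the entry whose mount point matches the target.
	line := "36 35 98:0 /snapshots/1 /tmp/buildkit-mount rw,relatime - ext4 /dev/sda1 rw"
	ro, err := mountIsReadOnly(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("read-only:", ro)
}
```

A caller could fall back to failing the build when the check reports a writable mount, which would keep the patched code from silently handing out a read-write snapshot.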
Any updates on this?
Using an idea from https://github.com/bottlerocket-os/bottlerocket/issues/1934, I added an emptyDir volume to /home/user/.local/share/buildkit and it worked.
@ei-grad On GCOS kernel? 👀
Isn't this VOLUME working by default? 🤔
https://github.com/moby/buildkit/blob/c9a0f4d2de095591e742d7f411d9ed36a03a1c4e/Dockerfile#L292
On GCOS kernel
Yes, latest GKE with the cos_containerd image. I got fully functional rootless buildkit with the resource definitions from examples/kubernetes, with the only change being an added emptyDir/hostPath volumeMount for /home/user/.local/share/buildkit.
Isn't this VOLUME working by default? 🤔
Yes, and that's the problem: default volumes are mounted with the nosuid,nodev flags, which causes a Permission denied error when trying to remount the volume without these flags. See the details in an excellent investigation from @bcressey in the linked bottlerocket issue.
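As an illustration of the point above, here is a small sketch of how remount flags could carry over the restrictive options already present on a mount, so the remount never asks to drop them. This is an assumption about a possible fix, not something confirmed in this thread to work on COS; the function name carryOver and the sample option string are hypothetical, and the flag values are copied from linux/mount.h.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount flag values as defined in Linux's <linux/mount.h>.
const (
	msRdonly  uintptr = 0x1
	msNosuid  uintptr = 0x2
	msNodev   uintptr = 0x4
	msNoexec  uintptr = 0x8
	msRemount uintptr = 0x20
	msBind    uintptr = 0x1000
)

// restrictive maps mount option names to the flags a remount should keep.
var restrictive = map[string]uintptr{
	"nosuid": msNosuid,
	"nodev":  msNodev,
	"noexec": msNoexec,
}

// carryOver (hypothetical helper) returns the requested remount flags plus
// any restrictive flags found in existingOpts, a comma-separated option
// string such as the one shown in /proc/self/mountinfo.
func carryOver(requested uintptr, existingOpts string) uintptr {
	flags := requested
	for _, opt := range strings.Split(existingOpts, ",") {
		if f, ok := restrictive[opt]; ok {
			flags |= f
		}
	}
	return flags
}

func main() {
	requested := msBind | msRemount | msRdonly
	// Options of a default volume as described above: nosuid,nodev.
	merged := carryOver(requested, "rw,nosuid,nodev,relatime")
	fmt.Printf("requested=%#x merged=%#x\n", requested, merged)
}
```

The emptyDir/hostPath workaround avoids the problem from the other side: the directory is no longer backed by a volume that carries nosuid,nodev in the first place.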
I got fully functional rootless buildkit
Just to clarify: that was using the native snapshotter. Using overlayfs/fuse-overlayfs on GCOS requires privileged: true since the kernel is 5.10 😢.