Leaking containerd.sock mounts exponentially on host node
Running a cluster with heavy shared developer use, we've noticed Kubernetes nodes getting into a state where pods can neither come up nor terminate. The nodes that reach this state have a pathological number (32,766) of tmpfs /run/containerd/containerd.sock mounts. We've isolated the cause to spinning up buildkit pods.
Steps to reproduce
- Run a k8s cluster
- Open a shell on a k8s node running buildkit and run:
cat /proc/mounts | grep containerd.sock | wc -l
- Cycle buildkit pods repeatedly and observe the count growing exponentially, roughly doubling every time a new pod is started on that node (see the sketch below)
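A rough way to watch the count while pods are cycled (a sketch only; it assumes shell access to the affected node and that the buildkit Deployment is named buildkit in the current namespace, as in the chart further down):

# On the affected node: print the containerd.sock mount count every few seconds
while true; do
  printf '%s  %s\n' "$(date +%T)" "$(grep -c containerd.sock /proc/mounts)"
  sleep 5
done

# From a workstation: force new buildkit pods to be scheduled
kubectl rollout restart deployment/buildkit
kubectl rollout status deployment/buildkit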
What actually happens
The host node roughly doubles its current number of containerd.sock mounts every time a new buildkit pod starts. The node becomes unusable once cat /proc/mounts | grep containerd.sock hits 32,766 rows that look like this:
tmpfs /run/containerd/containerd.sock tmpfs rw,nosuid,nodev,mode=755 0 0
We have repro'd this issue on v0.13.1 with --containerd-worker=true as well as on v0.12.5 with --containerd-worker=false.
Versions
Kubelet: Kubernetes v1.24.17-eks-5e0fdde
Docker: Docker version 20.10.25, build b82b9f3
Containerd: github.com/containerd/containerd 1.7.11 64b8a811b07ba6288238eefc14d898ee0b5b99ba
How do you run BuildKit and containerd? What makes the mount for containerd.sock? Unlike docker.sock, containerd.sock is not expected to be bind-mounted currently.
Since this BuildKit instance is shared by several developers, we've built a chart that defines a BuildKit Deployment everybody connects to. I had an HPA attached, which exacerbated this problem, as the aggressive scaling caused lots of new mounts. The contents of the deployment.yaml are based on the deployment put out by the buildkit CLI.
The behavior we see is that on every scale-up, a new pod is scheduled and the number of mounts becomes 2^X - 1, where X is the number of pods ever deployed to that node during its lifetime.
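For a rough sense of scale under that formula (the exponents below are illustrative arithmetic, not measurements):

# bash: 2^X - 1 for a few values of X; around 15 pod starts already
# reaches ~32k, right at the observed ceiling
for x in 1 5 10 15; do
  echo "$x pods -> $(( (1 << x) - 1 )) mounts"
done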
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    proxy.istio.io/config: '{"holdApplicationUntilProxyStarts":true}'
  labels:
    {{- include "buildkit.labels" . | nindent 4 }}
    app: buildkit
    rootless: "false"
    runtime: containerd
    worker: containerd
  name: buildkit
spec:
  progressDeadlineSeconds: 600
  replicas: {{ .Values.replicas }}
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      {{- include "buildkit.selectorLabels" . | nindent 6 }}
      app: buildkit
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: buildkit
        rootless: "false"
        runtime: containerd
        worker: containerd
        {{- include "buildkit.selectorLabels" . | nindent 8 }}
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: buildkit
                rootless: "false"
                runtime: containerd
                worker: containerd
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --oci-worker=true
        - --containerd-worker=true
        - --root
        - /var/lib/buildkit/buildkit
        image: docker.io/moby/buildkit:buildx-stable-1
        imagePullPolicy: IfNotPresent
        name: buildkitd
        readinessProbe:
          exec:
            command:
            - buildctl
            - debug
            - workers
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {{- toYaml .Values.resources | nindent 12 }}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/buildkit/
          name: buildkitd-config
        - mountPath: /run/containerd/containerd.sock
          name: containerd-sock
        - mountPath: /var/lib/buildkit/buildkit
          mountPropagation: Bidirectional
          name: var-lib-buildkit
        - mountPath: /var/lib/containerd
          mountPropagation: Bidirectional
          name: var-lib-containerd
        - mountPath: /run/containerd
          mountPropagation: Bidirectional
          name: run-containerd
        - mountPath: /var/log
          mountPropagation: Bidirectional
          name: var-log
        - mountPath: /tmp
          mountPropagation: Bidirectional
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: buildkit
        name: buildkitd-config
      - hostPath:
          path: /run/containerd/containerd.sock
          type: Socket
        name: containerd-sock
      - hostPath:
          path: /var/lib/buildkit/buildkit
          type: DirectoryOrCreate
        name: var-lib-buildkit
      - hostPath:
          path: /var/lib/containerd
          type: Directory
        name: var-lib-containerd
      - hostPath:
          path: /run/containerd
          type: Directory
        name: run-containerd
      - hostPath:
          path: /var/log
          type: Directory
        name: var-log
      - hostPath:
          path: /tmp
          type: Directory
        name: tmp
mountPath: /run/containerd/containerd.sock
This is not supported.
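For reference, one possible shape of the manifest without that bind mount. This is a sketch only, not a verified fix: it assumes buildkitd's containerd worker looks for the socket at the default /run/containerd/containerd.sock path, so the existing run-containerd hostPath directory mount already exposes it; whether the Bidirectional propagation settings also need to change is a separate question.

        volumeMounts:
        - mountPath: /etc/buildkit/
          name: buildkitd-config
        # no separate mount for containerd.sock; the socket is visible at
        # /run/containerd/containerd.sock through the run-containerd mount
        - mountPath: /run/containerd
          mountPropagation: Bidirectional
          name: run-containerd
        # (remaining volumeMounts unchanged)
      volumes:
      - configMap:
          defaultMode: 420
          name: buildkit
        name: buildkitd-config
      # the containerd-sock hostPath volume of type Socket is dropped entirely
      - hostPath:
          path: /run/containerd
          type: Directory
        name: run-containerd
      # (remaining volumes unchanged)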
I definitely believe the issue is related to that mount. I copied the deployment resource basically as-is from a k8s environment after running a buildkit command, so I'm not sure how this mount gets in there (or generally how the deployment resource is constructed).
Editing to add: I think this is starting to look like a kubectl-build issue, which I should probably be working to eliminate from the process now that it's redundant. It wasn't clear to me where the boundary was between these two projects.
OK to close this issue then, @brettmorien?