"File name too long" errors when deleting Minio volumes with huge paths
I had created a Minio instance in my k3s cluster and noticed that the helper-pod-delete-pvc pods were failing with errors like the following (output is shortened here and below to avoid flooding the ticket, but you get the idea):
```
$ kubectl logs -n kube-system helper-pod-delete-pvc-af80f49d-3070-410d-a005-db1f7b60face
rm: can't stat '/mnt/data/pvc-af80f49d-3070-410d-a005-db1f7b60face_walden_storage-minio-4/.minio.sys/tmp/.trash/dcf502d7-e792-479c-bee8-c6eefcf727e4/.trash/9ab0a5f2-ee1f-47cc-aba1-77560476a473/.trash/606d5f79-2fee-43e7-8bc7-8732f25674a4/.trash/[...]/f382c611-3ef5-4111-baa3-e82a190f6325/.trash/fc2f0259-32d0-47d8-829b-bc221de52b28/.trash/84aacd24-c87a-43e9-8c92-19bc0349e4a5/.trash/8f2c7ff6-131e-42b1-94e8-bcd7f971a55a/.trash/91c134f6-870f-4ca9-a8fd-fb72f7148ab5/.trash': File name too long
```
It looks like the root cause is a lower maximum filename length in busybox.
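For what it's worth, each individual .trash component in the failing path is short (a UUID), so the limit in play appears to be the total path length rather than a per-name limit. A quick sanity check of the host limits might look like the following; these are just illustrative commands run against the volume root, not anything the provisioner itself does:

```sh
# Per-component vs. full-path limits on the backing filesystem; the failing
# paths above have short components but a total length well beyond PATH_MAX.
getconf NAME_MAX /mnt/data   # typically 255
getconf PATH_MAX /mnt/data   # typically 4096
```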
I found that I was able to reproduce the error on the host machine by running busybox rm -rf against the PVC directory:
```
root@pi-04:/mnt/data# busybox rm -rf pvc-af80f49d-3070-410d-a005-db1f7b60face_walden_storage-minio-4/
rm: can't stat 'pvc-af80f49d-3070-410d-a005-db1f7b60face_walden_storage-minio-4/.minio.sys/tmp/.trash/dcf502d7-e792-479c-bee8-c6eefcf727e4/.trash/[...]/f382c611-3ef5-4111-baa3-e82a190f6325/.trash/fc2f0259-32d0-47d8-829b-bc221de52b28/.trash/84aacd24-c87a-43e9-8c92-19bc0349e4a5/.trash/8f2c7ff6-131e-42b1-94e8-bcd7f971a55a/.trash/91c134f6-870f-4ca9-a8fd-fb72f7148ab5/.trash/dd2d04ea-7619-47ff-8566-50811645b0b7': File name too long
```
However, if I use GNU rm -rf from the same shell, the same delete works fine:
```
root@pi-04:/mnt/data# rm -rf pvc-af80f49d-3070-410d-a005-db1f7b60face_walden_storage-minio-4/
root@pi-04:/mnt/data# echo $?
0
```
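For completeness, the same busybox-vs-GNU difference can be reproduced without Minio by synthesizing a directory chain whose total path length exceeds PATH_MAX. This is only an illustrative sketch: the /tmp/deep-path-test location and the loop sizes are made up, and it assumes both busybox and GNU coreutils are available on the host.

```sh
#!/bin/sh
# Build a directory chain whose combined path length (~6000 chars) exceeds
# the usual PATH_MAX of 4096, then try both rm implementations against it.
base=/tmp/deep-path-test
mkdir -p "$base"
cd "$base" || exit 1
component=$(printf 'x%.0s' $(seq 1 200))   # a single 200-character directory name
for i in $(seq 1 30); do
    mkdir "$component"
    cd "$component" || exit 1              # relative cd keeps each syscall path short
done
cd /

busybox rm -rf "$base"   # expected to fail with "File name too long"
rm -rf "$base"           # GNU rm walks the tree with *at() syscalls and succeeds
echo "exit code from GNU rm: $?"
```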
From this, the solution might be to replace the busybox image with something else? I haven't yet dug into why the path fails on busybox specifically, but swapping the image feels like the easiest fix.
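If the helper image can be overridden in a given deployment (the follow-up below mentions setting HELPER_IMAGE), testing an alternative might look roughly like this. This is only a sketch: whether the provisioner actually reads a HELPER_IMAGE environment variable, and on which object, depends on how it is packaged (k3s bundles its own copy), so treat the mechanism as an assumption to verify rather than a documented knob.

```sh
# Hypothetical: point the provisioner's helper pods at a non-busybox image,
# assuming the image is taken from an env var on the provisioner deployment.
kubectl -n kube-system set env deployment/local-path-provisioner \
    HELPER_IMAGE=library/debian:11.2-slim
```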
Also, to be clear, I don't know why Minio is creating huge paths like this, but I'm consistently seeing it across deployments, so it seems to be "standard" behavior. In any case, it would be better if local-path-provisioner were able to clean up the volumes successfully in this scenario.
For reference, the helper-pod-delete-pvc definition is as follows:
```
$ kubectl get pod -o yaml -n kube-system helper-pod-delete-pvc-af80f49d-3070-410d-a005-db1f7b60face
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 9e566deeb6d1596ba3a1ba49f2989919255fe896eea56469363ccbe07c97ad17
    cni.projectcalico.org/podIP: ""
    cni.projectcalico.org/podIPs: ""
  creationTimestamp: "2022-02-09T21:13:13Z"
  name: helper-pod-delete-pvc-af80f49d-3070-410d-a005-db1f7b60face
  namespace: kube-system
  resourceVersion: "184079915"
  uid: cd06247d-88b4-498b-b918-7f03b23b1f5c
spec:
  containers:
  - args:
    - -p
    - /mnt/data/pvc-af80f49d-3070-410d-a005-db1f7b60face_walden_storage-minio-4
    - -s
    - "1073741824"
    - -m
    - Filesystem
    command:
    - /bin/sh
    - /script/teardown
    image: rancher/mirrored-library-busybox:1.32.1
    imagePullPolicy: IfNotPresent
    name: helper-pod
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/data
      name: data
    - mountPath: /script
      name: script
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xwbrg
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: pi-04
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: local-path-provisioner-service-account
  serviceAccountName: local-path-provisioner-service-account
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  volumes:
  - hostPath:
      path: /mnt/data
      type: DirectoryOrCreate
    name: data
  - configMap:
      defaultMode: 420
      items:
      - key: setup
        path: setup
      - key: teardown
        path: teardown
      name: local-path-config
    name: script
  - name: kube-api-access-xwbrg
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-02-09T21:13:13Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-02-09T21:13:13Z"
    message: 'containers with unready status: [helper-pod]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-02-09T21:13:13Z"
    message: 'containers with unready status: [helper-pod]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-02-09T21:13:13Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://539fc969eb2707030a2821b84c0911099f6f414013195028ce835c1a248dae03
    image: docker.io/rancher/library-busybox:1.32.1
    imageID: docker.io/rancher/library-busybox@sha256:ec14ead228e6c28f2523ca6d866dd442b90ba64d447bf7b194a6fb34cc6174c8
    lastState: {}
    name: helper-pod
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://539fc969eb2707030a2821b84c0911099f6f414013195028ce835c1a248dae03
        exitCode: 1
        finishedAt: "2022-02-09T21:13:15Z"
        reason: Error
        startedAt: "2022-02-09T21:13:15Z"
  hostIP: 172.30.1.4
  phase: Failed
  podIP: 172.31.141.145
  podIPs:
  - ip: 172.31.141.145
  qosClass: BestEffort
  startTime: "2022-02-09T21:13:13Z"
```
And the teardown script is defined as follows:
```
$ kubectl get configmap -n kube-system local-path-config -o yaml
[...]
  teardown: |-
    #!/bin/sh
    while getopts "m:s:p:" opt
    do
        case $opt in
            p)
                absolutePath=$OPTARG
                ;;
            s)
                sizeInBytes=$OPTARG
                ;;
            m)
                volMode=$OPTARG
                ;;
        esac
    done
    rm -rf ${absolutePath}
[...]
```
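One possible direction for making the teardown robust against these trees: since the failure comes from handing rm a single path longer than PATH_MAX, the script could flatten anything nested too deeply before retrying the delete. The following is only a rough sketch of that idea, not a proposed patch; the depth threshold and retry cap are arbitrary, and it leans on the fact that the .trash entries are UUIDs, so moving them up to the volume root is unlikely to collide.

```sh
#!/bin/sh
# Sketch of a teardown variant that tolerates very deep directory chains.
while getopts "m:s:p:" opt
do
    case $opt in
        p) absolutePath=$OPTARG ;;
        s) sizeInBytes=$OPTARG ;;
        m) volMode=$OPTARG ;;
    esac
done

attempts=0
until rm -rf "$absolutePath" 2>/dev/null; do
    attempts=$((attempts + 1))
    if [ "$attempts" -gt 100 ]; then
        echo "failed to remove $absolutePath after $attempts attempts" >&2
        exit 1
    fi
    # Move anything sitting deeper than a few levels up to the volume root so
    # every path rm has to stat stays well under PATH_MAX. Each pass shortens
    # the surviving chain, so the retry loop terminates.
    find "$absolutePath" -mindepth 4 -maxdepth 4 -exec mv {} "$absolutePath"/ \; 2>/dev/null
done
```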
It seems like it's not necessarily the fault of the busybox image itself, despite the busybox-specific repro in the description above. I manually configured HELPER_IMAGE=library/debian:11.2-slim and am still seeing errors. The message now says cannot remove rather than can't stat, implying that GNU tools are indeed being used:
```
$ kubectl logs -n kube-system helper-pod-delete-pvc-b3bc5419-b1aa-4fc5-8396-232260d03952
rm: cannot remove '/mnt/data/pvc-b3bc5419-b1aa-4fc5-8396-232260d03952_walden_storage-minio-5/.minio.sys/tmp/.trash/62170d11-55fe-4594-a168-8fa886362e9c/.trash/1c074d65-cd2d-4260-817e-4fef2ff625a0/.trash/a766a068-4c49-4c76-8366-7fa08122c406/.trash/cda64dd2-87f6-448a-91a6-7df043eb3888/.trash/[...]/58239ddf-d9c9-463b-9205-2a8d16d9132d/.trash/32306a31-647c-4701-8352-82492bbf47ec/.trash/dcbc3939-8085-4663-8e31-b8f0df527858/.trash/8e24b082-c97e-41ec-b06d-742a03624a19/.trash/28af5f38-01bb-4fc4-83a4-8af6727353e7/.trash': File name too long
```
But again, I was able to delete the directory just fine on the host, even when using the same absolute path the container uses. It's a bit of a mystery to me what the difference is here:
```
root@pi-02:/home/nick# rm -rf /mnt/data/pvc-b3bc5419-b1aa-4fc5-8396-232260d03952_walden_storage-minio-5
root@pi-02:/home/nick# echo $?
0
```
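To narrow down whether the remaining difference comes from the container environment rather than from rm itself, one diagnostic that might help is re-running the same GNU rm from a throwaway container on the node with the volume bind-mounted. This is only a sketch: it assumes Docker (or an equivalent runtime) is available on the node, the image is the one used for the helper pod above, and the PVC path is a stand-in for whichever volume is still present and failing.

```sh
# Same debian image the helper pod used, same bind mount, same rm invocation;
# if this succeeds, the failure is specific to the helper pod's setup rather
# than to running GNU rm inside a container per se.
docker run --rm -v /mnt/data:/mnt/data library/debian:11.2-slim \
    rm -rf /mnt/data/pvc-b3bc5419-b1aa-4fc5-8396-232260d03952_walden_storage-minio-5
echo "exit code: $?"
```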