k3s
ImageVolume broken
Environmental Info: K3s Version: k3s version v1.31.0+k3s1 (34be6d96) go version go1.22.5
Node(s) CPU architecture, OS, and Version: Linux kubetest 6.8.0-44-generic #44-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 13 13:35:26 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: Single node
Describe the bug: The newly introduced ImageVolume feature appears to be broken
Steps To Reproduce:
Deployed k3s 1.31 with:
```
INSTALL_K3S_EXEC="--kubelet-arg feature-gates=ImageVolume=true --kube-apiserver-arg feature-gates=ImageVolume=true"
INSTALL_K3S_CHANNEL=latest
```
Applied manifest:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: model
          mountPath: /model
      volumes:
      - name: model
        image:
          pullPolicy: IfNotPresent
          reference: registry.k8s.io/conformance:v1.31.0
```
Expected behavior:
Pod starts with registry.k8s.io/conformance:v1.31.0 image contents mounted at /model
Actual behavior:
Kubernetes Lens shows the following error message:
Error: failed to generate container "558bb3f947d9e1a70c697143ba10b2795c58ee7bde9c4fa0e3b317acb02dbe7b" spec: failed to generate spec: failed to mkdir "": mkdir : no such file or directory
Additional context / logs:
Sep 12 07:33:18 kubetest k3s[2270]: E0912 07:33:18.926553 2270 log.go:32] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to generate container \"536ef130eb99ae8a9f69c175a410b6400a576c514f07ea5faf68b6dc638a8860\" spec: failed to generate spec: failed to mkdir \"\": mkdir : no such file or directory" podSandboxID="f785734aa1099aa8557e8d4e6cabe4de5d889901d31880caf2a3f972946d7f78"
Sep 12 07:33:18 kubetest k3s[2270]: E0912 07:33:18.926664 2270 kuberuntime_manager.go:1272] "Unhandled Error" err="container &Container{Name:nginx,Image:nginx:1.14.2,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:0,ContainerPort:80,Protocol:TCP,HostIP:,},},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:model,ReadOnly:false,MountPath:/model,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:kube-api-access-rjk5r,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod ai-test-67984f78cd-n8xqd_default(e55f166a-1f0f-4728-96e2-fde67da551e3): CreateContainerError: failed to generate container \"536ef130eb99ae8a9f69c175a410b6400a576c514f07ea5faf68b6dc638a8860\" spec: failed to generate spec: failed to mkdir \"\": mkdir : no such file or directory"
Sep 12 07:33:18 kubetest k3s[2270]: E0912 07:33:18.927733 2270 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nginx\" with CreateContainerError: \"failed to generate container \\\"536ef130eb99ae8a9f69c175a410b6400a576c514f07ea5faf68b6dc638a8860\\\" spec: failed to generate spec: failed to mkdir \\\"\\\": mkdir : no such file or directory\"" pod="default/ai-test-67984f78cd-n8xqd" podUID="e55f166a-1f0f-4728-96e2-fde67da551e3"
I don't know where this error is coming from, but it is not from code that lives within this repo. I suspect the problem is either in the kubelet or in containerd.
The error itself is coming back from containerd (the runtime service), but I don't know if the container spec generated by the kubelet is incorrect, or if there's a bug in containerd, or what.
Have you tried this with any other Kubernetes distro with containerd 1.7, or k3s with a different container runtime (cri-dockerd perhaps)?
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Could I request to reopen this issue? Here is a very simple reproducer:
Create a simple cluster with K3d:
```shell
k3d cluster create playground \
  --image rancher/k3s:v1.31.2-k3s1 \
  --k3s-arg '--kube-apiserver-arg=feature-gates=ImageVolume=true@server:*' \
  --k3s-arg '--kubelet-arg=feature-gates=ImageVolume=true@server:*'
```
Watch events in a second terminal:
```shell
k get events -w
```
Create a pod
```shell
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: readonly-oci-volume-pod
spec:
  containers:
  - name: test
    image: registry.k8s.io/e2e-test-images/echoserver:2.3
    volumeMounts:
    - name: volume
      mountPath: /volume
  volumes:
  - name: volume
    image:
      reference: busybox:latest
      pullPolicy: IfNotPresent
EOF
```
The event log will contain:
```
LAST SEEN  TYPE     REASON                          OBJECT                          MESSAGE
17s        Normal   Starting                        node/k3d-playground-server-0    Starting kubelet.
17s        Warning  InvalidDiskCapacity             node/k3d-playground-server-0    invalid capacity 0 on image filesystem
17s        Normal   NodeHasSufficientMemory         node/k3d-playground-server-0    Node k3d-playground-server-0 status is now: NodeHasSufficientMemory
17s        Normal   NodeHasNoDiskPressure           node/k3d-playground-server-0    Node k3d-playground-server-0 status is now: NodeHasNoDiskPressure
17s        Normal   NodeHasSufficientPID            node/k3d-playground-server-0    Node k3d-playground-server-0 status is now: NodeHasSufficientPID
17s        Normal   NodeAllocatableEnforced         node/k3d-playground-server-0    Updated Node Allocatable limit across pods
17s        Normal   Starting                        node/k3d-playground-server-0
17s        Normal   NodeReady                       node/k3d-playground-server-0    Node k3d-playground-server-0 status is now: NodeReady
14s        Normal   Synced                          node/k3d-playground-server-0    Node synced successfully
14s        Normal   NodePasswordValidationComplete  node/k3d-playground-server-0    Deferred node password secret validation complete
13s        Normal   RegisteredNode                  node/k3d-playground-server-0    Node k3d-playground-server-0 event: Registered Node k3d-playground-server-0 in Controller
0s         Normal   Scheduled                       pod/readonly-oci-volume-pod     Successfully assigned default/readonly-oci-volume-pod to k3d-playground-server-0
0s         Normal   Pulling                         pod/readonly-oci-volume-pod     Pulling image "busybox:latest"
0s         Normal   Pulled                          pod/readonly-oci-volume-pod     Successfully pulled image "busybox:latest" in 2.899s (2.899s including waiting). Image size: 2166802 bytes.
0s         Normal   Pulling                         pod/readonly-oci-volume-pod     Pulling image "registry.k8s.io/e2e-test-images/echoserver:2.3"
0s         Normal   Pulled                          pod/readonly-oci-volume-pod     Successfully pulled image "registry.k8s.io/e2e-test-images/echoserver:2.3" in 4.995s (4.995s including waiting). Image size: 8478041 bytes.
0s         Warning  Failed                          pod/readonly-oci-volume-pod     Error: failed to generate container "767e400ec7f3f654630fe92d33d701a01f2426f9e24718a8a7840eed6e5786cd" spec: failed to generate spec: failed to mkdir "": mkdir : no such file or directory
...<last 3 lines repeat>
```
As I asked above, are you sure containerd even supports this? It's still alpha, and the upstream announcement only mentions it working with the latest releases of cri-o. https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/
> The Kubernetes alpha feature gate ImageVolume needs to be enabled on the API Server as well as the kubelet to make it functional. If that’s the case and the container runtime has support for the feature (like CRI-O ≥ v1.31), then an example pod.yaml like this can be created:
I suspect you'll need to wait for containerd to support this, or install and use cri-o instead of the containerd bundled with k3s.
OK, that's fair. I misunderstood. Thanks for checking :)
For posterity: the containerd line that throws the error is, I believe, this one: https://github.com/containerd/containerd/blob/0aa8b58092a849543dee7680005bf33503a71bec/internal/cri/opts/spec_linux_opts.go#L114
containerd issue for this:
https://github.com/containerd/containerd/issues/10496
Question. Is k3s using this repo as the source of its containerd?
Sorry, I meant --> https://github.com/k3s-io/containerd
I think this answers my question https://github.com/k3s-io/k3s/blob/master/go.mod#L12
We maintain a small number of patches on top of containerd to allow embedding it within k3s and enable a couple additional snapshotters. Our rewrite support has also not been accepted by upstream.
https://github.com/k3s-io/containerd/compare/v1.7.23...v1.7.23-k3s1
@brandond could you point me to the rewrite support so I can see what exactly that is? I'm looking to get my hands dirty. Sometimes a solution to an upstream project not accepting a particular feature is offering a change instead that enables the feature to be implemented through some form of extensibility.
Did you look at the commits in the link in the message you just replied to?
...right, sorry. Long day! 😓
Hi, this is now implemented in containerd and should land in containerd 2.1.0 (currently in beta): https://github.com/containerd/containerd/pull/11510
This PR should address this:
- https://github.com/k3s-io/k3s/pull/12788
Seems to be working with k3s v1.33.5 after enabling the feature-gate flags.