rancher-desktop
Image pulls initiated by k3s are subject to a 2 minute timeout
Actual Behavior
Image pulls initiated by K8S result in ImagePullBackOff if the download does not complete within 2 minutes. The image pull is retried, but the pod stays in this state indefinitely if every retry also takes longer than 2 minutes.
Steps to Reproduce
This test pod uses a large image:
apiVersion: v1
kind: Pod
metadata:
  name: splunktest
spec:
  containers:
  - name: splunktest
    image: splunk/splunk
    env:
    - name: SPLUNK_START_ARGS
      value: --accept-license
    - name: SPLUNK_PASSWORD
      value: password
- Save the above YAML to a file such as test.yaml
- Run
kubectl apply -f test.yaml
- If the connection is slow enough, the pod will enter the ImagePullBackOff state after 2 minutes
- Notice that docker pull splunk/splunk will succeed, even if it takes longer than 2 minutes
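To compare a manual pull against the 2 minute limit, the pull can be timed. This is only a rough sketch: `timed_run` is a hypothetical helper name, not part of Docker, kubectl, or Rancher Desktop, and the limit is passed in seconds.

```shell
#!/bin/sh
# Hypothetical helper: run a command and report whether it finished
# within a time limit given in seconds.
timed_run() {
  limit="$1"
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?          # keep the command's exit code without aborting
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -gt "$limit" ]; then
    verdict="over"
  else
    verdict="within"
  fi
  echo "exit=$status elapsed=${elapsed}s ($verdict ${limit}s limit)"
  return "$status"
}

# Example (needs Docker and network access, so it is left commented out):
# timed_run 120 docker pull splunk/splunk
```

A result of "over 120s limit" with exit code 0 would match the behavior described above: the manual pull succeeds even though it takes longer than 2 minutes.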
Result
When you're developing locally from your home office, your internet connection may not be fast enough to download multi-gigabyte Docker images in under 2 minutes. Even with smaller images, you're at the mercy of a shaky connection (family members might be using Netflix or torrents). The current 2 minute time limit results in a very bad user experience.
Expected Behavior
Since Rancher Desktop is intended for local development, image pulls initiated by K8S should succeed even if they take longer than 2 minutes.
Additional Information
The same issue affects Docker Desktop (see https://github.com/docker/for-mac/issues/6300).
It can be fixed by changing the kubelet configuration, specifically by increasing the runtimeRequestTimeout setting documented at https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
There are three potential solutions:
- Find a new global default (like increasing it to 10 minutes or similar, but this might have unintended consequences)
- Make this number configurable by the user via the UI
- Give users the ability to modify the kubelet config file on disk and change k3s to use this file when it exists
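For the third option, a kubelet config file raising the timeout might look roughly like the one written by the sketch below. The `runtimeRequestTimeout` field comes from the KubeletConfiguration v1beta1 reference linked above. The helper name `write_kubelet_config`, the file location, and the 10m default are assumptions for illustration. Whether k3s or Rancher Desktop would actually read such a file is not established here.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal KubeletConfiguration that only
# raises runtimeRequestTimeout. Path and default value are assumptions.
write_kubelet_config() {
  path="$1"
  timeout="${2:-10m}"
  cat > "$path" <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
runtimeRequestTimeout: "$timeout"
EOF
}

# Example (the path is a placeholder, not a location k3s is known to read):
# write_kubelet_config ./kubelet-config.yaml 10m
```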
Rancher Desktop Version
1.3.0
Rancher Desktop K8s Version
v1.20.15
Which container runtime are you using?
moby (docker cli)
What operating system are you using?
macOS
Operating System / Build Version
macOS Monterey 12.3
What CPU architecture are you using?
x64
Linux only: what package format did you use to install Rancher Desktop?
No response
Windows User Only
No response
You should be able to configure this right now by setting the K3S_EXEC variable in an override.yaml file:
$ cat ~/Library/Application\ Support/rancher-desktop/lima/_config/override.yaml
env:
  K3S_EXEC: "--kubelet-arg=runtime-request-timeout=10m0s"
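If the file does not exist yet, something like the following sketch could create it. `write_override` is a hypothetical helper name, and the function overwrites any existing override.yaml in the target directory, so merge by hand if you already have one.

```shell
#!/bin/sh
# Hypothetical helper: create override.yaml with the K3S_EXEC setting
# shown above. WARNING: replaces any existing override.yaml in that directory.
write_override() {
  dir="$1"
  timeout="${2:-10m0s}"
  mkdir -p "$dir"
  cat > "$dir/override.yaml" <<EOF
env:
  K3S_EXEC: "--kubelet-arg=runtime-request-timeout=$timeout"
EOF
}

# Example for the macOS path used above (left commented out on purpose):
# write_override "$HOME/Library/Application Support/rancher-desktop/lima/_config"
```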
When you restart Rancher Desktop you can verify that the additional parameter has been passed to kubelet (close to the end of the 1293 character log line):
$ rdctl shell grep request-timeout /var/log/k3s.log
time="2022-06-01T05:36:21Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/31ff0fd447a47323a7c863dbb0a3cd452e12b45f1ec67dc55efa575503c2c3ac/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=lima-rancher-desktop --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --network-plugin=cni --node-ip=192.168.18.119 --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --runtime-request-timeout=10m0s --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
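Since the flag is buried in a very long line, a small filter can pull out just its value. This is a sketch: `kubelet_flag_value` is a hypothetical helper that reads log lines on stdin and prints the value of the last occurrence of the named flag.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a kubelet flag found in log
# lines read from stdin. Prints nothing if the flag is absent.
kubelet_flag_value() {
  flag="$1"
  # -o prints only the matching part; the value ends at a space or a quote
  grep -o -e "--$flag=[^ \"]*" | tail -n 1 | cut -d= -f2-
}

# Example, reusing the command above (left commented out):
# rdctl shell grep request-timeout /var/log/k3s.log | kubelet_flag_value runtime-request-timeout
```

For the log line above this would print the configured value, so you do not have to scan the whole line by eye.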
I have not actually tested if this affects the image pull timeout, but it should according to the documentation.
Let me know if this works for you! 😄
Hey Jan, sounds like you already support my approach 3 - that's great!
Your approach passes the parameter to kubelet:
time="2022-06-07T09:21:52.902549467Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/77ca12849da9a6f82acce910d05b017c21a69b14f02ef84177a7f6aaa1265e82/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/cri-dockerd.sock --container-runtime=remote --containerd=/run/cri-dockerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=lima-rancher-desktop --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --network-plugin=cni --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --runtime-request-timeout=10m0s --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
But it doesn't solve the issue: image pulls still stop after 2 minutes.
The next lines in the logfile are these:
Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
I0607 10:11:59.706326 3955 server.go:412] Version: v1.20.15+k3s1
W0607 10:11:59.707340 3955 server.go:226] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
Could it be that we need to use a config file rather than CLI flags? If so, how can I do that with Rancher Desktop?
Btw, the CLI flag is also deprecated according to https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
Just to clarify, since my last comment is a bit bloated - if I want to use the --config file, as kubelet wants me to, where should I create that file so that it gets injected and found at runtime?
Yes, how do we use "--runtime-request-timeout" with k3s in Rancher Desktop when dockerd is selected?
Same problem.
Same problem. The only workaround I have found is to pull the image separately using "docker pull
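A pre-pull along those lines could be scripted roughly as follows. This is a sketch under assumptions: `list_images` is a hypothetical helper, it only handles simple `image:` lines without trailing comments, and it assumes the moby runtime so that images pulled with the docker CLI are visible to the cluster. With a `latest` tag the kubelet may still contact the registry, but the layers should already be cached locally.

```shell
#!/bin/sh
# Hypothetical helper: list the unique image names referenced in a manifest.
# Handles "image: name" and "- image: name"; keys such as imagePullPolicy
# are not matched because the colon must directly follow "image".
list_images() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' "$1" \
    | tr -d "\"'" \
    | sort -u
}

# Example (needs Docker and network access, so it is left commented out):
# list_images test.yaml | xargs -n 1 docker pull
```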
Same problem.
Same problem
We are also running into this: we have a few large images and people working from their home offices. In Minikube, you can set the kubelet parameter "runtime-request-timeout", which fixes it for us, but some folks would rather use Rancher Desktop. Any chance to set this conveniently?
There is a closed issue for this in k3s: https://github.com/k3s-io/k3s/issues/6482. It seems the only option is to use containerd instead of dockerd.