Image pulls initiated by k3s are subject to a 2 minute timeout

Open alubbe opened this issue 2 years ago • 5 comments

Actual Behavior

Image pulls initiated by K8S result in ImagePullBackOff if the download does not complete within 2 minutes. The image pull is retried, but the pod stays in this status indefinitely if every retry also takes longer than 2 minutes.

Steps to Reproduce

This test pod uses a large image:

apiVersion: v1
kind: Pod
metadata:
  name: splunktest
spec:
  containers:
  - name: splunktest
    image: splunk/splunk
    env:
    - name: SPLUNK_START_ARGS
      value: --accept-license
    - name: SPLUNK_PASSWORD
      value: password

  1. Save the above YAML to a file such as test.yaml
  2. Run kubectl apply -f test.yaml
  3. If the connection is slow enough, the pod will enter the ImagePullBackOff state after 2 minutes
  4. Notice that docker pull splunk/splunk will succeed, even if it takes longer than 2 minutes
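
To put a number on step 4, a small wrapper can time the pull and compare it against the 2 minute limit. This is only a sketch: time_pull and LIMIT_SECONDS are names made up for this example and are not part of docker or Rancher Desktop.

```shell
#!/bin/sh
# Hypothetical helper: run a command, report how long it took, and say
# whether it stayed within the 2 minute limit described in this issue.
LIMIT_SECONDS=120

time_pull() {
  start=$(date +%s)
  status=0
  "$@" || status=$?          # keep the wrapped command's exit status
  end=$(date +%s)
  elapsed=$((end - start))
  if [ "$elapsed" -gt "$LIMIT_SECONDS" ]; then
    echo "took ${elapsed}s: longer than ${LIMIT_SECONDS}s"
  else
    echo "took ${elapsed}s: within ${LIMIT_SECONDS}s"
  fi
  return "$status"
}

# Usage (needs docker, so it is not run automatically):
#   time_pull docker pull splunk/splunk
```

If the pull reports more than 120 seconds and still succeeds, while the pod from step 3 ends up in ImagePullBackOff, the behavior described above is reproduced.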

Result

When you're developing locally from a home office, your internet connection may not be fast enough to download multi-gigabyte Docker images in under 2 minutes. Even smaller images can miss the limit on a shaky connection (for example, when family members are using Netflix or torrents). The current 2 minute time limit results in a very bad user experience.

Expected Behavior

Since Rancher Desktop is intended for local development, image pulls initiated by K8S should succeed even if they take longer than 2 minutes.

Additional Information

The same issue affects Docker Desktop (see https://github.com/docker/for-mac/issues/6300)

It can be fixed by changing the kubelet configuration, specifically by increasing the runtimeRequestTimeout setting documented at https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

There are three potential solutions:

  1. Find a new global default (like increasing it to 10 minutes or similar, but this might have unintended consequences)
  2. Make this number configurable by the user via the UI
  3. Give users the ability to modify the kubelet config file on disk and change k3s to use this file when it exists
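
For option 3, the file on disk would be a KubeletConfiguration document. The sketch below only writes such a file; write_kubelet_config is a hypothetical helper, the target path is whatever the caller chooses, and nothing here establishes where Rancher Desktop or k3s would read the file from or that the setting resolves the timeout.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal KubeletConfiguration that raises
# runtimeRequestTimeout. The field name comes from the kubelet config
# reference linked above; the file location is an assumption left to the caller.
write_kubelet_config() {
  target=$1
  request_timeout=${2:-10m0s}   # default mirrors the "10 minutes" idea in option 1
  cat > "$target" <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
runtimeRequestTimeout: "$request_timeout"
EOF
}

# Usage:
#   write_kubelet_config ./kubelet-config.yaml 10m0s
```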

Rancher Desktop Version

1.3.0

Rancher Desktop K8s Version

v1.20.15

Which container runtime are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

macOS Monterey 12.3

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

alubbe avatar May 31 '22 08:05 alubbe

You should be able to configure this right now by setting the K3S_EXEC variable in an override.yaml file:

$ cat ~/Library/Application\ Support/rancher-desktop/lima/_config/override.yaml
env:
  K3S_EXEC: "--kubelet-arg=runtime-request-timeout=10m0s"
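
If the file does not exist yet, something like the following could create it. This is a sketch, not an official tool: write_override is a made-up helper, the default directory is the macOS path shown above, and the function refuses to overwrite an existing override.yaml so that other overrides are not lost.

```shell
#!/bin/sh
# Hypothetical helper: create override.yaml containing the K3S_EXEC setting
# from this comment. Arguments: [config_dir] [timeout].
write_override() {
  config_dir=${1:-"$HOME/Library/Application Support/rancher-desktop/lima/_config"}
  request_timeout=${2:-10m0s}
  override_file="$config_dir/override.yaml"
  if [ -e "$override_file" ]; then
    echo "refusing to overwrite $override_file" >&2
    return 1
  fi
  mkdir -p "$config_dir"
  cat > "$override_file" <<EOF
env:
  K3S_EXEC: "--kubelet-arg=runtime-request-timeout=$request_timeout"
EOF
}

# Usage (writes to the default macOS location):
#   write_override
```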

When you restart Rancher Desktop you can verify that the additional parameter has been passed to kubelet (close to the end of the 1293 character log line):

$ rdctl shell grep request-timeout /var/log/k3s.log
time="2022-06-01T05:36:21Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/31ff0fd447a47323a7c863dbb0a3cd452e12b45f1ec67dc55efa575503c2c3ac/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=lima-rancher-desktop --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --network-plugin=cni --node-ip=192.168.18.119 --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --runtime-request-timeout=10m0s --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"

I have not actually tested if this affects the image pull timeout, but it should according to the documentation.
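
Since the flag is buried near the end of a very long log line, a small filter can print just that flag. A minimal sketch, assuming the log format shown above; extract_request_timeout is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print only the
# --runtime-request-timeout=... flag(s) they contain.
extract_request_timeout() {
  # -e keeps grep from treating the leading "--" of the pattern as an option
  grep -o -e '--runtime-request-timeout=[^ "]*'
}

# Usage (needs a running Rancher Desktop VM, so it is not run automatically):
#   rdctl shell cat /var/log/k3s.log | extract_request_timeout
```

This only confirms that the parameter reached the kubelet command line; it says nothing about whether the image pull timeout actually changed.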

Let me know if this works for you! 😄

jandubois avatar Jun 01 '22 05:06 jandubois

Hey Jan, sounds like you already support my approach 3 - that's great!

Your approach passes the parameter to kubelet:

time="2022-06-07T09:21:52.902549467Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/77ca12849da9a6f82acce910d05b017c21a69b14f02ef84177a7f6aaa1265e82/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/cri-dockerd.sock --container-runtime=remote --containerd=/run/cri-dockerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=lima-rancher-desktop --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --network-plugin=cni --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --runtime-request-timeout=10m0s --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"

But it doesn't solve the issue: image pulls still stop after 2 minutes.

The next lines in the logfile are these:

Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
I0607 10:11:59.706326    3955 server.go:412] Version: v1.20.15+k3s1
W0607 10:11:59.707340    3955 server.go:226] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.

Could it be that we need to use a config file rather than cli flags? If so, how can I do that with Rancher Desktop?

Btw, the CLI flag is also deprecated according to https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

alubbe avatar Jun 07 '22 10:06 alubbe

Just to clarify, since my last comment is a bit bloated - if I want to use the --config file, as kubelet wants me to, where should I create that file so that it gets injected and found at runtime?

alubbe avatar Jun 10 '22 08:06 alubbe

Yes, how do we use "--runtime-request-timeout" with k3s in Rancher Desktop when dockerd is selected?

ripun avatar Jul 11 '22 13:07 ripun

Same problem.

t80027t avatar Oct 11 '22 15:10 t80027t

Same problem. The only workaround I have found is to pull the image separately using "docker pull ".
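
The pre-pull workaround can be scripted roughly as below. This is a sketch under assumptions: prepull_and_apply is a hypothetical helper, and it relies on the moby (dockerd) engine being selected so that images pulled with docker are visible to the cluster. For an untagged image such as splunk/splunk, the default imagePullPolicy is Always, so the kubelet still contacts the registry, but the layers should already be local.

```shell
#!/bin/sh
# Hypothetical helper: pull the image with docker first (not subject to the
# 2 minute limit, per this thread), then apply the manifest that uses it.
prepull_and_apply() {
  image=$1
  manifest=$2
  docker pull "$image" && kubectl apply -f "$manifest"
}

# Usage:
#   prepull_and_apply splunk/splunk test.yaml
```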

jontunjon avatar Nov 03 '22 15:11 jontunjon

Same problem.

gmanera avatar Nov 03 '22 19:11 gmanera

Same problem

vinibodruch avatar Nov 03 '22 20:11 vinibodruch

We are also running into this: we have a few large images and people working from their home office. In Minikube, you can set the kubelet parameter "runtime-request-timeout", which fixes it for us, but some folks would rather use Rancher Desktop. Is there any chance of a convenient way to set this?

jgoeres avatar Dec 13 '22 07:12 jgoeres

There is a closed issue for this in k3s: https://github.com/k3s-io/k3s/issues/6482. It seems the only option is to use containerd instead of dockerd.

ryancurrah avatar May 26 '23 19:05 ryancurrah