Rancher Desktop Intermittently Hangs on Ventura 13.1
Actual Behavior
When running a docker command, it hangs forever. Any subsequent docker commands in other shells hang as well. Rebooting the laptop is required, as Rancher Desktop becomes unusable.
Steps to Reproduce
One dev on an M1 Mac running Ventura 13.1 can reproduce this issue consistently by building a Dockerfile in docker. We, however, are unable to reproduce the same issue consistently on our laptops. One of the team members who can reproduce it is also using an M1 Mac.
Create a Dockerfile
echo -e 'FROM alpine:latest\nRUN echo "hey" > hey.txt' > Dockerfile
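For reference, the resulting Dockerfile should contain just these two lines:
FROM alpine:latest
RUN echo "hey" > hey.txt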
Build Dockerfile in docker
docker run --rm --interactive --pull="always" --user="root" --network="host" --name="repro-hanging-issue" --mount "type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock" -v "$(pwd):$(pwd)" -w "$(pwd)" docker:cli build .
Result
The terminal just hangs.
Expected Behavior
Docker commands should not hang.
Additional Information
Our developers started using Rancher Desktop in November 2022. It was working well, with no hanging issues reported. Once people started updating to Ventura at the beginning of the month (January), they started reporting these issues. We have one developer who is able to consistently reproduce the issue; some of us can only reproduce it intermittently. It seems to be most reproducible on M1 Macs, though. We were also able to reproduce it with our security tools disabled.
We enabled debug logging from the Rancher Desktop Troubleshooting page and looked at all the logs (lima and rancher) and did not see any glaring errors or warnings.
If there is anything else we can provide to help with this, let me know.
Rancher Desktop Version
1.7.0
Rancher Desktop K8s Version
Disabled
Which container engine are you using?
moby (docker cli)
What operating system are you using?
macOS
Operating System / Build Version
Ventura 13.1
What CPU architecture are you using?
arm64 (Apple Silicon)
Linux only: what package format did you use to install Rancher Desktop?
None
Windows User Only
No response
I can't reproduce this on macOS 13.1 on M1 either. I've done a factory reset, rebooted the host, done another factory reset, and the command always worked fine.
I've looked at the logs, and can't spot anything in there either.
On the "reproducible laptop" does this also happen after a factory reset? Or after rebooting the host?
Are there any errors in any of the networking logs at ~/Library/Application Support/rancher-desktop/lima/_networks?
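If it helps, here is a minimal way to skim those logs (the exact file names in that directory vary between setups, so the *.log glob is an assumption):
# list the networking log files, then show the last lines of each
ls ~/Library/Application\ Support/rancher-desktop/lima/_networks/
tail -n 100 ~/Library/Application\ Support/rancher-desktop/lima/_networks/*.log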
I am getting our IT team to send me an M1 Macbook so I can try to reproduce this issue. Another dev reported the same issue this morning. Not sure what they were doing to cause it though.
On the "reproducible laptop" it happens even after a factory reset, reboot, and fresh re-install.
The dev with the reproducible laptop needs to get some work done, so they have uninstalled it for now. ~I am going to get our devs to post here when they get a freezing issue.~ Meanwhile, I will try to get that laptop and reproduce it.
> I am getting our IT team to send me an M1 Macbook so I can try to reproduce this issue. Another dev reported the same issue this morning. Not sure what they were doing to cause it though.
Thank you so much; this will be really helpful, as I've been unable to repro this myself.
Maybe also take a look at any anti-malware technology installed on your machines; maybe that is interfering with the virtualization code?
I have the same problem. I have tried a factory reset, a reinstall, and rebooting everything, but Rancher still hangs.
My colleagues who have the same anti-virus software installed did not have the problem.
Hi, I'm able to reproduce this frequently on my M1 running Monterey 12.6.1 / RD 1.7.0 / k8s 1.25.4 / Traefik disabled. What logs can I provide from ~/Library/Logs/rancher-desktop to help debug this? Currently the RD UI shows Kubernetes is running, but kubectl commands time out with "Unable to connect to the server: net/http: TLS handshake timeout".
Tried quitting Rancher Desktop and restarting a couple of times, but same problem. I could restart the laptop and the problem might go away. I may need to do that to not be blocked with my work, and/or look at minikube (which doesn't have a nice UI). But I'm happy to provide logs and keep the laptop in this reproducible state for the next 24 hours or so if it helps.

Tailed logs from the time it started to the time it stopped working:
1. steve.log
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for rbac.authorization.k8s.io/v1, Kind=RoleBinding"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apiregistration.k8s.io/v1, Kind=APIService"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for /v1, Kind=Pod"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apps/v1, Kind=Deployment"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for events.k8s.io/v1, Kind=Event"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for /v1, Kind=PodTemplate"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for apps/v1, Kind=StatefulSet"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for batch/v1, Kind=CronJob"
time="2023-01-16T11:09:37-08:00" level=info msg="Watching metadata for acme.cert-manager.io/v1, Kind=Order"
…
….. first sign of trouble ….
….
2023-01-16T19:10:04.881Z: stderr: time="2023-01-16T11:10:04-08:00" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
2023-01-16T19:13:01.329Z: stderr: W0116 11:13:01.327098 46860 reflector.go:443] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0116 11:13:01.327114 46860 reflector.go:443] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
….
…. many of these …..
….
W0116 11:13:01.328829 46860 reflector.go:443] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0116 11:13:01.328880 46860 reflector.go:443] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
….
…. TLS handshake timeouts; roughly after this point kubectl stops working …..
….
2023-01-16T19:13:12.133Z: stderr: W0116 11:13:12.132748 46860 reflector.go:325] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/cert-manager.io/v1/certificates?resourceVersion=160294": net/http: TLS handshake timeout
W0116 11:13:12.132851 46860 reflector.go:325] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168: failed to list *summary.SummarizedObject: Get "https://127.0.0.1:6443/apis/node.k8s.io/v1/runtimeclasses?resourceVersion=160231": net/http: TLS handshake timeout
I0116 11:13:12.132905 46860 trace.go:205] Trace[631373749]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:168 (16-Jan-2023 11:13:02.130) (total time: 10002ms):
Trace[631373749]: ---"Objects listed" error:Get "https://127.0.0.1:6443/apis/node.k8s.io/v1/runtimeclasses?resourceVersion=160231": net/http: TLS handshake timeout 10002ms (11:13:12.132)
Trace[631373749]: [10.002143209s] [10.002143209s] END
2. k3s.log
E0117 04:26:35.226050 4290 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
W0117 04:26:36.046392 4290 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
E0117 04:26:36.046516 4290 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
{"level":"warn","ts":"2023-01-17T04:26:36.183Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x400167d880/kine.sock","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
E0117 04:26:36.183408 4290 controller.go:187] failed to update lease, error: Put "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0117 04:26:36.183651 4290 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
E0117 04:26:36.185775 4290 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
I0117 04:26:36.185091 4290 trace.go:205] Trace[333656479]: "GuaranteedUpdate etcd3" audit-id:0a94d052-49c1-40c2-a1f3-8bdacccbd6e9,key:/leases/kube-node-lease/lima-rancher-desktop,type:*coordination.Lease (17-Jan-2023 04:26:26.184) (total time: 10000ms):
Trace[333656479]: ---"Txn call finished" err:context deadline exceeded 9999ms (04:26:36.185)
Trace[333656479]: [10.000193713s] [10.000193713s] END
E0117 04:26:36.197602 4290 finisher.go:175] FinishRequest: post-timeout activity - time-elapsed: 13.941958ms, panicked: false, err: context deadline exceeded, panic-reason: <nil>
E0117 04:26:36.196928 4290 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
I0117 04:26:36.199085 4290 trace.go:205] Trace[1183966381]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop,user-agent:k3s/v1.25.4+k3s1 (linux/arm64) kubernetes/0dc6333,audit-id:0a94d052-49c1-40c2-a1f3-8bdacccbd6e9,client:127.0.0.1,accept:application/vnd.kubernetes.protobuf,application/json,protocol:HTTP/2.0 (17-Jan-2023 04:26:26.183) (total time: 10015ms):
Trace[1183966381]: ---"Write to database call finished" len:509,err:Timeout: request did not complete within requested timeout - context deadline exceeded 9998ms (04:26:36.183)
Trace[1183966381]: [10.015928213s] [10.015928213s] END
E0117 04:26:36.199699 4290 timeout.go:141] post-timeout activity - time-elapsed: 16.136125ms, PUT "/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/lima-rancher-desktop" result: <nil>
Note: we have been able to avoid this hanging issue by switching to the 9p mount type in Lima. I'm not sure if it completely fixes it or just makes it occur less often; time will tell from our users. But my suggestion to others affected by this is to try the 9p mount. One caveat: the 9p mount does not support symlinks in volumes.
@ryancurrah how do you enable 9p? I read about it here i.e.
On macOS an alternative file sharing mechanism using 9p instead of reverse-sshfs has been implemented. It is disabled by default. Talk to us on Slack if you want to help us testing it.
But I wasn't able to find the specifics on how to enable it.
I have the same problem.
In detail, a co-worker and I upgraded macOS to 13.0 and started hitting it. We upgraded to 13.1; his machine recovered, but mine did not.
Finally, I recovered by switching mountType to 9p.
Docker containers ran normally under plain Lima installed via Homebrew, even though its mountType is also null.
@lakamsani edit this file and add a top-level mountType entry:
~/Library/Application Support/rancher-desktop/lima/_config/override.yaml
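It should contain just the top-level key (the same two lines others post later in this thread):
---
mountType: 9p
You'll likely need to quit and restart Rancher Desktop for the override to take effect.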
I ran into the same issue too, when doing a "pnpm install" in a docker container after mounting a custom workdir into Lima, on my macOS 13.1 (Intel). So I think this is not related to Intel or M1. I can reproduce this issue exactly, every time, using the same steps. I also checked the logs under Rancher Desktop and it seems no errors were logged.
For me, the hang only occurs when using the default mountType (which should be null, per ~/Library/Application Support/rancher-desktop/lima/0/lima.yaml) and running an npm install command inside a docker container with a -v custom volume mount. I also wrote a Dockerfile to do almost the same thing to test, but the problem disappeared. Finally I changed the Lima mountType to 9p and everything seems to be OK now.
This was after upgrading to Ventura 13.2, coming from 12.x; I never ran into this problem on 12.x.
I'm running into the same issue. I'm doing a massive amount of file activity along with network activity inside a container. The I/O gets hung, and then docker ps becomes unresponsive. I try to quit the desktop, which also hangs; to get it to quit properly:
# kill the leftover rancher-desktop ssh sessions (e.g. reverse-sshfs mounts) so the app can quit
ps auxww | grep rancher | grep ssh | awk '{print $2}' | xargs kill
On restart, qemu looks like it comes up properly, but the docker socket is still unresponsive. A second quit and restart works fine. I guess I'll try the 9p thing. I don't have an override.yaml, so I'm assuming it should look like:
---
mountType: 9p
Answered my own question:
cat ~/"Library/Application Support/rancher-desktop/lima/_config/override.yaml"
---
mountType: 9p
ps auxww | grep rancher | grep ssh now shows nothing while doing disk I/O.
Hello, experiencing the same issue, but on an Intel CPU and macOS Ventura... FYI
> Hello, experiencing the same issue, but on an Intel CPU and macOS Ventura... FYI
I should have clarified that I'm on Intel as well. The 9p change made a huge difference.
Unfortunately, 9p caused other issues, so it's unusable for me.
Update: I upgraded to Ventura 13.2 and don't have the "freezing" problem anymore, without any override...
Hitting the same hang problem on 13.2 on an Intel Mac: Docker freezes and I can't quit rancher-desktop.
> Hitting the same hang problem on 13.2 on an Intel Mac: Docker freezes and I can't quit rancher-desktop.
In a terminal, do a ps and grep for rancher. You will see a bunch of ssh sessions; kill them off and Rancher will become responsive. Once I made the change to 9p, all these hang issues went away.
> In a terminal, do a ps and grep for rancher. You will see a bunch of ssh sessions; kill them off and Rancher will become responsive. Once I made the change to 9p, all these hang issues went away.
Thanks, after adding a new override.yaml it works for me!
cat ~/Library/Application\ Support/rancher-desktop/lima/_config/override.yaml
---
mountType: 9p
I have been experiencing a similar problem on and off for the past month or two. I was originally discussing it in the rancher-desktop Slack channel, but after finding this issue I believe it's the same as what I'm experiencing.
I find the bug to be easily reproducible in my case:
- Rancher Desktop: 1.8.1
- macOS: Ventura 13.1
- Container runtime: dockerd (moby) [I have not tested recently with containerd/nerdctl; will try this]
- Rancher Kubernetes: disabled (doesn't matter; I've seen this issue with k8s enabled as well)
I get the same behavior as described above: existing containers freeze and virtually all commands hang (docker ps, docker image ls, rdctl shell; nothing works except simple stuff like docker version).
Here is what I can note about reproducing the problem (at least in my case):
- Only happens when running multiple containers simultaneously
- Containers are running terraform provisioning via ansible (I/O and network usage) in interactive mode (docker run -it) with a few env vars passed in (probably not relevant); see the sketch after this list
- Each container has multiple volumes mounted, but I am careful to never mount the same host volume read/write to two different containers (sometimes I mount the same volume to multiple containers read-only)
- I increased the RAM allowance for the rancher VM all the way up to 16GB, but this did not help (I have verified that my machine's RAM is not being used up either; plenty of capacity left)
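To illustrate the kind of invocation described above, a hypothetical sketch (the image name, env var, volume paths, and playbook name are all illustrative, not the actual setup; each command runs in its own terminal):
# container A: shared modules mounted read-only, its own workspace read/write
docker run -it --rm -e ANSIBLE_FORCE_COLOR=1 \
  -v "$HOME/shared-modules:/modules:ro" \
  -v "$HOME/envs/env-a:/workspace" \
  -w /workspace my-terraform-ansible-image ansible-playbook provision.yml
# container B: same read-only mount, different read/write workspace
docker run -it --rm -e ANSIBLE_FORCE_COLOR=1 \
  -v "$HOME/shared-modules:/modules:ro" \
  -v "$HOME/envs/env-b:/workspace" \
  -w /workspace my-terraform-ansible-image ansible-playbook provision.yml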
About the suggested workaround:
- I did attempt the mountType: 9p workaround; it did successfully prevent the container runtime from hanging. However, it caused my terraform provider to fatally crash (every time), so this method is unusable for me.
Same here: Rancher Desktop 1.9.1, Ventura 13.4.1 (c).
Likewise, Rancher Desktop randomly freezes for me, more often than not after I leave it running unused for a while, and neither nerdctl nor rdctl commands will respond until I restart the application (tearing down the VM, etc.).
I'm currently on Rancher Desktop 1.9.1 & on macOS Ventura 13.5.1, running on Apple silicon (M2 Pro). I don't have Kubernetes enabled, and I'm using the containerd runtime, with VZ emulation (Rosetta support enabled) & virtiofs mounting (I did have other types of problems before when using 9p, mostly related to user mappings & permissions, so I'd like to avoid going back to that, and reverse-sshfs was unbearably slow!).
Let me know if you'd like me to gather any information when RD hangs, for debugging purposes. Thanks!
Same issue here. Exactly same environment as @juanpalaciosascend (but M1 pro)
Same for me; a factory reset did fix it for me, though.
A factory reset fixes it because it probably reverts to QEMU, reverse-sshfs, etc., but if you apply the settings mentioned (VZ, virtiofs, ...) again, the problem will probably come back.
I've seen most of the problems I've been experiencing go away (I want to say entirely, but it might still be a little too early for that) after switching back to the dockerd (moby) runtime, away from containerd.
All other settings (e.g. VZ framework, Rosetta support enabled, virtiofs volumes, Kubernetes disabled, etc.) remain the same, so that leads me to believe the problem that's causing Rancher Desktop to freeze revolves around the use of containerd.