rancher-desktop
MacOS DNS regression in 1.5.0 and 1.5.1
Actual Behavior
With Rancher Desktop 1.5.{0,1} on aarch64 MacOS, I'm seeing qemu-system-aarch64 hang for several minutes at a time in my development environment. The problem appears to be triggered by a process making bursty DNS requests for host.docker.internal. The same development environment works fine on Rancher Desktop 1.4.1.
Steps to Reproduce
This isn't how I found the problem, but I think it reproduces the same underlying issue.
- Install Rancher Desktop 1.5.1 and configure in Docker/moby mode.
- Run
docker run --rm --name crashy-crashy -ti ubuntu:20.04 bash -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y dnsutils psmisc && while true ; do dig host.docker.internal ; done'
- Wait for the crashy-crashy container to start logging dig output.
- Run the following command in another host terminal:
docker exec -ti crashy-crashy bash -c "while true ; do killall dig ; sleep .1 ; done"
- Wait a bit. The dig output should eventually stop, qemu-system-aarch64 will be running at 100% CPU on your macOS host, and docker commands will no longer work.
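To confirm the last step without watching Activity Monitor, a small check like the one below may help. It is only a sketch: `cpu_at_or_above` and `check_qemu_cpu` are hypothetical helper names, the 90% threshold is an arbitrary assumption, and the process name would be different on Intel Macs.

```shell
#!/bin/sh
# Sketch: report whether the QEMU process looks pegged on CPU.
# Helper names and the 90% default threshold are assumptions, not part of
# Rancher Desktop or Lima.

# Succeed when the CPU percentage in $1 is at or above the threshold in $2
# (default 90). awk does the comparison so fractional values work.
cpu_at_or_above() {
    awk -v cpu="$1" -v limit="${2:-90}" 'BEGIN { exit !(cpu + 0 >= limit + 0) }'
}

# Look up the newest process matching the given name (default
# qemu-system-aarch64) and print its current CPU usage as reported by ps.
check_qemu_cpu() {
    proc="${1:-qemu-system-aarch64}"
    pid="$(pgrep -n "$proc")" || { echo "no $proc process found" >&2; return 2; }
    cpu="$(ps -o %cpu= -p "$pid" | tr -d ' ')"
    if cpu_at_or_above "$cpu"; then
        echo "pid=$pid cpu=$cpu%: possibly hung"
    else
        echo "pid=$pid cpu=$cpu%: looks normal"
    fi
}

# Usage (on the macOS host): check_qemu_cpu qemu-system-aarch64
```

High CPU on its own is not proof of the hang, so it is worth pairing this with a docker command that no longer returns.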
Result
I'm seeing the Rancher Desktop qemu VM become unresponsive until I kill the qemu-system-aarch64 process and restart Rancher Desktop.
Expected Behavior
The Rancher Desktop VM should not hang.
Additional Information
No response
Rancher Desktop Version
1.5.1
Rancher Desktop K8s Version
N/A
Which container engine are you using?
moby (docker cli)
What operating system are you using?
macOS
Operating System / Build Version
MacOS Monterey 12.5.1
What CPU architecture are you using?
arm64 (Apple Silicon)
Linux only: what package format did you use to install Rancher Desktop?
N/A
Windows User Only
No response
FWIW, I ran my reproduction steps on a second Macbook and see the same behavior there.
@ryfow is the second Macbook also an M1, or x86?
@jandubois do you think you can reproduce this on your M1 machine?
Hi, I'm facing a similar issue in 1.7.0.
It seems like Lima gets stuck when it hits the file descriptor limit, but I haven't found a way to solve it yet.
This issue also occurs on x86 macOS.
Steps to Reproduce
- Login to Lima and keep running nslookup.
$ rdctl shell
lima-rancher-desktop:/Users/xxx$ while true; do nslookup www.google.co.jp; done
- On the host OS, list the UDP sockets that qemu-system-aarch64 has open.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
qemu-syst 6788 xxxx 119u IPv4 0x2c6ecf140850ff5f 0t0 UDP *:63544
qemu-syst 6788 xxxx 120u IPv4 0x2c6ecf140851762f 0t0 UDP *:63398
- The number of open UDP sockets keeps increasing, and once it reaches FD=1024u, Lima gets stuck.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
...
qemu-syst 6788 xxxx 1023u IPv4 0x2c6ecf14085191bf 0t0 UDP *:54486
qemu-syst 6788 xxxx 1024u IPv4 0x2c6ecf140852088f 0t0 UDP *:62934
- If you wait exactly 4 minutes, all UDP open files get released and Lima starts running again.
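The steps above can be watched as a running count instead of re-running lsof by hand. The following is a rough sketch under a few assumptions: `count_udp_lines` and `watch_udp_fds` are made-up helper names, the 5 second interval is arbitrary, and on an Intel Mac the process name would need to be changed.

```shell
#!/bin/sh
# Sketch: periodically print how many UDP sockets the QEMU process holds.
# Helper names and the default interval are assumptions for illustration.

# Count the lines on stdin that mention UDP (lsof prints one line per
# open file). grep exits non-zero when the count is 0, so mask that.
count_udp_lines() {
    grep -c 'UDP' || true
}

# Poll lsof for the newest process matching the given name until that
# process disappears.
# $1: process name (default qemu-system-aarch64)
# $2: seconds between samples (default 5)
watch_udp_fds() {
    proc="${1:-qemu-system-aarch64}"
    interval="${2:-5}"
    while pid="$(pgrep -n "$proc")"; do
        # -n and -P skip host and port name lookups so lsof returns quickly.
        count="$(lsof -n -P -p "$pid" 2>/dev/null | count_udp_lines)"
        echo "$(date '+%H:%M:%S') pid=$pid udp_fds=$count"
        sleep "$interval"
    done
    echo "no $proc process found" >&2
}

# Usage (on the macOS host): watch_udp_fds qemu-system-aarch64 5
```

If the behavior matches what I described, the count should climb while nslookup is looping and then drop back once the sockets are released.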
Rancher Desktop Version
1.4.1, 1.6.2, 1.7.0
Rancher Desktop K8s Version
N/A
Which container engine are you using?
moby (docker cli)
Operating System / Build Version / CPU
MacOS Monterey 12.6 (M1 2020) MacOS Ventura 13.0.1 (Intel Core i5, 2019)
This issue may be specific to Alpine Linux. I tried it with plain Lima and got the same problem with Alpine, and also with Debian, but not with Ubuntu.
I used the following images.
- Alpine Linux
- https://github.com/lima-vm/alpine-lima/releases/download/v0.2.26/alpine-lima-std-3.17.0-aarch64.iso
- Ubuntu
- https://cloud-images.ubuntu.com/releases/22.10/release-20221201/ubuntu-22.10-server-cloudimg-arm64.img
- Debian
- https://cloud.debian.org/images/cloud/bullseye/20221205-1220/debian-11-generic-arm64-20221205-1220.qcow2
@ryfow the issue has been addressed here: https://github.com/lima-vm/lima/issues/1285, therefore, it should be included in our upcoming release. Thank you again for reporting this.
Awesome! Looking forward to upgrading from 1.4.1 :)
I'm going to close this since all the changes are in place now, @ryfow and @matsukaz please keep your eyes on our next release and give it a try. Feel free to re-open if you encounter anything additional. Thanks
@Nino-K This appears to still be a problem with Rancher Desktop 1.8. I don't know for sure if the same thing is making my dev environment hang, but I think it's the most likely suspect.
Edit: I can't figure out how to reopen.
@Nino-K At least in my environment, this issue was resolved with Rancher Desktop 1.8! I have not seen this issue since I upgraded to 1.8, even with the reproduction procedure I posted earlier.
@ryfow I'm not sure, but it may be another problem.
I tried my original reproduction steps with 1.8.1 on a work M1 Macbook and a personal M1 Macbook. It hangs on both and puts qemu into 100% CPU usage.
@ryfow could your issue possibly be related to this one? https://github.com/lima-vm/lima/issues/1333
@Nino-K I don't think it's https://github.com/lima-vm/lima/issues/1333. That bug appears to be talking about Virtualization.Framework. Looks like Rancher Desktop uses qemu.
I tried to follow my reproduction steps on lima 0.15, qemu 7.2.1 and limactl start --name docker template:///docker. I couldn't reproduce, the hang did not happen.
The qemu version is different, so I tried copying my system version of qemu-system-aarch64 (7.2.1) into the "Rancher Desktop.app" but that did not help. I still see the hang on Rancher Desktop with the new qemu.
It's got to be a problem with https://github.com/lima-vm/alpine-lima. When I start a VM with limactl start --name alpine template://alpine, the problem reproduces.
As an FYI to anyone else running into this, I've had good results with using the VZ Virtual Machine Type. Things seem way more stable.