MacOS DNS regression in 1.5.0 and 1.5.1

Open ryfow opened this issue 2 years ago • 1 comments

Actual Behavior

With Rancher Desktop 1.5.{0,1} on aarch64 MacOS, I'm seeing qemu-system-aarch64 hang for several minutes at a time in my development environment. The problem appears to be triggered by a process making bursty DNS requests for host.docker.internal. The same development environment works fine on Rancher Desktop 1.4.1.

Steps to Reproduce

This isn't how I found the problem, but I think it reproduces the same underlying issue.

  1. Install Rancher Desktop 1.5.1 and configure in Docker/moby mode.
  2. Run docker run --rm --name crashy-crashy -ti ubuntu:20.04 bash -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y dnsutils psmisc && while true ; do dig host.docker.internal ; done'
  3. Wait for the crashy-crashy container to start logging dig output
  4. Run the following command in another host terminal: docker exec -ti crashy-crashy bash -c "while true ; do killall dig ; sleep .1 ; done"
  5. Wait for a bit. Eventually the dig output stops, qemu-system-aarch64 runs at 100% CPU on your MacOS host, and docker commands no longer work.
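To tell when step 5 has been reached without watching the terminal, a check along the following lines might help. This is only a sketch: `run_with_timeout` and `docker_is_responsive` are hypothetical helper names (macOS does not ship a `timeout` command by default), and it assumes that `docker version` blocks while the VM is hung.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but kill it if it runs longer than
# the given number of seconds. Returns the command's exit status, which is
# non-zero if the command was killed by the timer.
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    cmd_pid=$!
    # Timer: after $secs seconds, terminate the command if it still runs.
    ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    timer_pid=$!
    wait "$cmd_pid" 2>/dev/null
    status=$?
    # The command finished (or was killed); stop the timer.
    kill "$timer_pid" 2>/dev/null || true
    wait "$timer_pid" 2>/dev/null || true
    return "$status"
}

# Hypothetical helper: succeeds if the docker daemon answers within the
# given number of seconds (default 10). Assumes `docker version` hangs
# while the VM is unresponsive.
docker_is_responsive() {
    run_with_timeout "${1:-10}" docker version >/dev/null 2>&1
}
```

For example, `until ! docker_is_responsive 10; do sleep 5; done; date` in a third terminal would print roughly when docker stopped answering.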

Result

I'm seeing the Rancher Desktop qemu VM become unresponsive until I kill the qemu-system-aarch64 process and restart Rancher Desktop.

Expected Behavior

The Rancher Desktop VM should not hang.

Additional Information

No response

Rancher Desktop Version

1.5.1

Rancher Desktop K8s Version

N/A

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

MacOS Monterey 12.5.1

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

N/A

Windows User Only

No response

ryfow avatar Aug 23 '22 20:08 ryfow

FWIW, I ran my reproduction steps on a second Macbook, and I see the same behavior.

ryfow avatar Aug 25 '22 00:08 ryfow

FWIW, I ran my reproduction steps on a second Macbook, and I see the same behavior.

@ryfow is the second Macbook also an M1, or is it x86?

@jandubois do you think you can reproduce this on your M1 machine?

Nino-K avatar Nov 15 '22 19:11 Nino-K

Hi, I'm facing a similar issue in 1.7.0. It seems that Lima gets stuck when it hits the file descriptor limit, but I haven't found a way to solve it yet. This issue also occurs on x86 macOS.

Steps to Reproduce

  1. Log in to Lima and keep running nslookup.
$ rdctl shell
lima-rancher-desktop:/Users/xxx$ while true; do nslookup www.google.co.jp; done
  2. On the host OS, list the open UDP files that qemu-system-aarch64 holds.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
qemu-syst 6788 xxxx  119u  IPv4 0x2c6ecf140850ff5f         0t0                 UDP *:63544
qemu-syst 6788 xxxx  120u  IPv4 0x2c6ecf140851762f         0t0                 UDP *:63398
  3. The number of open UDP files keeps increasing, and once it reaches FD=1024u, Lima gets stuck.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
...
qemu-syst 6788 xxxx  1023u  IPv4 0x2c6ecf14085191bf         0t0                 UDP *:54486
qemu-syst 6788 xxxx  1024u  IPv4 0x2c6ecf140852088f         0t0                 UDP *:62934
  4. If you wait exactly 4 minutes, all the open UDP files are released and Lima starts running again.
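The growth in the steps above can be watched over time with something like the following. It is a rough sketch, not a tested diagnostic: `count_udp_fds` and `watch_udp_fds` are hypothetical names, and it assumes the lsof column layout shown in the output above, where the eighth field is `UDP`.

```shell
#!/bin/sh
# Hypothetical helper: count the UDP sockets in `lsof` output read from
# stdin. Assumes the column layout shown above (8th field is the protocol).
count_udp_fds() {
    awk '$8 == "UDP" { n++ } END { print n + 0 }'
}

# Hypothetical helper: print a timestamped UDP socket count for the given
# PID every few seconds (default 5) until the process exits.
watch_udp_fds() {
    pid=$1
    interval=${2:-5}
    while kill -0 "$pid" 2>/dev/null; do
        printf '%s %s\n' "$(date +%H:%M:%S)" \
            "$(lsof -p "$pid" 2>/dev/null | count_udp_fds)"
        sleep "$interval"
    done
}
```

Running `watch_udp_fds "$(pgrep qemu-system-aarch64)"` on the host while the nslookup loop runs should show the count climbing, and then dropping again after the stall if the behavior matches the description above.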

Rancher Desktop Version

1.4.1, 1.6.2, 1.7.0

Rancher Desktop K8s Version

N/A

Which container engine are you using?

moby (docker cli)

Operating System / Build Version / CPU

MacOS Monterey 12.6 (M1 2020)
MacOS Ventura 13.0.1 (Intel Core i5, 2019)

matsukaz avatar Dec 19 '22 09:12 matsukaz

This issue may be a problem with Alpine Linux. I tried it with Lima and got the same problem. It also happens with Debian, but not with Ubuntu.

I used the following images.

  • Alpine Linux
    • https://github.com/lima-vm/alpine-lima/releases/download/v0.2.26/alpine-lima-std-3.17.0-aarch64.iso
  • Ubuntu
    • https://cloud-images.ubuntu.com/releases/22.10/release-20221201/ubuntu-22.10-server-cloudimg-arm64.img
  • Debian
    • https://cloud.debian.org/images/cloud/bullseye/20221205-1220/debian-11-generic-arm64-20221205-1220.qcow2

matsukaz avatar Dec 19 '22 15:12 matsukaz

@ryfow the issue has been addressed here: https://github.com/lima-vm/lima/issues/1285, therefore, it should be included in our upcoming release. Thank you again for reporting this.

Nino-K avatar Jan 11 '23 20:01 Nino-K

Awesome! Looking forward to upgrading from 1.4.1 :)

ryfow avatar Jan 11 '23 21:01 ryfow

I'm going to close this since all the changes are in place now. @ryfow and @matsukaz, please keep an eye on our next release and give it a try. Feel free to re-open if you encounter anything additional. Thanks

Nino-K avatar Jan 12 '23 20:01 Nino-K

@Nino-K This appears to still be a problem with Rancher Desktop 1.8. I don't know for sure if the same thing is making my dev environment hang, but I think it's the most likely suspect.

Edit: I can't figure out how to reopen.

ryfow avatar Mar 22 '23 17:03 ryfow

@Nino-K At least in my environment, this issue was resolved with Rancher Desktop 1.8! I have not seen this issue since I upgraded to 1.8, even with the reproduction procedure I posted earlier.

@ryfow I'm not sure, but it may be another problem.

matsukaz avatar Mar 22 '23 23:03 matsukaz

I tried my original reproduction steps with 1.8.1 on a work M1 Macbook and a personal M1 Macbook. It hangs on both and puts qemu into 100% CPU usage.

ryfow avatar Mar 24 '23 13:03 ryfow

@ryfow could your issue possibly be related to this one? https://github.com/lima-vm/lima/issues/1333

Nino-K avatar Apr 03 '23 17:04 Nino-K

@Nino-K I don't think it's https://github.com/lima-vm/lima/issues/1333. That bug appears to be talking about Virtualization.Framework. Looks like Rancher Desktop uses qemu.

I tried to follow my reproduction steps on lima 0.15, qemu 7.2.1 and limactl start --name docker template:///docker. I couldn't reproduce it; the hang did not happen.

The qemu version is different, so I tried copying my system version of qemu-system-aarch64 (7.2.1) into the "Rancher Desktop.app" but that did not help. I still see the hang on Rancher Desktop with the new qemu.

ryfow avatar Apr 14 '23 22:04 ryfow

It's got to be a problem with https://github.com/lima-vm/alpine-lima. When I start a VM with limactl start --name alpine template://alpine the problem reproduces.

ryfow avatar Apr 14 '23 22:04 ryfow

As an FYI to anyone else running into this, I've had good results with using the VZ Virtual Machine Type. Things seem way more stable.

ryfow avatar Jan 16 '24 15:01 ryfow