
DNS over UDP fails 50% of the time in containers on macOS

Open codyps opened this issue 1 year ago • 2 comments

Actual Behavior

Running something like dig +short raw.githubusercontent.com +notcp fails with a timeout on every other request. This also affects tools like curl, causing them to fail to resolve the host 50% of the time. When using curl, I observe every other request for the same URL returning an error.

Example curl output on error:

# curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh
curl: (6) Could not resolve host: raw.githubusercontent.com

Note: the next execution of the same command succeeded.

dig output:

root@b505e598c74d:/# dig +short raw.githubusercontent.com +notcp +time=1 +tries=1
185.199.111.133
185.199.110.133
185.199.109.133
185.199.108.133
root@b505e598c74d:/# dig +short raw.githubusercontent.com +notcp +time=1 +tries=1
;; communications error to 192.168.5.3#53: timed out
;; no servers could be reached

root@b505e598c74d:/# 

Overriding the DNS server so that it is something like 8.8.8.8, instead of the /etc/resolv.conf nameserver 192.168.5.3, seems to resolve the issue.
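One way to compare the default resolver against a public one is sketched below. The helper name `first_nameserver` is hypothetical (it is not part of Rancher Desktop or Lima); it only reads the first nameserver entry from a resolv.conf-style file. The commented usage lines rely on the standard `dig @server` syntax and the `docker run --dns` option, and assume dig is installed in the container.

```shell
# first_nameserver [FILE]: print the first "nameserver" address found in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
# Hypothetical helper for illustration only.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Usage inside the container (commented out: needs dig and network access):
#   dig +short raw.githubusercontent.com +notcp +time=1 +tries=1 @"$(first_nameserver)"
#   dig +short raw.githubusercontent.com +notcp +time=1 +tries=1 @8.8.8.8
# Or start the container with a different resolver:
#   docker run -it --rm --dns 8.8.8.8 debian:bookworm
```

If the first query fails intermittently while the second always succeeds, that points at the resolver listed in /etc/resolv.conf rather than at UDP traffic from the container in general.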

I've been testing in a docker run -it --rm debian:bookworm container, but the same appears to occur in the Lima VM directly:

After entering the vm with LIMA_HOME=~/Library/Application\ Support/rancher-desktop/lima limactl shell 0

lima-rancher-desktop:~$ while true; do curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh -o /dev/null && echo OK || echo BAD; done
OK
curl: (6) Could not resolve host: raw.githubusercontent.com
BAD
OK
curl: (6) Could not resolve host: raw.githubusercontent.com
BAD
OK
^C
lima-rancher-desktop:~$ 

Steps to Reproduce

  1. Enter the lima vm and note that curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh fails half of the time.

Result

DNS resolution fails around 50% of the time when using the default DNS server.

Expected Behavior

DNS resolution succeeds

Additional Information

  • Forcing dig to use TCP with +tcp makes the resolution always succeed.
  • dig on macOS itself (and other DNS resolution on the host) works without issue.
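The UDP versus TCP difference can be quantified with a small counting loop. This is a minimal sketch: `count_failures` is a hypothetical helper name, and it assumes dig exits with a non-zero status when the query times out, which is worth confirming on the dig build in use.

```shell
# count_failures N CMD...: run CMD N times and print how many runs
# exited with a non-zero status. Hypothetical helper for illustration.
count_failures() {
  n=$1; shift
  fail=0; i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || fail=$((fail + 1))
    i=$((i + 1))
  done
  echo "$fail"
}

# Usage inside the container (commented out: needs dig and network access):
#   echo "UDP failures: $(count_failures 20 dig +short raw.githubusercontent.com +notcp +time=1 +tries=1)"
#   echo "TCP failures: $(count_failures 20 dig +short raw.githubusercontent.com +tcp +time=1 +tries=1)"
```

With the behavior described in this issue, the UDP count would be expected to land near half of the attempts and the TCP count at zero.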

This is on a corporate laptop with a bunch of network filtering/vpn items, and it's possible this issue is triggered by one of them.

In macOS Settings under "Network" -> "VPN & Filters" -> "Filters & Proxies", these items are present:

Name                                        Type               Status
Falcon                                      Content Filter     🟢 Enabled
Reveal Agent Network Configuration Profile  Content Filter     🟢 Enabled
Microsoft Defender Content Filter           Content Filter     🟡 Enabled
GlobalProtectEn                             Content Filter     🟡 Enabled
Cisco Anyconnect Socket Filter              Content Filter     🔴 Disabled
GlobalProtectDn                             DNS Proxy          🔴 Disabled
Cisco Anyconnect Socket Filter              DNS Proxy          🟢 Enabled
GlobalProtectDo                             Transparent Proxy  🔴 Disabled
Cisco Anyconnect Socket Filter              Transparent Proxy  🟢 Enabled

Disabling the 2 enabled "Cisco Anyconnect Socket Filter" items does not change the behavior observed. The other enabled items are greyed out and can't be disabled.

Entirely possible this is some weird bug in one of these (though macos dns working seems to indicate some interaction of issues)

Rancher Desktop Version

1.12.2

Rancher Desktop K8s Version

1.28.5

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

14.2.1 (23C71)

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

codyps (Jan 26 '24 17:01)

This appears to no longer be reproducible for me, possibly due to system changes pushed by corporate (iow: DNS now resolves properly inside docker containers running on rancher desktop).

New state of "VPNs & Filters": image

A change to enable Microsoft Defender more fully seems to have gone out.

Feel free to close for now if this is not reproducible for others (I expect there's some funky configuration/software that is causing it). I'll report back if it re-occurs.

Let me know if there's any additional info I can capture when the issue occurs (or while it isn't occurring) that would be useful for debugging it.

codyps (Jan 29 '24 18:01)

I'm experiencing the same problem on an M1 MacBook Air, macOS Sonoma 14.5, Rancher Desktop version 1.13.1.

How can I troubleshoot this?

Btw, after running rdctl shell, there's no dig command in the VM. There's nslookup, though.
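Where only nslookup is available, a timestamped lookup log may help show whether failures alternate as in the original report. This is a sketch under assumptions: `log_lookup` and the `LOOKUP_CMD` override are hypothetical names, the nslookup in the VM is assumed to exit non-zero on a failed lookup, and whether it queries over UDP only depends on the nslookup implementation.

```shell
# log_lookup HOST [SERVER]: run one lookup and print a timestamped OK/BAD line.
# LOOKUP_CMD (hypothetical variable) swaps the resolver command; it defaults
# to nslookup, which accepts "HOST [SERVER]" as arguments.
log_lookup() {
  if ${LOOKUP_CMD:-nslookup} "$@" >/dev/null 2>&1; then
    status=OK
  else
    status=BAD
  fi
  echo "$(date +%H:%M:%S) $status $1"
}

# Usage in the VM (commented out: needs network access):
#   while true; do log_lookup raw.githubusercontent.com; sleep 1; done
```

A strictly alternating OK/BAD pattern in the log would match the every-other-request failures described above.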

sloppycoder (Jun 09 '24 14:06)