lima
Lima VM randomly and sporadically loses network connectivity
Summary
All networking and DNS functionality stops working on the Lima VM after a random number of bytes are transferred. Networking gets restored approx. ten seconds later.
Reboots fix this problem temporarily.
Host configuration
2021 Apple MacBook Pro M1 Pro, 16GB RAM
VM configuration
See here.
How To Reproduce
- Start the VM: limactl start. The configuration above also installs Docker with dockerd.
- Set DOCKER_HOST to the IP of the VM on the host
- Perform random network traffic through a Docker container until you receive an i/o timeout
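For anyone who wants a more repeatable trigger than "random network traffic", here is a minimal sketch of a lookup loop that could be run inside the guest or a container. It assumes Python 3 is available there. The function names `probe_dns` and `summarize` and the 1-second "slow" threshold are illustrative choices, not part of Lima or Docker.

```python
import socket
import time


def probe_dns(name, attempts, resolve=socket.getaddrinfo, clock=time.monotonic):
    """Resolve `name` repeatedly and record how long each lookup takes.

    Returns a list of (elapsed_seconds, error_or_None) tuples.
    `resolve` and `clock` are injectable so the loop can be tested offline.
    """
    results = []
    for _ in range(attempts):
        start = clock()
        try:
            resolve(name, 443)
            error = None
        except OSError as exc:  # socket.gaierror and timeouts are OSError subclasses
            error = str(exc)
        results.append((clock() - start, error))
    return results


def summarize(results, slow_threshold=1.0):
    """Count failed lookups and successful lookups slower than the threshold."""
    failed = sum(1 for _, err in results if err is not None)
    slow = sum(
        1 for elapsed, err in results if err is None and elapsed > slow_threshold
    )
    return {"total": len(results), "failed": failed, "slow": slow}
```

Calling `summarize(probe_dns("registry-1.docker.io", 200))` once with the host resolver enabled and once with it disabled should show whether the failures follow the resolver setting rather than the amount of data transferred.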
Troubleshooting Performed
- Changed the version of Alpine used on the guest to 3.14.10
- sudo ifdown eth0 && sudo ifup eth0 on the host
UPDATE: I've found that disabling host-level DNS resolution works around this issue. I'm leaving this open, however, as some users might not be able to do this:
useHostResolver: false
# You don't have to use the Google DNS servers.
dns:
- 8.8.8.8
- 4.4.4.4
Also happening to me. I can confirm setting useHostResolver: false fixed the issue for me as well. Same configuration, 2021 M1 Mac Pro, 16GB.
I had this same issue on an Intel Mac; setting hostResolver.enabled: false and adding a DNS server resolved it.
I'm wondering if the "timeout" was always in response to a DNS query.
We have been chasing a DNS issue that only manifests on M1 machines. The problem turned out to be that the M1 is too fast for the DNS resolver implementation in glibc, so multiple requests end up with identical transaction ids, and the resolver then cannot process the replies correctly.
This has been fixed (well, worked around) in #738, so I wonder if you could test this again against the latest code from master (or against Lima 0.10, once it is released).
See e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1868106 for another manifestation of the same issue (DNS suddenly slowing down/timing out because of transaction id collisions) on s390 architecture.
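To make the collision problem concrete, here is a toy model of a resolver that matches replies to outstanding queries by transaction ID alone. It is a deliberately simplified illustration, not glibc's or Lima's actual logic. The class `PendingQueries` and the helper `unanswered` are hypothetical names invented for this sketch.

```python
class PendingQueries:
    """Toy stub resolver that tracks outstanding queries by transaction ID only."""

    def __init__(self):
        self._by_txid = {}

    def send(self, txid, question):
        # A second query with the same ID silently replaces the first one.
        self._by_txid[txid] = question

    def receive(self, txid, question):
        """Return True if the reply is accepted, False if it is discarded."""
        expected = self._by_txid.get(txid)
        if expected != question:
            return False  # reply does not match what is believed to be outstanding
        del self._by_txid[txid]
        return True


def unanswered(queries):
    """Send every (txid, question) pair, deliver one reply per query,
    and return the questions whose replies were discarded."""
    pending = PendingQueries()
    for txid, question in queries:
        pending.send(txid, question)
    return [q for txid, q in queries if not pending.receive(txid, q)]
```

In this model, two queries sent with the same ID leave one of them without an accepted reply, so the caller would have to wait for a timeout and retry. With distinct IDs both replies are accepted. That is consistent with the slow or timed-out lookups described above, though the real failure mode may differ in detail.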
Please close this issue if you can confirm that hostResolver has been fixed for you!
Unfortunately, I'm still encountering very high DNS latency in my simple test. Here's a comparison between two image pulls of the selenium/standalone-chrome-debug Docker image:
with useHostResolver: false
$: time docker pull selenium/standalone-chrome-debug:latest
latest: Pulling from selenium/standalone-chrome-debug
da7391352a9b: Pull complete
14428a6d4bcd: Pull complete
2c2d948710f2: Pull complete
ec2bb7b8cfcf: Pull complete
72c74524b7d9: Pull complete
e95b037711cc: Pull complete
9e29948e1cc8: Pull complete
d78b0ec6f93b: Pull complete
9b17ca1ca0e1: Pull complete
b90dd6cf4c3a: Pull complete
2c2cf0378d87: Pull complete
619d6d13928d: Pull complete
d6980aa31e6d: Pull complete
4afbf4c86318: Pull complete
09d5141a5571: Pull complete
7678572f4c71: Pull complete
58292a24ab09: Pull complete
43fe2b7c4466: Pull complete
ccc0ee5ae5ae: Pull complete
17b1b93543ce: Pull complete
df9057a165f3: Pull complete
279323f9c697: Pull complete
2547270e0b8a: Pull complete
dd473c9242a0: Pull complete
d71a31c83cd9: Pull complete
7a7a638c1443: Pull complete
b6d9919dcbc1: Pull complete
847edbc2c92a: Pull complete
3932171e9225: Pull complete
701dff852387: Pull complete
Digest: sha256:0c59037d0a095d7edb7b956e95a24573a6a441654a1acd2f7bebad048ef16e65
Status: Downloaded newer image for selenium/standalone-chrome-debug:latest
docker.io/selenium/standalone-chrome-debug:latest
real 0m16.905s
user 0m0.039s
sys 0m0.025s
with useHostResolver: true
$: time docker pull selenium/standalone-chrome-debug:latest
latest: Pulling from selenium/standalone-chrome-debug
da7391352a9b: Pull complete
14428a6d4bcd: Pull complete
2c2d948710f2: Pull complete
ec2bb7b8cfcf: Pull complete
72c74524b7d9: Pull complete
e95b037711cc: Pull complete
9e29948e1cc8: Pull complete
d78b0ec6f93b: Pull complete
9b17ca1ca0e1: Pull complete
b90dd6cf4c3a: Pull complete
2c2cf0378d87: Pull complete
619d6d13928d: Pull complete
d6980aa31e6d: Pull complete
4afbf4c86318: Pull complete
09d5141a5571: Pull complete
7678572f4c71: Pull complete
58292a24ab09: Pull complete
43fe2b7c4466: Pull complete
ccc0ee5ae5ae: Pull complete
17b1b93543ce: Pull complete
df9057a165f3: Pull complete
279323f9c697: Downloading
2547270e0b8a: Download complete
dd473c9242a0: Download complete
d71a31c83cd9: Downloading
7a7a638c1443: Downloading
b6d9919dcbc1: Download complete
847edbc2c92a: Download complete
3932171e9225: Download complete
701dff852387: Download complete
dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:58332->192.168.5.3:53: i/o timeout
real 3m30.983s
user 0m0.059s
sys 0m0.040s
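The failed pull above ends in a resolver error (a UDP read from port 53 timing out), which bears on the earlier question of whether the timeout is always a response to a DNS query. A small sketch for checking this across many captured error lines follows. The regular expression is written against the exact error text shown above and is an assumption about its general shape. `parse_lookup_error` is a hypothetical helper name.

```python
import re

# Matches the lookup error text as it appears in the failed pull above.
LOOKUP_ERROR = re.compile(
    r"lookup (?P<name>\S+) on (?P<server>[\d.]+):(?P<port>\d+): "
    r"read udp (?P<src>[\d.]+:\d+)->(?P<dst>[\d.]+:\d+): (?P<reason>.+)$"
)


def parse_lookup_error(line):
    """Extract the queried name, resolver address, and failure reason.

    Returns None if the line does not look like a lookup error.
    """
    match = LOOKUP_ERROR.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    # Flag the case discussed in this thread: a timed-out query to port 53.
    info["is_dns_timeout"] = info["port"] == 53 and info["reason"] == "i/o timeout"
    return info
```

Running this over saved `docker pull` output would show whether every i/o timeout points at the resolver address or whether some come from other connections.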
@carlosonunez Can you confirm that you are using lima 0.9.2, which is supposed to have a fix for this in the hostResolver?
I'm seeing the same issue on lima 0.11.0. DNS appears very flaky in the VM. Fixed by this config.
useHostResolver: false
dns:
- 8.8.8.8
- 4.4.4.4
This is the broken config:
hostResolver:
  # hostResolver.hosts requires lima 0.8.3 or later. Names defined here will also
  # resolve inside containers, and not just inside the VM itself.
  hosts:
    host.docker.internal: host.lima.internal
QEMU 7.0.0, macOS 12.4, 2021 M1 Max.