lima
Dropped DNS/UDP requests on user-v2 network
Description
I'm experiencing intermittent DNS resolution failures. The issue becomes more prominent after the VM has been running for a while (or after running network-heavy workloads).
I tracked down the root cause, and it seems to be caused by a bug in gvisor-tap-vsock (PR).
I'm opening this issue so we can switch to the fixed gvisor-tap-vsock once the bug fix is released.
Setup
limactl version 0.23.2
colima version 0.7.3
lima.yaml (created by colima start --vm-type vz)
vmType: vz
arch: aarch64
images:
- location: /path/to/image.raw
  arch: aarch64
cpus: 4
memory: 8GiB
disk: 60GiB
mounts:
- location: "~"
  writable: true
- location: /tmp/colima
  writable: true
mountType: virtiofs
ssh:
  loadDotSSHPubKeys: false
  forwardAgent: false
containerd:
  system: false
  user: false
dns: []
firmware:
  legacyBIOS: false
hostResolver:
  enabled: true
  hosts:
    host.docker.internal: host.lima.internal
portForwards:
- guestPortRange:
  - 0
  - 0
  guestSocket: /var/run/docker.sock
  hostPortRange:
  - 0
  - 0
  hostSocket: /Users/fata.nugraha/.colima/default/docker.sock
  proto: tcp
- guestPortRange:
  - 0
  - 0
  guestSocket: /var/run/docker.sock
  hostPortRange:
  - 0
  - 0
  hostSocket: /Users/fata.nugraha/.colima/docker.sock
  proto: tcp
- guestIPMustBeZero: true
  guestIP: 0.0.0.0
  guestPortRange:
  - 1
  - 65535
  hostIP: 0.0.0.0
  hostPortRange:
  - 1
  - 65535
  proto: tcp
- guestIP: 127.0.0.1
  guestPortRange:
  - 1
  - 65535
  hostIP: 127.0.0.1
  hostPortRange:
  - 1
  - 65535
  proto: tcp
networks:
- lima: user-v2
provision:
- mode: system
  script: sysctl -w fs.inotify.max_user_watches=1048576
- mode: dependency
  script: groupadd -f docker && usermod -aG docker {{ .User }}
- mode: system
  script: hostnamectl set-hostname colima
- mode: system
  script: mount -a
- mode: system
  script: readlink /usr/sbin/fstrim || fstrim -a
Reproduction steps
The issue appears when the same source IP:port is reused after 90 seconds of inactivity.
Run the code below inside the vm.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so the Dial hook below is honored
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			// Pin the source port so every lookup reuses the same source IP:port.
			addr, err := net.ResolveUDPAddr("udp", "0.0.0.0:50406")
			if err != nil {
				panic(err)
			}

			d := net.Dialer{
				Timeout:   10 * time.Second,
				KeepAlive: -1,
				LocalAddr: addr,
			}

			// Ignore the resolver-supplied address and always query 8.8.8.8.
			conn, err := d.DialContext(ctx, network, "8.8.8.8:53")
			if err != nil {
				panic(err)
			}

			fmt.Println("LocalAddr: ", conn.LocalAddr())
			return conn, err
		},
	}

	lookup := func() {
		fmt.Printf("%s starting LookupIP\n", time.Now())
		_, err := r.LookupIP(context.Background(), "ip4", "www.google.com")
		if err != nil {
			fmt.Println("err", err)
		} else {
			fmt.Println("ok")
		}
	}

	lookup()                     // ok
	time.Sleep(95 * time.Second) // wait for the UDPConnTimeout
	lookup()                     // this will fail after 2 retries
}
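As a control for the reproduction above, it may help to run the same two lookups without pinning the source port. This is a minimal sketch, not part of the original report: the helper names `dialFunc`, `newResolver` and `sourcePort` are hypothetical, and it assumes the failure depends only on reusing the same source port. Its `main` runs offline against a loopback address and only prints which source port each dial ends up using.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// dialFunc returns a Dial hook for net.Resolver that always contacts server.
// localPort pins the UDP source port (the reproduction uses 50406);
// 0 leaves the choice to the kernel, so each dial gets an ephemeral port.
// The name and parameters are illustrative assumptions, not from the report.
func dialFunc(localPort int, server string) func(context.Context, string, string) (net.Conn, error) {
	return func(ctx context.Context, network, _ string) (net.Conn, error) {
		d := net.Dialer{Timeout: 10 * time.Second}
		// Only pin the port for UDP dials; a UDP local address is invalid for TCP.
		if localPort != 0 && strings.HasPrefix(network, "udp") {
			d.LocalAddr = &net.UDPAddr{IP: net.IPv4zero, Port: localPort}
		}
		return d.DialContext(ctx, network, server)
	}
}

// newResolver builds a pure-Go resolver around dialFunc.
func newResolver(localPort int, server string) *net.Resolver {
	return &net.Resolver{PreferGo: true, Dial: dialFunc(localPort, server)}
}

// sourcePort dials once over UDP and reports the local port that was used.
func sourcePort(localPort int, server string) (int, error) {
	conn, err := dialFunc(localPort, server)(context.Background(), "udp", "")
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).Port, nil
}

func main() {
	// Connecting a UDP socket sends no packets, so this loop only shows
	// which source port is chosen in the pinned and unpinned cases.
	for _, port := range []int{50406, 0} {
		for i := 0; i < 2; i++ {
			p, err := sourcePort(port, "127.0.0.1:53")
			fmt.Printf("requested=%d used=%d err=%v\n", port, p, err)
		}
	}
}
```

To use it inside the VM, replace the resolver in the reproduction with `newResolver(0, "8.8.8.8:53")` and keep the 95-second sleep between lookups. If the second lookup then succeeds while the pinned variant `newResolver(50406, "8.8.8.8:53")` still fails, that would be consistent with the source-port reuse described above, though it does not prove it.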