DNS settings break DNS resolution in k3s/rke2 clusters.
Describe the problem
We are running netbird on Ubuntu 24.04 host systems (Hetzner Cloud and dedicated servers) which also run k3s / rke2.
Up until this Tuesday, this was working flawlessly. It broke even though we had not installed any netbird/Ubuntu/k3s/rke2 updates in the days before.
Since then, pods have been behaving erratically and failing to install Alpine Linux packages. We have now narrowed the issue down to DNS settings being active in netbird. Once the machine is excluded using "Disable DNS management", Alpine Linux packages can be installed again.
To Reproduce
Steps to reproduce the behavior:
- Set up a new server (e.g. on Hetzner Cloud)
- Install netbird:
curl -fsSL https://pkgs.netbird.io/install.sh | sh
netbird up
- Install k3s:
curl -sfL https://get.k3s.io | sh -
- Launch a shell session in an Alpine Linux pod:
k3s kubectl run test-shell --rm -i --tty --image alpine:latest
Within that shell try to ping dl-cdn.alpinelinux.org:
ping dl-cdn.alpinelinux.org
ping: bad address 'dl-cdn.alpinelinux.org'
If either the netbird service is stopped or the machine is excluded from DNS settings, pinging starts working again once the pod is restarted. Interestingly, pinging dl-cdn.alpinelinux.org. (with a trailing dot) works.
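A trailing dot marks a name as fully qualified, so the resolver skips its search list. That makes the pod's `/etc/resolv.conf` worth inspecting. The helper below is a minimal sketch for doing so; the name `show_search_config` is hypothetical, and it only reads the `search` and `options ndots:` entries.

```shell
# Print the search domains and the ndots value from a resolv.conf file.
# usage: show_search_config [FILE]   (FILE defaults to /etc/resolv.conf)
show_search_config() {
    file=${1:-/etc/resolv.conf}
    awk '
        # Each domain on a "search" line is appended to short names.
        $1 == "search" { for (i = 2; i <= NF; i++) print "search: " $i }
        # "ndots:N" is the dot count below which the search list is used first.
        $1 == "options" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ndots:/) print "ndots: " substr($i, 7)
        }
    ' "$file"
}
```

Running `show_search_config` once inside the test pod and once on the host should show whether a netbird-related search domain has been propagated into the pod.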
Expected behavior
Pinging dl-cdn.alpinelinux.org succeeds.
Are you using NetBird Cloud?
Cloud
NetBird version
Tried multiple:
- 0.46.0
- 0.45.3
- 0.44.0
Is any other VPN software installed?
No
Debug output
To help us resolve the problem, please attach the following anonymized status output
Peers detail:
gitlab.netbird.cloud:
NetBird IP: 100.92.104.216
Public key: 5JcftOw3hmB/rW7xULTpX49hbEGi/SSYP6bWNxMiJ0k=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/host
ICE candidate endpoints (Local/Remote): 10.42.0.0:51820/198.51.100.0:51820
Relay server address: rels://streamline-de-fra1-0.relay.netbird.io:443
Last connection update: 19 minutes, 58 seconds ago
Last WireGuard handshake: 57 seconds ago
Transfer status (received/sent) 1.5 KiB/4.4 KiB
Quantum resistance: false
Networks: -
Latency: 3.360698ms
Events:
[WARNING] DNS (eb819e1f-0e71-4bfa-aecf-6c98b0980181)
Message: All upstream servers failed (probe failed)
Time: 20 minutes, 2 seconds ago
Metadata: upstreams: 100.92.104.216:5353
[WARNING] DNS (ce264190-443d-4984-bea2-96d5d5f05748)
Message: All upstream servers failed (probe failed)
Time: 20 minutes, 2 seconds ago
Metadata: upstreams: 100.92.104.216:5353
[INFO] SYSTEM (0fed29a3-2ca0-4044-86c4-7d0ac9b2d606)
Message: Network map updated
Time: 20 minutes, 2 seconds ago
OS: linux/amd64
Daemon version: 0.44.0
CLI version: 0.44.0
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
[stun:stun.netbird.io:5555] is Available
[turns:turn.netbird.io:443?transport=tcp] is Available
[rels://streamline-de-fra1-0.relay.netbird.io:443] is Available
Nameservers:
[100.92.104.216:5353] for [gitlab.anon-ZDWPs.domain, gitlab-ssh.anon-ZDWPs.domain] is Available
FQDN: ubuntu-16gb-fsn1-1.netbird.cloud
NetBird IP: 100.92.239.76/16
Interface type: Kernel
Quantum resistance: false
Networks: -
Forwarding rules: 0
Peers count: 1/1 Connected
Create and upload a debug bundle, and share the returned file key:
f79e391890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/34c03ba3-60c9-44dc-8aec-f6f42fd25f8e
Screenshots
Additional context
Add any other context about the problem here:
Running tcpdump port 53 on the host machine reveals:
12:57:02.174698 IP static.179.xx.xxx.xxx.clients.your-server.de.datametrics > ns2.recursivedns.hetzner.com.domain: 57579+ AAAA? dl-cdn.alpinelinux.org.netbird.cloud. (54)
12:57:02.174733 IP static.179.xx.xxx.xxx.clients.your-server.de.12678 > ns2.recursivedns.hetzner.com.domain: 1475+ A? dl-cdn.alpinelinux.org.netbird.cloud. (54)
12:57:02.244589 IP static.179.xx.xxx.xxx.clients.your-server.de.59458 > ns1.recursivedns.hetzner.com.domain: 20223+ [1au] PTR? 2.64.12.185.in-addr.arpa. (53)
12:57:02.245085 IP ns1.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.59458: 20223 1/3/1 PTR ns2.recursivedns.hetzner.com. (182)
12:57:02.335256 IP ns2.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.12678: 1475 ServFail 0/0/0 (54)
12:57:02.335923 IP ns2.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.datametrics: 57579 ServFail 0/0/0 (54)
It looks like it's trying to look up dl-cdn.alpinelinux.org.netbird.cloud., so it seems the search domain is being appended to every lookup 🤷
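For reference, here is a rough sketch of how a resolver expands a name using the `search` and `ndots` settings described in resolv.conf(5). It is a simplified model, not the actual glibc or musl code, which differ in details such as error handling. The function name `expand_candidates` is hypothetical. The example values (`ndots` of 5 and the `cluster.local` search domains) are assumed Kubernetes pod defaults; only `netbird.cloud` comes from the tcpdump output above.

```shell
# Simplified model of the resolv.conf(5) search-list rule.
# usage: expand_candidates NAME NDOTS [SEARCH_DOMAIN...]
expand_candidates() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as fully qualified: no search list.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}

    # Enough dots: the name itself is tried first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi

    # Each search domain is appended in order.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done

    # Too few dots: the name itself is tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

# Assumed pod settings: Kubernetes defaults plus the host's search domain.
expand_candidates dl-cdn.alpinelinux.org 5 \
    default.svc.cluster.local svc.cluster.local cluster.local netbird.cloud
```

Under these assumptions `dl-cdn.alpinelinux.org` has only two dots, so every search domain, including `netbird.cloud`, is tried before the plain name. This matches the `dl-cdn.alpinelinux.org.netbird.cloud.` queries seen in the capture.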
Have you tried these troubleshooting steps?
- [x] Reviewed client troubleshooting (if applicable)
- [x] Checked for newer NetBird versions
- [x] Searched for similar issues on GitHub (including closed ones)
- [x] Restarted the NetBird client
- ~~[ ] Disabled other VPN software~~
- [x] Checked firewall settings