DNS settings break DNS resolution in k3s/rke2 clusters.

Open theoriginalgri opened this issue 4 months ago • 9 comments

Describe the problem

We are running netbird on Ubuntu 24.04 host systems (hetzner cloud + dedicated) which also run k3s / rke2.

Until this Tuesday, this worked flawlessly. We did not install any netbird, Ubuntu, k3s or rke2 updates in the days before it broke.

Since then, pods have been behaving erratically and failing to install Alpine Linux packages. We have now narrowed the issue down to having DNS settings active in netbird. Once the machine is excluded using "Disable DNS management", Alpine Linux packages can be installed again.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a new server (e.g. on Hetzner Cloud)
  2. Install netbird:
     curl -fsSL https://pkgs.netbird.io/install.sh | sh
     netbird up
  3. Install k3s:
     curl -sfL https://get.k3s.io | sh -
  4. Launch a shell session in an Alpine Linux pod:
     k3s kubectl run test-shell --rm -i --tty --image alpine:latest

Within that shell try to ping dl-cdn.alpinelinux.org:

ping dl-cdn.alpinelinux.org
ping: bad address 'dl-cdn.alpinelinux.org'

If either the netbird service is stopped or the machine is excluded from DNS settings, pinging starts working again once the pod is restarted. Interestingly, pinging dl-cdn.alpinelinux.org. (with a trailing dot) works even while netbird DNS management is active.
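Since the trailing dot changes the result, the resolver's search list and ndots setting are the obvious things to compare between the pod and the host. A minimal sketch for pulling both out of a resolv.conf-style file; the function name `resolver_settings` is made up for illustration, and it only looks at the `search` and `options ndots:` entries:

```shell
# Print the search list and ndots value from a resolv.conf-style file.
# Hypothetical helper; reads /etc/resolv.conf unless a path is given.
resolver_settings() {
    awk '
        # Keep the last "search" line, minus the keyword itself.
        $1 == "search" { $1 = ""; sub(/^ +/, ""); search = $0 }
        # Pick "ndots:N" out of any "options" line.
        $1 == "options" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] }
        }
        END {
            print "search: " (search == "" ? "(none)" : search)
            print "ndots: " (ndots == "" ? "1 (resolver default)" : ndots)
        }
    ' "${1:-/etc/resolv.conf}"
}

# Usage, inside the pod and again on the host:
#   resolver_settings
#   resolver_settings /etc/resolv.conf
```

Running it in the test-shell pod and on the host, with netbird DNS management on and off, should show whether netbird.cloud ends up in the pod's search list.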

Expected behavior

Pinging dl-cdn.alpinelinux.org succeeds.

Are you using NetBird Cloud?

Cloud

NetBird version

Tried multiple:

  • 0.46.0
  • 0.45.3
  • 0.44.0

Is any other VPN software installed?

No

Debug output

To help us resolve the problem, please attach the following anonymized status output

Peers detail:
 gitlab.netbird.cloud:
  NetBird IP: 100.92.104.216
  Public key: 5JcftOw3hmB/rW7xULTpX49hbEGi/SSYP6bWNxMiJ0k=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 10.42.0.0:51820/198.51.100.0:51820
  Relay server address: rels://streamline-de-fra1-0.relay.netbird.io:443
  Last connection update: 19 minutes, 58 seconds ago
  Last WireGuard handshake: 57 seconds ago
  Transfer status (received/sent) 1.5 KiB/4.4 KiB
  Quantum resistance: false
  Networks: -
  Latency: 3.360698ms

Events:
  [WARNING] DNS (eb819e1f-0e71-4bfa-aecf-6c98b0980181)
    Message: All upstream servers failed (probe failed)
    Time: 20 minutes, 2 seconds ago
    Metadata: upstreams: 100.92.104.216:5353
  [WARNING] DNS (ce264190-443d-4984-bea2-96d5d5f05748)
    Message: All upstream servers failed (probe failed)
    Time: 20 minutes, 2 seconds ago
    Metadata: upstreams: 100.92.104.216:5353
  [INFO] SYSTEM (0fed29a3-2ca0-4044-86c4-7d0ac9b2d606)
    Message: Network map updated
    Time: 20 minutes, 2 seconds ago
OS: linux/amd64
Daemon version: 0.44.0
CLI version: 0.44.0
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
  [stun:stun.netbird.io:5555] is Available
  [turns:turn.netbird.io:443?transport=tcp] is Available
  [rels://streamline-de-fra1-0.relay.netbird.io:443] is Available
Nameservers:
  [100.92.104.216:5353] for [gitlab.anon-ZDWPs.domain, gitlab-ssh.anon-ZDWPs.domain] is Available
FQDN: ubuntu-16gb-fsn1-1.netbird.cloud
NetBird IP: 100.92.239.76/16
Interface type: Kernel
Quantum resistance: false
Networks: -
Forwarding rules: 0
Peers count: 1/1 Connected

Create and upload a debug bundle, and share the returned file key:

f79e391890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/34c03ba3-60c9-44dc-8aec-f6f42fd25f8e

Screenshots

Additional context

Add any other context about the problem here:

Running tcpdump port 53 on the host machine reveals:

12:57:02.174698 IP static.179.xx.xxx.xxx.clients.your-server.de.datametrics > ns2.recursivedns.hetzner.com.domain: 57579+ AAAA? dl-cdn.alpinelinux.org.netbird.cloud. (54)
12:57:02.174733 IP static.179.xx.xxx.xxx.clients.your-server.de.12678 > ns2.recursivedns.hetzner.com.domain: 1475+ A? dl-cdn.alpinelinux.org.netbird.cloud. (54)
12:57:02.244589 IP static.179.xx.xxx.xxx.clients.your-server.de.59458 > ns1.recursivedns.hetzner.com.domain: 20223+ [1au] PTR? 2.64.12.185.in-addr.arpa. (53)
12:57:02.245085 IP ns1.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.59458: 20223 1/3/1 PTR ns2.recursivedns.hetzner.com. (182)
12:57:02.335256 IP ns2.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.12678: 1475 ServFail 0/0/0 (54)
12:57:02.335923 IP ns2.recursivedns.hetzner.com.domain > static.179.xx.xxx.xxx.clients.your-server.de.datametrics: 57579 ServFail 0/0/0 (54)

It looks like it's trying to look up dl-cdn.alpinelinux.org.netbird.cloud., so it seems the netbird.cloud search domain is being appended to every lookup 🤷
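To illustrate why a search domain would show up on every lookup, here is a rough sketch of how a stub resolver expands a name using ndots and the search list. It is a simplified model, not the actual resolver code. The function name `search_candidates` is made up, and the inputs in the example call are assumptions: ndots:5 is the usual Kubernetes pod default, and the search list assumes the pod inherits netbird.cloud from the host. Neither was confirmed from the pod's /etc/resolv.conf.

```shell
# List the names a stub resolver would try, in order (simplified model).
# Usage: search_candidates NAME NDOTS [SEARCH_DOMAIN...]
search_candidates() {
    name=$1
    ndots=$2
    shift 2
    # A trailing dot marks the name as fully qualified: no search list.
    case $name in
        *.) echo "$name"; return ;;
    esac
    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}
    # Enough dots: the name is tried as-is. Resolvers differ on whether the
    # search list is used as a fallback afterwards; that is not modelled.
    if [ "$dots" -ge "$ndots" ]; then
        echo "$name."
        return
    fi
    # Too few dots: every search domain is tried first, the bare name last.
    for domain in "$@"; do
        echo "$name.$domain."
    done
    echo "$name."
}

# Assumed pod settings: ndots 5, cluster domains plus netbird.cloud.
search_candidates dl-cdn.alpinelinux.org 5 \
    default.svc.cluster.local svc.cluster.local cluster.local netbird.cloud
```

Under these assumptions the fourth candidate is dl-cdn.alpinelinux.org.netbird.cloud., which matches the query in the tcpdump output, and passing the name with a trailing dot yields only dl-cdn.alpinelinux.org., which matches the observation that the dotted form works. Whether the ServFail answer for the netbird.cloud candidate is what stops the resolver from reaching the bare name is a guess and has not been verified here.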

Have you tried these troubleshooting steps?

  • [x] Reviewed client troubleshooting (if applicable)
  • [x] Checked for newer NetBird versions
  • [x] Searched for similar issues on GitHub (including closed ones)
  • [x] Restarted the NetBird client
  • ~~[ ] Disabled other VPN software~~
  • [x] Checked firewall settings

theoriginalgri avatar Jun 11 '25 13:06 theoriginalgri