netbird
Nameserver being randomly unavailable
Describe the problem
We're using the self-hosted version of Netbird and everything is set up according to the documentation. Sometimes the custom nameserver resolves, sometimes it doesn't, without anyone ever touching the config in the web interface.
To give more context here is our network configuration:
- We're hosting multiple internal services like Gitlab for example
- We can access those services using pfSense's DNS resolver
- Every user in Netbird has a Network Route to the pfSense's IP (10.10.10.1)
- Every user in Netbird has a Network Route to the different internal services' IP like the one hosting Gitlab
- We have a Nameserver that matches the domain of those services (like gitlab.mycompany.com) using the pfSense's IP
When a user is connected to the Netbird VPN, they can ping every server and every user without any problem. For example, users can ping Gitlab's Netbird IP:
> ping 100.73.149.194
PING 100.73.149.194 (100.73.149.194): 56 data bytes
64 bytes from 100.73.149.194: icmp_seq=0 ttl=64 time=35.938 ms
64 bytes from 100.73.149.194: icmp_seq=1 ttl=64 time=32.203 ms
64 bytes from 100.73.149.194: icmp_seq=2 ttl=64 time=32.427 ms
But users cannot ping pfSense's DNS Resolver IP:
> ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
The netbird status -d command returns this problem:
[10.10.10.1:53] for [gitlab.mycompany.com] is Unavailable, reason: 1 error occurred:
* read udp 192.168.1.182:53408->10.10.10.1:53: i/o timeout
Apart from this, we have no server-side logs about the problem, and the only client-side trace is the i/o timeout for pfSense's DNS Resolver IP in /var/log/netbird/client.log.
But sometimes, without changing anything, either client or server side, everything works just fine.
This issue appears on every OS: Windows 11, macOS 14.4.1 (23E224) and Ubuntu 22.04.
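To check the nameserver independently of the Netbird client, a small UDP probe can send one DNS query and wait for a reply, which is roughly the situation the i/o timeout above describes. This is only a minimal sketch: the function names build_dns_query and probe_dns are made up for illustration, and it is not how Netbird itself performs its health check.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe_dns(server: str, name: str, timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if the server answers the query over UDP within the timeout."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeouts and unreachable-network errors are both OSError subclasses
            return False
    # A reply carrying our transaction ID is treated as a valid answer
    return reply[:2] == query[:2]
```

Calling probe_dns("10.10.10.1", "gitlab.mycompany.com") while connected to the VPN should return False whenever the route to the resolver is not working, whatever netbird status -d reports at that moment.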
To Reproduce
Since the problem is random, we have no clue how to reproduce this problem.
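Because the failure is intermittent, one option is to poll netbird status -d and record every change in nameserver availability, so the transitions can be lined up with client.log afterwards. The sketch below assumes the Nameservers lines keep the format shown in this issue; parse_nameservers and watch are hypothetical helper names, not part of Netbird.

```python
import re
import subprocess
import time

# Matches lines such as:
#   [10.10.10.1:53] for [gitlab.mycompany.com] is Available
NAMESERVER_LINE = re.compile(
    r"^\s*\[(?P<server>[^\]]+)\] for \[(?P<domains>[^\]]*)\] "
    r"is (?P<state>Available|Unavailable)"
)


def parse_nameservers(status_text: str) -> list[tuple[str, list[str], bool]]:
    """Extract (server, domains, available) tuples from `netbird status -d` output."""
    results = []
    for line in status_text.splitlines():
        match = NAMESERVER_LINE.match(line)
        if match:
            domains = [d.strip() for d in match["domains"].split(",") if d.strip()]
            results.append((match["server"], domains, match["state"] == "Available"))
    return results


def watch(interval_seconds: float = 30.0) -> None:
    """Print a timestamped line whenever a nameserver's availability changes."""
    previous: dict[str, bool] = {}
    while True:
        output = subprocess.run(
            ["netbird", "status", "-d"], capture_output=True, text=True
        ).stdout
        for server, domains, available in parse_nameservers(output):
            if previous.get(server) != available:
                state = "Available" if available else "Unavailable"
                print(time.strftime("%Y-%m-%dT%H:%M:%S"), server, domains, state)
                previous[server] = available
        time.sleep(interval_seconds)
```

Relay lines such as [stun:...] is Available are skipped because they lack the "for [domain]" part, so only nameserver entries are tracked.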
Expected behavior
The Nameserver should be consistently recognized by Netbird instead of being randomly unavailable.
Are you using NetBird Cloud?
We're using Netbird self-hosted solution.
NetBird version
Every user is up to date: 0.27.3.
Hi @Enailis,
how exactly is the route to 10.10.10.1 set up? Are you sure the configured routing peer is online and successfully connected to the users peer that tries to ping? Is that connection direct or relayed? So with netbird status -d can you detect a difference in the connection when it is working compared to when it is not working?
Hi @pascal-fischer,
We have a network route to 10.10.10.1/32 using our internal servers as the peer group. All servers in this group have access to 10.10.10.1. This route is distributed to all users. We have 3 different peers in this group, and they're all online. The servers can't ping the users via their Netbird IPs. The users can ping the servers using their real IPs but not their Netbird IPs. The connection to the 3 servers in the peer group is relayed.
We actually can't detect any difference in netbird status -d between when it's working and when it's not. The current configuration gives me the following result for netbird status -d. There are other peers, but they all look the same as the one shown here:
server.mycompany.com:
NetBird IP: 100.73.252.226
Public key: RupexIsExt4J2oKsN4avstKkjD03vlSq728BzT/uvB8=
Status: Connected
-- detail --
Connection type: Relayed
Direct: false
ICE candidate (Local/Remote): relay/prflx
ICE candidate endpoints (Local/Remote): 90.90.90.90:63293/80.80.80.80:63293
Last connection update: 2024-04-22 13:58:48
Last WireGuard handshake: 2024-04-22 14:11:34
Transfer status (received/sent) 1.9 KiB/1.5 KiB
Quantum resistance: false
Routes: -
Latency: 55.684815ms
Daemon version: 0.27.3
CLI version: 0.27.3
Management: Connected to https://vpn.mycompany.com:33073
Signal: Connected to http://vpn.mycompany.com:10000
Relays:
[stun:vpn.mycompany.com:3478] is Available
[turn:vpn.mycompany.com:3478?transport=udp] is Available
Nameservers:
[10.10.10.1:53] for [gitlab.mycompany.com] is Available
FQDN: ena.mycompany.com
NetBird IP: 100.73.213.219/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 8/13 Connected
Even if everything looks fine, I cannot access gitlab.mycompany.com.
To add to my original issue: it now works for some Windows users. The client takes a long time to connect, and sometimes users have to run netbird up/netbird down multiple times before it actually works. It still doesn't work for Linux, macOS and some Windows users.
Hello, I'm working on the same Netbird instance as @Enailis
Some corrections / additional information on the above post:
- the domain name of the peer 100.73.252.226 is server.mycompany.vpn and not server.mycompany.com
- gitlab.mycompany.com is hosted on 100.73.252.226
Here is some other additional information:
On Windows clients (our users connected with SSO), our nameserver (10.10.10.1:53) is unstable, and its availability can change from one netbird down & netbird up to another for no apparent reason.
When it's available, we can access gitlab.mycompany.com and server.mycompany.vpn without any problem.
However, when it's unavailable ([10.10.10.1:53] for [gitlab.mycompany.com] is Unavailable, reason: 1 error occurred: * read udp 192.168.1.182:53408->10.10.10.1:53: i/o timeout), we can no longer access gitlab.mycompany.com but we can still access server.mycompany.vpn.
On our Linux clients (other users connected with SSO), other behaviors appear.
Our nameserver (10.10.10.1:53) is always marked as available in netbird status -d. However, it is impossible to access gitlab.mycompany.com or server.mycompany.vpn.
Here is a client's /etc/resolv.conf file:
# Generated by NetworkManager
nameserver 192.168.1.1
If I run dig gitlab.mycompany.com, I don't get an IP address back. However, if I run dig @10.10.10.1 gitlab.mycompany.com, its IP appears. So by adding the line nameserver 10.10.10.1 to the clients' /etc/resolv.conf files, we can access our gitlab, but we still can't access server.mycompany.vpn.
Note that we can still access our gitlab via its IP address (the IP given by Netbird and its real IP). Our routes are therefore well configured, the problem only comes from DNS resolution.
Finally, note that this problem never appears for Linux clients installed with a Setup Key (our servers). Here's their /etc/resolv.conf file:
...
nameserver 127.0.0.53
options edns0 trust-ad
search company.vpn company.com
We therefore believe that the problem only comes from Netbird clients, which cannot apply DNS configurations to our workstations (Linux and Windows).
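The difference between the two /etc/resolv.conf files above can be checked mechanically: the working servers point at a loopback stub (127.0.0.53 is the systemd-resolved stub listener), while the failing workstation points straight at the LAN resolver. The sketch below only classifies a file and does not establish that this difference is the cause; resolv_conf_nameservers and uses_local_stub are hypothetical helper names.

```python
import ipaddress


def resolv_conf_nameservers(text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf content."""
    servers = []
    for line in text.splitlines():
        # Drop comments, then look for "nameserver <address>" entries
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_local_stub(text: str) -> bool:
    """True if at least one nameserver is set and all of them are loopback addresses."""
    servers = resolv_conf_nameservers(text)
    return bool(servers) and all(
        ipaddress.ip_address(server).is_loopback for server in servers
    )
```

Running uses_local_stub(open("/etc/resolv.conf").read()) on a workstation and on a server gives a quick way to see which clients send queries to a local resolver and which bypass it.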
Hello, here is some additional information about our Windows client errors.
Here are the lines in the client.log file when the error ([10.10.10.1:53] for [gitlab.mycompany.com] is Unavailable, reason: 1 error occurred: * read udp 192.168.1.182:53408->10.10.10.1:53: i/o timeout) appears on our Windows clients:
2024-04-25T11:58:42+02:00 ERRO util/net/dialer_generic.go:64: Failed to call dialer hooks: 1 error occurred:
* executing dial hook: 1 error occurred:
* adding route reference: failed to add route for prefix 90.90.90.90/32: add route to table: PowerShell add route: exit status 1
2024-04-25T11:58:43+02:00 ERRO util/net/listener_generic.go:128: Error executing listener write hook: adding route reference: failed to add route for prefix 90.90.90.90/32: add route to table: PowerShell add route: exit status 1
Similar issue here on macOS.
- 192.168.99.1 is the IP of the DNS server
- 192.168.99.1 is reachable via ICMP
- nslookup docker.my-localdomain.local 192.168.99.1 also works
Server: 192.168.99.1
Address: 192.168.99.1#53
Non-authoritative answer:
Name: docker.my-localdomain.local
Address: 192.168.99.125
OS: darwin/arm64
Daemon version: 0.27.10
CLI version: 0.27.10
Management: Connected to https://netbird.mydomain.com:33073
Signal: Connected to http://netbird.mydomain.com:10000
Relays:
[stun:netbird.mydomain.com:3478] is Available
[turn:netbird.mydomain.com:3478?transport=udp] is Available
Nameservers:
[192.168.99.1:53] for [my-localdomain.local, mydomain.com] is Unavailable, reason: 1 error occurred:
* read udp 100.102.88.179:65220->192.168.99.1:53: i/o timeout
FQDN: nbfombprom1max.ivo
NetBird IP: 100.102.88.179/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 6/12 Connected