Broken DNS resolution of *.our.domain on Windows client
Describe the problem
Resolution of all subdomains under *.our.domain does not work for certain applications (for example, any web browser or the ping tool), although nslookup resolves the IP correctly. This used to happen in the past when NetBird was shut down incorrectly, as discussed on Slack. Now it seems to happen the same way: hard laptop shutdown, system boots up, DNS is not resolved.
As a result, the clients cannot connect.
To Reproduce
Steps to reproduce the behavior: TBD
Expected behavior
All DNS records should be resolved correctly.
Are you using NetBird Cloud?
Self-hosted (v0.31.1 incl. relay as well as coturn)
NetBird version
0.31.1
NetBird status -dA output:
X
Do you face any (non-mobile) client issues?
2024-11-15T08:27:16+01:00 ERRO util/grpc/dialer.go:38: Failed to dial: dial: dial tcp: lookup netbird.our.domain: no such host
Screenshots
X
Additional context
The easiest fix is to connect to the NetBird Cloud instance, which somehow resets the Windows DNS configuration so that *.our.domain is immediately resolved correctly.
Output of Resolve-DnsName -Name www.our.domain
Resolve-DnsName: www.unipi.technology : Daná operace se vrátila, protože vypršel časový limit. //This operation returned because the timeout period expired.
Output of: ping www.our.domain
Ping request could not find host www.our.domain. Please check the name and try again.
Output of: nslookup www.our.domain
Server: dns.google
Address: 8.8.8.8
Non-authoritative answer:
Name: our.domain
Address: correct IP address
Aliases: www.our.domain
Output of Get-DnsClientNrptPolicy
Namespace : .ourdomain.local
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 10.220.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Namespace : .our.domain
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 10.220.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Try deleting this registry key when this happens: Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Dnscache\Parameters\DnsPolicyConfig\NetBird-Match
Can you also confirm that the domain used for your NetBird controller is also supposed to be routed through the WireGuard tunnel after the connection is established?
@roberthase Thanks, this did the trick. Could the NetBird service do this itself, since it requires elevated permissions? It behaved rather weirdly before I applied your fix, but I am not an expert on how networking works in Windows: nslookup returned a proper result from the main DNS resolver, but ping and traceroute failed.
I'm not sure I understand your question, but the management and admin domains are subdomains (management.example.com) of the domain that was stuck in the registry (.example.com).
The registry key I posted gets created when NetBird successfully connects.
All matched domains you configured in your controller under DNS -> Nameservers are listed there.
With matched domains configured, every domain you entered is only accessible over the WireGuard tunnel/interface.
When NetBird is gracefully shut down or disconnected, the registry key gets deleted.
There can be cases where Windows could not shut down correctly, and that's where things get ugly if the domain of your controller is also a matched domain.
You boot your system, the registry key is still there, and now your NetBird client can't reach your controller netbird.example.com, because example.com is supposed to go through your WireGuard tunnel/interface, which is not up yet.
nslookup should still give you the right result, because it uses the DNS server configured on your PC or your router, but the routing is wrong.
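The routing problem described above comes down to suffix matching: an NRPT rule whose namespace starts with a dot covers every name below it, so a leftover rule for .our.domain also captures netbird.our.domain. The Go sketch below models only that matching step under stated assumptions; it is not Windows' implementation, it ignores how Windows chooses between several matching rules, and matchesNamespace and capturingNamespaces are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesNamespace reports whether host is covered by an NRPT-style namespace
// such as ".our.domain" (a missing leading dot is added). The comparison is
// case-insensitive and ignores a trailing root dot. ASSUMPTION: the bare
// domain itself ("our.domain") counts as covered too.
func matchesNamespace(host, namespace string) bool {
	h := "." + strings.ToLower(strings.TrimSuffix(host, "."))
	ns := strings.ToLower(strings.TrimSuffix(namespace, "."))
	if !strings.HasPrefix(ns, ".") {
		ns = "." + ns
	}
	// Requiring the dot keeps "notour.domain" from matching ".our.domain".
	return strings.HasSuffix(h, ns)
}

// capturingNamespaces lists every namespace whose rule would send lookups for
// host to that rule's NameServers instead of the regular DNS servers.
func capturingNamespaces(host string, namespaces []string) []string {
	var hits []string
	for _, ns := range namespaces {
		if matchesNamespace(host, ns) {
			hits = append(hits, ns)
		}
	}
	return hits
}

func main() {
	// Namespaces from the Get-DnsClientNrptPolicy output earlier in this issue.
	leftover := []string{".ourdomain.local", ".our.domain"}
	for _, host := range []string{"netbird.our.domain", "www.our.domain", "github.com"} {
		fmt.Printf("%-20s captured by %v\n", host, capturingNamespaces(host, leftover))
	}
}
```

With the namespaces from this issue, both netbird.our.domain and www.our.domain end up captured by .our.domain, which is consistent with the "no such host" error the client logged while trying to reach its controller.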
Thank you for the explanation. I have noticed that NetBird removes those entries on graceful deactivation. I wonder whether NetBird could also try to delete this entry when it starts (a pre-start cleanup, before it actually does anything else).
Anyway, I will propose moving the management out of the domains that go via WireGuard. Unfortunately, this will mean a lot of changes (the management URL will stay the same, but domains and services will be pushed one level down, under an additional subdomain).
It's still happening randomly also with client 0.35.2
We route only the relevant subdomains now, so netbird.example.com is not affected. Maybe this is also possible for your environment.
We are experiencing the same issue with the latest client version. Could this be corrected so that the NetBird client takes care of cleanup each time the host starts, or on client start as @cleveHEX suggested?
@jakovnikolic both of these are implemented. Can you share more about the issue you have?
This morning I got support requests from two of my colleagues telling me they were not able to connect to or access any of our internal domains. They use Windows and had just updated their clients to the latest version, v0.35.2.
After manually removing HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Dnscache\Parameters\DnsPolicyConfig\NetBird-Match, they were able to connect and access everything as expected.
We tested nslookup, and the domain of our VPN server resolves without any issues, but with ping we get no response. This is because nslookup opens a Winsock connection to the DNS port and issues the query itself, whereas ping goes through the DNS Client service.
Exactly the same for us
Can you share the contents of %PROGRAMDATA%\Netbird\state.json the next time this happens, before starting netbird?
Does this happen on boot or when waking up from sleep? If after sleep, does rebooting fix the issue (netbird cleans up on service start)?
Does netbird state clean dns_state fix it?
Does netbird service restart fix it?
Also logs would be helpful, e.g. with netbird debug bundle -A once netbird is running again
@roberthase I am seeing the same issue on macOS. What is the macOS equivalent of clearing the registry?
Sorry, I do not know where these settings are stored on macOS.
To follow up with a new issue we experienced after upgrading clients, routing peers, and the controller from 0.31.0 to 0.35.2:
On some Windows 11 clients, the registry key for the routes is created when the connection is established and then immediately deleted.
So these clients can connect to the controller but not to the routes.
Downgrading or uninstalling/reinstalling the client to 0.31.0 has no effect. The new issue persists.
Any advice on how to remedy this issue?
For future reference, this is how I worked around it temporarily on macOS:
- Open the file with sudo vim /etc/hosts.
- Manually look up the IP address of your company domain that is not being resolved.
- At the end of the file, add an entry for it so that DNS resolution can happen, e.g.:
  99.22.11.33 foo.mydomain.com
Once this was done, NetBird started working properly. Afterwards I removed the entry from the hosts file, and NetBird continues to work fine. Adding this entry anywhere else in the DNS resolution settings did not work. I believe the hosts file takes precedence over everything else, and that is why it worked.
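Because the hosts-file entry is meant to be temporary, it helps to check whether it is still present before and after applying the workaround. The Go sketch below parses hosts-file text in the usual format (an IP address, then one or more names, with # starting a comment); hostsEntriesFor is a hypothetical helper, and the sample reuses the example address from the steps above.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// hostsEntriesFor returns every address that hosts-file text maps to name.
// Each line is "<IP> <hostname> [aliases...]"; '#' starts a comment.
// Hostnames are compared case-insensitively.
func hostsEntriesFor(content, name string) []string {
	var ips []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || net.ParseIP(fields[0]) == nil {
			continue // blank, comment-only, or malformed line
		}
		for _, h := range fields[1:] {
			if strings.EqualFold(h, name) {
				ips = append(ips, fields[0])
				break
			}
		}
	}
	return ips
}

func main() {
	// In practice, read the real file, e.g. with os.ReadFile("/etc/hosts").
	sample := "127.0.0.1 localhost\n" +
		"# temporary NetBird workaround, remove once resolution works again\n" +
		"99.22.11.33 foo.mydomain.com\n"
	fmt.Println(hostsEntriesFor(sample, "foo.mydomain.com"))
}
```

An empty result after cleanup confirms the workaround entry is gone, so later lookups again go through the normal resolver configuration.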
Can you share the contents of %PROGRAMDATA%\Netbird\state.json the next time this happens, before starting netbird?
The file did not exist
Does this happen on boot or when waking up from sleep? If after sleep, does rebooting fix the issue (netbird cleans up on service start)?
The person did not know how this happened.
Does netbird state clean dns_state fix it? Does netbird service restart fix it?
None of these fixed the issue, only the manual registry edit.
Also logs would be helpful, e.g. with netbird debug bundle -A once netbird is running again
@cleveHEX thank you.
The state file seems to be corrupted, hence the cleanup fails:
2025-01-21T11:23:10+01:00 WARN client/internal/statemanager/manager.go:307: State file appears to be corrupted, attempting to delete itinvalid character '\x00' looking for beginning of value
2025-01-21T11:23:10+01:00 INFO client/internal/statemanager/manager.go:311: State file deleted
2025-01-21T11:23:10+01:00 WARN client/server/server.go:109: failed to restore residual state: 1 error occurred:
* perform cleanup: load state file: unmarshal states: invalid character '\x00' looking for beginning of value
I'll see if I can reproduce the issue
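The '\x00' in this log means the state file starts with a NUL byte instead of JSON, and Go's encoding/json produces exactly this message for such input. One plausible cause, an assumption not confirmed in this thread, is a write cut short by a crash or hard power-off, which can leave a file padded with zeros. The Go sketch below checks a state file for that condition; stateNames and errZeroFilled are hypothetical names, and the layout it expects (a top-level JSON object keyed by state name, such as dns_state from the logs) is an unverified assumption about NetBird's format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// errZeroFilled marks a state file that contains nothing but NUL bytes.
var errZeroFilled = errors.New("state file contains only NUL bytes")

// stateNames parses a state file and returns its top-level keys, sorted.
// ASSUMPTION: the file is a JSON object keyed by state name (e.g. "dns_state");
// the real NetBird schema may differ.
func stateNames(data []byte) ([]string, error) {
	if len(data) > 0 && len(bytes.Trim(data, "\x00")) == 0 {
		return nil, errZeroFilled
	}
	var states map[string]json.RawMessage
	if err := json.Unmarshal(data, &states); err != nil {
		// A leading NUL yields "invalid character '\x00' looking for beginning of value".
		return nil, fmt.Errorf("unmarshal states: %w", err)
	}
	names := make([]string, 0, len(states))
	for name := range states {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Location mentioned in this thread; pass another path as the first argument.
	path := filepath.Join(os.Getenv("ProgramData"), "Netbird", "state.json")
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	names, err := stateNames(data)
	if err != nil {
		fmt.Println("corrupted:", err)
		return
	}
	fmt.Println("states:", names)
}
```

Running it against a copy of the file before NetBird starts would show whether the file is zero-filled, partially corrupted, or still lists a dns_state entry to clean up.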
It happened to me today: when I got to my computer, I found that it had crashed with a BSOD overnight, with no dump available.
@lixmal It happened again today, also after a Windows update.
netbird.debug.1700204602.zip - before the fix (removing the registry key)
netbird.debug.1700204602 1.zip - after the fix and reconnecting
UPDATE:
After a reboot, things seem to be back to normal.
#==============================
I have the same problem with 0.37.2 as described in #3468. After updating to 0.38.0 today, I got this issue again. What is weird is that I can't find the Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Dnscache\Parameters\DnsPolicyConfig\NetBird-Match key in Registry Editor, and the GPO DNS policy doesn't have any related record either. Only Get-DnsClientNrptPolicy shows the policy.
state.json netbird.debug.820797045.zip
PowerShell output:
PS C:\Users\kortan> Get-DnsClientGlobalSetting
UseSuffixSearchList : False
SuffixSearchList : {}
UseDevolution : True
DevolutionLevel : 0
PS C:\Users\kortan> Get-DnsClientNrptGlobal
EnableDAForAllNetworks QueryPolicy SecureNameQueryFallback
---------------------- ----------- -----------------------
Disable Disable Disable
PS C:\Users\kortan> Get-DnsClientNrptRule
Name : {C43E1699-C69D-4DFB-9737-AF855565626D}
Version : 1
Namespace : {.85.100.in-addr.arpa, .86.100.in-addr.arpa, .87.100.in-addr.arpa, .88.100.in-addr.arpa...}
IPsecCARestriction :
DirectAccessDnsServers :
DirectAccessEnabled : False
DirectAccessProxyType :
DirectAccessProxyName :
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired :
NameServers : 100.100.100.100
DnsSecEnabled : False
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired :
DnsSecValidationRequired :
NameEncoding : Disable
DisplayName :
Comment :
PS C:\Users\kortan> Get-DnsClientNrptPolicy
Namespace : .65.100.in-addr.arpa
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 100.65.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Namespace : .netbird.some.domain
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 100.65.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Namespace : .some.domain
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 100.65.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Namespace : .relay.some.domain
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 100.65.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
Namespace : .management.some.domain
QueryPolicy :
SecureNameQueryFallback :
DirectAccessIPsecCARestriction :
DirectAccessProxyName :
DirectAccessDnsServers :
DirectAccessEnabled :
DirectAccessProxyType : NoProxy
DirectAccessQueryIPsecEncryption :
DirectAccessQueryIPsecRequired : False
NameServers : 100.65.255.254
DnsSecIPsecCARestriction :
DnsSecQueryIPsecEncryption :
DnsSecQueryIPsecRequired : False
DnsSecValidationRequired : False
NameEncoding : Utf8WithoutMapping
PS C:\Users\kortan> Get-ChildItem -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig"
Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig
Name                                    Property
----                                    --------
{C43E1699-C69D-4DFB-9737-AF855565626D}  Version           : 1
                                        Name              : {.85.100.in-addr.arpa, .86.100.in-addr.arpa, .87.100.in-addr.arpa, .88.100.in-addr.arpa...}
                                        GenericDNSServers : 100.100.100.100
                                        ConfigOptions     : 8
PS C:\Users\kortan> Get-ChildItem -Path "HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig"
PS C:\Users\kortan>
@lixmal It did not happen for some time on 0.39.1 and 0.39.2, but with 0.40 it happened again. It usually happens after waking up the laptop (after opening the lid). Fortunately, probably thanks to https://github.com/netbirdio/netbird/pull/3614, it fixed itself after running netbird down and netbird up:
2025-04-08T14:34:53+02:00 INFO client/internal/statemanager/manager.go:412: cleaning up state dns_state
2025-04-08T14:34:53+02:00 WARN client/server/server.go:590: failed to restore residual state: 1 error occurred:
* perform cleanup: 1 error occurred:
* dns_state: cleanup state: restore unclean shutdown dns: remove interface registry key: get interface registry key: open HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{F2F29E61-D91F-4D76-8151-119B20C4BDEB}: Systém nemůže nalézt uvedený soubor. //The system cannot find the file specified.
2025-04-08T14:34:53+02:00 INFO client/internal/connect.go:122: starting NetBird client version 0.40.0 on windows/amd64
2025-04-08T14:34:54+02:00 INFO client/internal/engine.go:320: stopped Netbird Engine
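In this log the cleanup reports a failure because the interface registry key it wants to remove no longer exists (the Czech Windows message means "The system cannot find the file specified"). A common way to make such cleanup idempotent is to treat "not found" as already cleaned up and to continue with the remaining steps. The Go sketch below shows that pattern with stand-in functions instead of real registry calls, so it runs on any OS; cleanupStep, runCleanup, demoSteps and the step names are hypothetical, and this is not NetBird's implementation. On Windows, "file not found" registry errors are syscall.Errno values that should satisfy errors.Is(err, fs.ErrNotExist), but that is worth verifying against the actual error returned.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// cleanupStep is one hypothetical piece of leftover DNS configuration to undo.
type cleanupStep struct {
	name string
	run  func() error
}

// errAccessDenied stands in for a failure that should still be reported.
var errAccessDenied = errors.New("access denied")

// runCleanup runs every step, even after a failure. A step whose target is
// already gone (fs.ErrNotExist anywhere in its error chain) counts as clean;
// all other failures are collected with errors.Join (Go 1.20+).
func runCleanup(steps []cleanupStep) error {
	var errs []error
	for _, s := range steps {
		err := s.run()
		if err == nil || errors.Is(err, fs.ErrNotExist) {
			continue
		}
		errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// demoSteps mimics the log above: the interface registry key no longer
// exists, while a second (hypothetical) step succeeds.
func demoSteps() []cleanupStep {
	return []cleanupStep{
		{"remove interface registry key", func() error {
			return fmt.Errorf("open interface key: %w", fs.ErrNotExist)
		}},
		{"remove NRPT rule", func() error { return nil }},
	}
}

func main() {
	fmt.Println("stale key only:", runCleanup(demoSteps()))
	withFailure := append(demoSteps(), cleanupStep{"remove match-domain key", func() error {
		return errAccessDenied
	}})
	fmt.Println("with a real failure:", runCleanup(withFailure))
}
```

The design choice is that a missing key is the desired end state of a cleanup, so only errors that leave configuration behind (such as access denied) are surfaced.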
@lixmal It happened twice already today. Running down and up fixed it. By the way, what is the preferred way to access private DNS? Should it be a peer or a network route (which we currently use)? Debug logging was enabled after this dump.
@lixmal With 0.40.x, resolution of *.our.domain breaks after every laptop hibernation or sleep and can be fixed by disconnecting and reconnecting. It's really frustrating and makes NetBird practically unusable for our clients.
Can you avoid using your top-level domain in your match domains and use subdomains instead? That way netbird.your.domain is not routed through the WireGuard tunnel. That's what I did, and I haven't had an issue since.
The issue was in the nameserver configuration in NetBird: the match domain was *.our.domain, but NetBird management runs on netbird.our.domain. After removing *.our.domain from the match domains, it never happened again. I tested for two days, whereas before it happened every time I closed and opened the laptop lid.
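The fix described here, keeping the management host out of every match domain, can be checked before rolling out a nameserver configuration. The Go sketch below assumes that a match domain covers itself and all of its subdomains (consistent with the .our.domain NRPT namespace seen earlier in this issue) and that service URLs include a scheme so net/url can extract the host; conflictingMatchDomains, normalizeMatchDomain and the example name intranet.our.domain are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeMatchDomain turns "*.our.domain", ".our.domain" or "our.domain"
// into "our.domain". ASSUMPTION: every form covers the domain and all subdomains.
func normalizeMatchDomain(d string) string {
	d = strings.ToLower(strings.TrimSuffix(strings.TrimSpace(d), "."))
	d = strings.TrimPrefix(d, "*")
	return strings.TrimPrefix(d, ".")
}

// conflictingMatchDomains reports each match domain that would also capture
// the host of one of the given service URLs (e.g. the management URL).
func conflictingMatchDomains(serviceURLs, matchDomains []string) ([]string, error) {
	var conflicts []string
	for _, raw := range serviceURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", raw, err)
		}
		host := strings.ToLower(u.Hostname())
		for _, md := range matchDomains {
			d := normalizeMatchDomain(md)
			if host == d || strings.HasSuffix(host, "."+d) {
				conflicts = append(conflicts, fmt.Sprintf("%s captures %s", md, host))
			}
		}
	}
	return conflicts, nil
}

func main() {
	services := []string{"https://netbird.our.domain:443"}
	for _, cfg := range [][]string{{"*.our.domain"}, {"intranet.our.domain"}} {
		conflicts, err := conflictingMatchDomains(services, cfg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> %d conflict(s) %v\n", cfg, len(conflicts), conflicts)
	}
}
```

The first configuration reproduces the setup from this comment and is flagged; the subdomain-only configuration leaves netbird.our.domain on the regular resolver.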
I experienced this too, and while removing *.our.domain from "Match Domains" works, I think it is only a temporary solution. The NetBird client should remove the name resolution policy when it disconnects and respect the system DNS settings. I'm not sure whether this is how other ZTNA solutions work.