DNS fails to be controlled and/or reverted
Received two reports from a customer about DNS issues on Windows related to Firezone.
- https://firezonehq.slack.com/archives/C08KPQKJZKM/p1749053656472029
- https://firezonehq.slack.com/archives/C08KPQKJZKM/p1747746982240599
Version 1.4.14
This is a tracking issue to investigate and increase the robustness of our DNS controlling logic to prevent issues going forward.
I suspect we may find clues in Sentry.
This is reproducible for me on Windows 10:
- Connect Firezone
- Switch the DNS setting of the primary network adapter to manual and set the server to 1.1.1.1
- Observe all network connectivity is cut
- Edition: Windows 10 Pro
- Version: 22H2
- Installed on: 1/30/2025
- OS build: 19045.5917
- Experience: Windows Feature Experience Pack 1000.19061.1000.0
Can't seem to reproduce the above on Windows 11 Pro.
Logs from the above session
Can you reproduce with debug logs please?
In the above logs, the 1.1.1.1 address makes sense, because I set the addresses to Primary: 1.1.1.1 and Secondary: 1.0.0.1
However, the 168.63.129.16 address doesn't make any sense. Not sure where that's coming from.
Strangely, I'm also able to reproduce the connectivity hang without the Firezone GUI running. Maybe the tunnel service is at play, or maybe this is related to the VM and RDP.
It's very possible the issues users are facing here are due to #8439. Maybe Windows is reporting the DNS servers of other interfaces to us, but with a higher metric, expecting us not to use them, and we use them anyway.
All other apps on Windows use the "Primary" resolver unless it fails. I believe we round-robin among all the ones we "find", which could explain what we're seeing here.
We don't round-robin anything, we map 1 to 1 and the OS picks which one to send queries to.
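To make the "1 to 1" point concrete, here is a minimal sketch, assuming it means that each resolver found on the system gets exactly one stand-in address on the tunnel interface. The function name `map_sentinels` and the `100.100.111.x` range are hypothetical choices for illustration and are not taken from the Firezone source.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical stand-in range used only for this illustration.
const SENTINEL_PREFIX: [u8; 3] = [100, 100, 111];

/// Pair each upstream resolver with its own stand-in address, preserving
/// the order in which the resolvers were found. There is no balancing
/// logic here: which stand-in receives a query is left to the OS.
pub fn map_sentinels(upstreams: &[IpAddr]) -> Vec<(IpAddr, IpAddr)> {
    upstreams
        .iter()
        .enumerate()
        .take(254) // keep the last octet within 1..=254
        .map(|(i, upstream)| {
            let sentinel = Ipv4Addr::new(
                SENTINEL_PREFIX[0],
                SENTINEL_PREFIX[1],
                SENTINEL_PREFIX[2],
                (i + 1) as u8,
            );
            (IpAddr::V4(sentinel), *upstream)
        })
        .collect()
}
```

With this shape, every resolver passed in gets a stand-in, so a resolver from a low-priority adapter would still be offered to the OS unless it is filtered out beforehand.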
When we set our DNS servers though, do we respect the metric of the old ones?
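One way to respect the metric would be to order the discovered resolvers by their adapter's interface metric before handing them on. The sketch below is a rough illustration, not existing Firezone code: `AdapterDns` and `resolvers_by_metric` are hypothetical names, and it assumes we already have each adapter's metric and DNS servers in hand. On Windows, a lower interface metric means a higher priority.

```rust
use std::net::IpAddr;

/// Hypothetical snapshot of one adapter's DNS configuration.
#[derive(Debug, Clone)]
pub struct AdapterDns {
    pub name: String,
    /// Interface metric as reported by the OS; lower is preferred.
    pub metric: u32,
    /// Resolvers in the order configured on the adapter (primary first).
    pub servers: Vec<IpAddr>,
}

/// Flatten the resolvers of all adapters, lowest metric first.
/// Each adapter's own primary/secondary order is kept and
/// duplicate addresses are dropped after their first occurrence.
pub fn resolvers_by_metric(adapters: &[AdapterDns]) -> Vec<IpAddr> {
    let mut sorted: Vec<&AdapterDns> = adapters.iter().collect();
    // `sort_by_key` is stable, so adapters with equal metrics keep input order.
    sorted.sort_by_key(|a| a.metric);

    let mut out: Vec<IpAddr> = Vec::new();
    for adapter in sorted {
        for server in &adapter.servers {
            if !out.contains(server) {
                out.push(*server);
            }
        }
    }
    out
}
```

A stricter variant could keep only the resolvers of the lowest-metric adapter, which would be closer to the "Primary resolver unless it fails" behaviour described above.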