
DNS fails to be controlled and/or reverted

Open jamilbk opened this issue 6 months ago • 9 comments

Received two reports from a customer about DNS issues on Windows related to Firezone.

  • https://firezonehq.slack.com/archives/C08KPQKJZKM/p1749053656472029
  • https://firezonehq.slack.com/archives/C08KPQKJZKM/p1747746982240599

Version 1.4.14

This is a tracking issue to investigate and harden our DNS control logic so these issues don't recur.

I suspect we may find clues in Sentry.

jamilbk avatar Jun 04 '25 17:06 jamilbk

This is reproducible for me on Windows 10:

  1. Connect Firezone
  2. Set the DNS server of the primary network adapter to manual and enter 1.1.1.1
  3. Observe all network connectivity is cut

jamilbk avatar Jun 05 '25 04:06 jamilbk

Logs from the above session

firezone_logs_2025_06_05-04-52.zip

jamilbk avatar Jun 05 '25 04:06 jamilbk

Edition	Windows 10 Pro
Version	22H2
Installed on	1/30/2025
OS build	19045.5917
Experience	Windows Feature Experience Pack 1000.19061.1000.0

jamilbk avatar Jun 05 '25 05:06 jamilbk

Can't seem to reproduce the above on Windows 11 Pro.

jamilbk avatar Jun 05 '25 05:06 jamilbk

> Logs from the above session
>
> firezone_logs_2025_06_05-04-52.zip

Can you reproduce with debug logs please?

thomaseizinger avatar Jun 05 '25 06:06 thomaseizinger

In the above logs, the 1.1.1.1 address makes sense, because I set the addresses to Primary: 1.1.1.1 and Secondary: 1.0.0.1

However, the 168.63.129.16 address doesn't make any sense. Not sure where that's coming from.

Strangely, I can also reproduce the connectivity hang without the Firezone GUI running. Maybe the tunnel service is at play, or maybe this is related to the VM and RDP.

It's very possible the issues users are facing here are due to #8439. Maybe Windows is reporting the DNS servers of other interfaces to us, but with a higher metric, expecting us not to use them, and we use them anyway.

All other apps on Windows use the "Primary" resolver unless it fails. I believe we round-robin among all the ones we "find", which could explain what we're seeing here.
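
To make the metric hypothesis concrete, here is a rough sketch of what "only use the resolvers of the preferred interface" could look like. This is not Firezone's actual code: `Adapter` and `preferred_resolvers` are made-up names, and the sketch assumes we already have each adapter's metric and DNS servers in hand. On Windows, a lower metric means a higher priority.

```rust
use std::net::IpAddr;

/// Hypothetical view of one network adapter as reported by the OS.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Adapter {
    name: String,
    /// Interface metric; on Windows a lower value means higher priority.
    metric: u32,
    /// DNS servers configured on this adapter, in configured order.
    dns_servers: Vec<IpAddr>,
}

/// Returns the DNS servers of the lowest-metric adapter that has any
/// configured, keeping their order (primary first, then secondary).
/// Resolvers on higher-metric adapters are ignored entirely.
fn preferred_resolvers(adapters: &[Adapter]) -> Vec<IpAddr> {
    adapters
        .iter()
        .filter(|a| !a.dns_servers.is_empty())
        .min_by_key(|a| a.metric)
        .map(|a| a.dns_servers.clone())
        .unwrap_or_default()
}
```

With something like this, a resolver that only shows up on a higher-metric adapter would never be picked, which is the behavior the hypothesis above says Windows expects from us.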

jamilbk avatar Jun 05 '25 07:06 jamilbk

> All other apps on Windows use the "Primary" resolver unless it fails. I believe we round-robin among all the ones we "find", which could explain what we're seeing here.

We don't round-robin anything; we map 1 to 1 and the OS picks which one to send queries to.
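
As a rough illustration of a 1-to-1 mapping, the sketch below pairs each upstream resolver with exactly one stand-in address, in order. It assumes that "map 1 to 1" means one stand-in address per upstream resolver, which is only one reading of the sentence above. `map_one_to_one` and the `base` parameter are hypothetical and not taken from the Firezone codebase.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Pairs each upstream resolver with its own stand-in address, in order.
/// `base` is a hypothetical first stand-in address; the n-th upstream
/// gets `base + n`. Returns (stand_in, upstream) pairs.
fn map_one_to_one(upstreams: &[IpAddr], base: Ipv4Addr) -> Vec<(IpAddr, IpAddr)> {
    upstreams
        .iter()
        .enumerate()
        .map(|(i, upstream)| {
            // wrapping_add avoids an overflow panic for a very high base.
            let stand_in = Ipv4Addr::from(u32::from(base).wrapping_add(i as u32));
            (IpAddr::V4(stand_in), *upstream)
        })
        .collect()
}
```

Nothing here chooses between resolvers. Each upstream keeps its own entry, and the choice of which one to query is left to whoever consumes the list, in this case the OS.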

thomaseizinger avatar Jun 07 '25 14:06 thomaseizinger

When we set our DNS servers though, do we respect the metric of the old ones?

jamilbk avatar Jun 07 '25 15:06 jamilbk