feat(gateway): resolve DNS in the background

Open thomaseizinger opened this issue 1 year ago • 7 comments

When receiving a connection request for a DNS resource, the gateway will first resolve the domain name and only then respond with the ICE credentials needed by the client to establish the connection. This was necessary because prior to #4994, the response sent to the client included the resolved IPs for the domain name.

With #4994, the gateway only needs to resolve the domain name in order to correctly map the client's proxy IPs upon incoming traffic. As a result, we can remove the DNS resolution from the hot-path of connection setup and always directly accept a client's connection request.

On main, if the DNS resolution fails, we never respond to the client, forcing it to time out before it tries to establish a new connection. With this PR, we always accept the connection and perform the DNS resolution in the background. If that resolution fails, we fall back to an empty list of IPs, which results in the gateway not performing any translation.
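To illustrate the idea, here is a minimal sketch using only the Rust standard library. It is not the gateway's actual code: `Accepted`, `accept_and_resolve`, and the `resolve` closure are hypothetical names, and a plain thread plus channel stands in for whatever task machinery the gateway really uses.

```rust
use std::net::IpAddr;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical reply that is sent to the client right away.
/// It carries no resolved IPs, so it does not depend on DNS.
#[derive(Debug, PartialEq)]
struct Accepted {
    domain: String,
}

/// Accepts the request immediately and resolves `domain` on a background thread.
/// A failed lookup is reported as an empty list instead of an error.
fn accept_and_resolve<R>(domain: &str, resolve: R) -> (Accepted, Receiver<Vec<IpAddr>>)
where
    R: FnOnce(&str) -> std::io::Result<Vec<IpAddr>> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let name = domain.to_owned();

    thread::spawn(move || {
        // Errors collapse into an empty Vec: no IPs means no translation yet.
        let ips = resolve(&name).unwrap_or_default();
        // The receiver may already be gone; that is fine for this sketch.
        let _ = tx.send(ips);
    });

    (Accepted { domain: domain.to_owned() }, rx)
}
```

The caller gets the `Accepted` value without waiting for DNS and can pick up the resolved IPs from the receiver later. Passing the resolver in as a closure keeps the sketch independent of any particular DNS library.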

The gateway already has functionality to refresh a DNS mapping if we have seen outgoing traffic for a proxy IP but no incoming traffic. We can extend this to also refresh DNS if there isn't even a mapping for the given proxy IP. This way, a failed DNS resolution as part of an allow or connection request is self-healing as soon as the client starts using the proxy IP.
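The refresh rule described above could look roughly like the following sketch. All names (`TranslationTable`, `Outcome`, `MAX_UNANSWERED`) are made up for illustration, and counting unanswered packets is an assumed stand-in for however the gateway actually detects "outgoing traffic but no incoming traffic".

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical threshold: outgoing packets tolerated without any reply.
const MAX_UNANSWERED: u32 = 3;

#[derive(Debug)]
struct Mapping {
    real_ip: IpAddr,
    /// Outgoing packets seen since the last incoming packet.
    unanswered: u32,
}

/// What to do with an outgoing packet addressed to a proxy IP.
#[derive(Debug, PartialEq)]
struct Outcome {
    forward_to: Option<IpAddr>,
    refresh_dns: bool,
}

#[derive(Debug, Default)]
struct TranslationTable {
    by_proxy_ip: HashMap<IpAddr, Mapping>,
}

impl TranslationTable {
    /// Stores the result of a (re-)resolution for `proxy_ip`.
    fn insert(&mut self, proxy_ip: IpAddr, real_ip: IpAddr) {
        self.by_proxy_ip.insert(proxy_ip, Mapping { real_ip, unanswered: 0 });
    }

    fn handle_outgoing(&mut self, proxy_ip: IpAddr) -> Outcome {
        match self.by_proxy_ip.get_mut(&proxy_ip) {
            // Extended case: no mapping at all, e.g. the initial lookup failed.
            None => Outcome { forward_to: None, refresh_dns: true },
            // Existing case: mapping present, but nothing has come back for a while.
            Some(mapping) => {
                mapping.unanswered += 1;
                Outcome {
                    forward_to: Some(mapping.real_ip),
                    refresh_dns: mapping.unanswered > MAX_UNANSWERED,
                }
            }
        }
    }

    /// A reply arrived for `proxy_ip`, so the current mapping is evidently working.
    fn handle_incoming(&mut self, proxy_ip: IpAddr) {
        if let Some(mapping) = self.by_proxy_ip.get_mut(&proxy_ip) {
            mapping.unanswered = 0;
        }
    }
}
```

With this shape, the first packet a client sends to a proxy IP that has no mapping triggers a new lookup, which is what makes a failed initial resolution self-healing.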

thomaseizinger, Jun 20 '24 02:06

If DNS resolution on the gateway doesn't work reliably, almost nothing in Firezone works. The only difference with this PR is that a failed connection (i.e. the behaviour on main) is self-healing after ~20s, whereas with this change the user would have to sign out and back in to clear all connection state.

We can improve on this with some refactoring: the gateway already has functionality to refresh a DNS query if it sees packets for a proxy IP but no traffic for the translated IP. We can adapt this to also refresh the DNS query if we see traffic for a proxy IP and don't even have a real IP for this domain.

thomaseizinger, Jun 20 '24 02:06

I just attempted to implement the refactoring suggested above (also refreshing the DNS query when we see traffic for a proxy IP but have no real IP for the domain), and I think it is actually reasonably clean.

thomaseizinger, Jun 20 '24 02:06

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
|---|---|---|---|
| direct-tcp-client2server | 233.1 MiB (-0%) | 234.3 MiB (-1%) | 238 (-4%) |
| direct-tcp-server2client | 238.5 MiB (+3%) | 239.9 MiB (+2%) | 66 (-71%) |
| relayed-tcp-client2server | 233.4 MiB (-4%) | 234.4 MiB (-4%) | 298 (-14%) |
| relayed-tcp-server2client | 232.1 MiB (-3%) | 234.0 MiB (-2%) | 565 (-7%) |

UDP

| Test Name | Total/s | Jitter | Lost |
|---|---|---|---|
| direct-udp-client2server | 500.0 MiB (+0%) | 0.05ms (-3%) | 44.03% (-7%) |
| direct-udp-server2client | 500.0 MiB (+0%) | 0.01ms (-71%) | 23.73% (+8%) |
| relayed-udp-client2server | 500.0 MiB (+0%) | 0.03ms (-48%) | 54.96% (+2%) |
| relayed-udp-server2client | 500.0 MiB (-0%) | 0.01ms (-51%) | 37.44% (+1%) |

github-actions[bot], Jun 20 '24 04:06

If we hold off on this PR until the new clients are released, we can make it substantially simpler because we won't need to worry about backwards compatibility!

thomaseizinger, Jun 20 '24 06:06

This is blocked on being able to remove the backwards-compatibility layer for clients < 1.1. We could theoretically ship it earlier but it is a lot simpler to just delay this until we are happy to no longer support < 1.1 clients on newer gateways.

thomaseizinger, Aug 20 '24 08:08

Will essentially be replaced by https://github.com/firezone/firezone/pull/6732.

thomaseizinger, Sep 24 '24 09:09