wireguard-go
wireguard-go copied to clipboard
tun: replace the DNS client in netstack with net.Resolver
Hello,
I used the new netstack package to setup an in-process tunnel for tunneling SSH connections over WireGuard. It worked quite well, I was very happy with how easy it was. Here are some changes I made as part of my work. The biggest one is replacing the custom DNS client with an instance of net.Resolver.
The TUN type includes a DNS client for sending hostname lookup queries to the DNS server(s) over the WireGuard connection. The builtin client resolver in the net package also uses the dnsmessage package to implement a DNS client, which can bypass the servers specified in resolv.conf when a custom Dial func is provided. This is a patch to replace the netstack package's DNS client with an instance of net.Resolver configured to connect to the specified servers instead of the system's configured DNS servers (via resolv.conf). It was tested on and with support from Fly.io.
Also included is a change to update the modules in the netstack package and fixes for the client & server examples, along with a change to add a copyright file to the netstack package to fix https://pkg.go.dev/ support.
Thanks for the patches.
First, some house keeping -- this repo uses Signed-off-by:
lines. Can you add yours using git commit -s --amend --no-edit
?
With regards to the DNS change, I'm a bit lost in the diff, but my current understand is:
- net.Resolver lets you override the Dial function.
- You use this to hijack the dialing, replacing the DNS server passed by net with the one specified by the user for tnet.
- If the user doesn't specify any tnet resolvers, you allow the ones passed by net.Resolver to be used.
Is this more or less what's happening? If so, I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf. Will net.Resolver still pass an IP to be hijacked to the Dial function, or will it just fail? What is the behavior when a tnet has 1 DNS server, but the system specifies multiple DNS servers? Will tnet's 1 server be queried multiple times?
Thanks, I added the signoffs, I also noticed some wonkiness with go mod tidy
, it seems to make a change that breaks go build
for me. Still trying to track down what's going on there.
I realized I forgot to document some of the behavioral changes this PR introduces. It's not easy to describe without an excruciatingly long explainer on how DNS works in Go apps, so here goes.
The net package has two ways of looking up a hostname: native OS functions/syscalls or pure Go resolver. Since the host is unaware of the netstack networking layer, if lookups are handled by the OS they will not be sent over the WireGuard connection, and internal names won't be resolved. So for any new connections over the WireGuard tunnel, it makes sense to also tunnel DNS queries. Using a net.Resolver with PreferGo set and Dial set to tnet's will force this behavior, but unfortunately that's not the end of it.
Even with the pure Go resolver configured, the local DNS config (e.g. resolv.conf, nsswitch.conf) is still used to determine the DNS server to contact. (If no config is present, I believe it default to "localhost:53".) This can cause the pure Go resolver to lookup a hostname over the WireGuard tunnel, but attempt to contact the server specified in the host config. On my machine, it was attempting to send DNS queries to 192.168.1.1:53, which was not part of the allowed ips address range.
So the dialDNS method attempts to correct this by swapping out the DNS server IP supplied by the system config for the DNS server IP(s) specified in the CreateNetTUN parameter. Unfortunately, the net package does not expose the DNS server(s) it discovers via reading system configs, so this dialDNS method instead replaces any connection attempts with a random IP in dnsServers, unless the address is already in dnsServers. (The range over a map makes the order random.)
The previous version of Dial was resolving all IPs for the address with the custom resolver implemented in LookupContextHost. That version would query each DNS server in dnsServers until it received a valid response. This version uses the net.Resolver to do the lookup, but it overrides the IP address that net.Resolver attempts to dial, via the dialDNS method. This causes a subtle change in behavior: LookupContextHost returns the first valid record from a DNS server, dialDNS returns the first valid connection to a DNS server. If the record cannot be resolved by this server, it won't attempt to query another server. That behavior could be preserved by adding a wrapper around the call to tnet.resolver.LookupHost, if it is important to keep.
Let me know if you would like any of these details documented in the commit message.
Thanks, I added the signoffs, I also noticed some wonkiness with go mod tidy, it seems to make a change that breaks go build for me. Still trying to track down what's going on there.
gVisor has a separate branch for use with real Go, since they have some forked Go. I fixed things up in slightly different ways for both the pkgsite license and the api bump. Thanks for bringing that all to my attention.
I realized I forgot to document some of the behavioral changes this PR introduces. It's not easy to describe without an excruciatingly long explainer on how DNS works in Go apps, so here goes.
Okay, so your explanation matches my three bullet points. I think we're on the same page. I worry, though, isn't there a problem with:
So the dialDNS method attempts to correct this by swapping out the DNS server IP supplied by the system config for the DNS server IP(s) specified in the CreateNetTUN parameter. Unfortunately, the net package does not expose the DNS server(s) it discovers via reading system configs, so this dialDNS method instead replaces any connection attempts with a random IP in dnsServers, unless the address is already in dnsServers. (The range over a map makes the order random.)
I asked about a corner case of this in my prior message when I wrote:
I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.
It looks like in that case, lookups will fail. But lookups shouldn't fail for that reason.
So as much as I'd like to stop open coding DNS resolver logic in this module and use something in the library, it doesn't look like Go's resolver is flexible enough to make that work. I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?
I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.
127.0.0.1:53
& [::1]:53
are used when resolv.conf
doesn't include a nameserver or when there is an error reading the config.
I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?
I think hijacking the internal resolver would be overstepping if it forces all DNS lookups for the process to go over the tunnel. And you can already do this (minus loading the resolv.conf config) by modifying net. DefaultResolver
. Specifying a config for a net.Resolver
would be great, but I think it's been avoided b/c of how a lot of the configuration can be platform specific.
I'm wondering what happens when there are no system resolvers specified in /etc/resolv.conf.
127.0.0.1:53
&[::1]:53
are used whenresolv.conf
doesn't include a nameserver or when there is an error reading the config.
Oh, great! That's very promising. So we always get some request. Two questions:
- How's this work on Windows, or non _unix platforms?
- What happens if the tnet has more DNS servers than resolvconf? I see that currently you try dialing all of them, for each call to dialDNS, but this has two problems:
- UDP dialing won't fail fast like TCP does, so it'll always return a valid conn and not error out.
- If it does error, then it means that dialDNS might be called for the next resolv.conf address, in which case you now are making N*M connections.
I briefly looked into what it'd take to hijack Go's internal resolver logic using unsafe, but it looks like there's a lot of build tags and such in there complicating it. I wonder if this would be a good proposal for Go 1.17?
I think hijacking the internal resolver would be overstepping if it forces all DNS lookups for the process to go over the tunnel. And you can already do this (minus loading the resolv.conf config) by modifying
net. DefaultResolver
. Specifying a config for anet.Resolver
would be great, but I think it's been avoided b/c of how a lot of the configuration can be platform specific.
Yea, I wasn't thinking hijacking DefaultResolver, but just an instance of Resolver. But it looks like resolverConfig might be global anyway, which is a bummer.
- How's this work on Windows, or non _unix platforms?
I don't think it would, because AFAIK none of them support the PreferGo
stuff.
- UDP dialing won't fail fast like TCP does, so it'll always return a valid conn and not error out.
Yes, but if subsequent lookup fails with a "temporary" DNS error (like a timeout), the resolver will redial then resend the query, until it runs out of servers. But dialDNS could pick the same server on a redial.
- If it does error, then it means that dialDNS might be called for the next resolv.conf address, in which case you now are making N*M connections.
The max is 3*len(dnsServers)
connections for TCP and 3
for UDP, because the dnsConfig
allows at most 3 servers.
None of this seems ideal, so perhaps a better approach is to keep the DNS lookups handled by Net
, but to use a github.com/miekg/dns
client?
After thinking a bit more about this a bit more, I'm no longer in favor of using a different DNS client. What I really want is a way to override the LookupHost
method, if I so choose. That way there doesn't have to be a DNS server running on the other side of the connection for internal DNS to work (e.g. tunnel DOH instead). Using a different DNS client is a very round-about way of allowing for this.
What do you think about extracting the DNS client here into an exported type with just a LookupHost(context.Context, string) ([]string, error)
method, and swapping the dnsServers
argument for a LookupHost
func argument? That way, a net.Resolver.LookupHost
could be used in place of this client. Or just export a TunnelDNSTo(dnsServers []net.IP) func(....)
func instead of the entire DNS client type.
Hi,
Not sure if it's right place to chime in. I also want to config the netstack to use DoH/DoT.
Something that has changed over the time. PreferGo
is supported on Windows since go1.19. Though I don't think we must use net.Resolver.
How about adding a resolver interface, like LookupHost(ctx context.Context, host string) (addrs []string, err error)
, and allow it to be passed to CreateNetTUN
. To allow the resolve traffic to go through netstack, the DialContext probably should be added to that LookupHost
function as well.