tailscale
MagicDNS interferes with non-tailscale traffic on cellular WAN (DNS64)
Describe the bug
When tailscale is running with MagicDNS enabled and the client laptop is hotspotted to a phone with cellular WAN, non-tailscale traffic is interrupted (presumably because of DNS resolution issues...?).
To Reproduce
Steps to reproduce the behavior:
- Hotspot your phone and tether your laptop to it.
- If tailscale is running with MagicDNS enabled, you will intermittently be unable to access non-tailscale network resources.
- Running sudo tailscale down and sudo systemctl stop tailscaled.service restores access to other resources.
- Disabling MagicDNS (but keeping tailscale running) also restores other network access.
Expected behavior
tailscale shouldn't interfere with access to other network resources. For now I've disabled MagicDNS, but it's super handy, so I'd love to be able to use it!
Screenshots
Just let me know if a video would be helpful; happy to share.
Version information:
- Laptop: Ubuntu 20.04
- tailscale: tailscale commit: 64a9656c01754b6652994cb3a8ef59bce1246cfc, 1.4.4
- Cell phone: Pixel 5
- Provider: Google Fi
Additional context
A similar behavior occasionally occurs when I switch WiFi networks, but generally this isn't a problem with wireline WAN.
When tethered, with tailscaled stopped or MagicDNS off, what DNS server does your Pixel give out to your laptop? (What does /etc/resolv.conf say?)
I wonder if MagicDNS is masking a DNS server on your Pixel that's doing 464XLAT stuff.
I have an old Pixel + a spare Google Fi data SIM lying around somewhere so I should try.
/cc @danderson
This also happens with macOS and an iOS hotspot.
I tried to narrow down which combination of the following conditions causes the bug, but after toggling them back and forth I got it working even with all of them true at the same time, although it had definitely failed with all of them true before:
- macOS is tethering through iOS
- iOS has Tailscale enabled
- macOS has "Use corporate DNS" enabled
- Magic DNS is enabled
- the upstream DNS server is a Tailscale IP
When it's not working, DNS requests to 100.100.100.100 time out, except for .tailscale.net requests, which work correctly.
When I disable corporate DNS, my phone hands out 172.20.10.1 as the DNS server.
@FiloSottile thank you for reproducing! Yep, I also see somewhat intermittent behavior here. Anecdotally, I think it might be worse when I'm on a lower-quality cellular connection (H+ vs. LTE or 5G), though that could be coincidental too.
@bradfitz, here's what I get when tethered to a Pixel 5 using Google Fi (5G):
nrc@nrc-aero:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
Also worth adding that the nameservers listed in tailscale may be a factor too, not MagicDNS per se; I turned both off as part of my workaround.
EDIT: Thinking it over more, it seems like my local tailscale client also wasn't able to initialize correctly? Before I set up the workaround, here's what I got when running tailscale status. tailscale netcheck looked normal, though.
It says:
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
Can you run that too?
And then can you do a DNS lookup for an IPv4-only domain (such as bradfitz.com) and see what it returns? Is it v6-ified?
Yep:
nrc@nrc-aero:~$ resolvectl status
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNSSEC NTA: 10.in-addr.arpa
16.172.in-addr.arpa
168.192.in-addr.arpa
17.172.in-addr.arpa
18.172.in-addr.arpa
19.172.in-addr.arpa
20.172.in-addr.arpa
21.172.in-addr.arpa
22.172.in-addr.arpa
23.172.in-addr.arpa
24.172.in-addr.arpa
25.172.in-addr.arpa
26.172.in-addr.arpa
27.172.in-addr.arpa
28.172.in-addr.arpa
29.172.in-addr.arpa
30.172.in-addr.arpa
31.172.in-addr.arpa
corp
d.f.ip6.arpa
home
internal
intranet
lan
local
private
test
Link 6 (tailscale0)
Current Scopes: none
DefaultRoute setting: no
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Link 5 (docker0)
Current Scopes: none
DefaultRoute setting: no
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Link 3 (wlo1)
Current Scopes: DNS
DefaultRoute setting: yes
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 192.168.36.16
DNS Servers: 192.168.36.16
2607:fb90:806b:12ca::8e
DNS Domain: ~.
Link 2 (enp2s0)
Current Scopes: none
DefaultRoute setting: no
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
And yes, seems like it is v6-ified:
nrc@nrc-aero:~$ ping bradfitz.com
PING bradfitz.com(2607:7700:0:1c:0:1:23f7:3475 (2607:7700:0:1c:0:1:23f7:3475)) 56 data bytes
64 bytes from 2607:7700:0:1c:0:1:23f7:3475 (2607:7700:0:1c:0:1:23f7:3475): icmp_seq=1 ttl=52 time=191 ms
64 bytes from 2607:7700:0:1c:0:1:23f7:3475 (2607:7700:0:1c:0:1:23f7:3475): icmp_seq=2 ttl=52 time=163 ms
^C
--- bradfitz.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 163.163/176.832/190.501/13.669 ms
nrc@nrc-aero:~$ host bradfitz.com
bradfitz.com has address 35.247.52.117
bradfitz.com has IPv6 address 64:ff9b::23f7:3475
bradfitz.com mail is handled by 50 aspmx3.googlemail.com.
bradfitz.com mail is handled by 30 alt2.aspmx.l.google.com.
bradfitz.com mail is handled by 10 aspmx.l.google.com.
bradfitz.com mail is handled by 40 aspmx2.googlemail.com.
bradfitz.com mail is handled by 20 alt1.aspmx.l.google.com.
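For anyone following along: the v6-ified answers above look like DNS64 synthesis, where the resolver takes the A record and embeds it in the last 32 bits of a /96 NAT64 prefix. 64:ff9b::/96 is the RFC 6052 well-known prefix; reading 2607:7700:0:1c:0:1::/96 as a carrier-specific /96 prefix is an assumption based only on the ping output. A minimal Python sketch (the helper names embedded_ipv4 and synthesize are hypothetical) that decodes both answers back to the A record:

```python
import ipaddress

# RFC 6052 well-known NAT64 prefix.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embedded_ipv4(addr: str, prefix: ipaddress.IPv6Network) -> ipaddress.IPv4Address:
    """Return the IPv4 address embedded in a DNS64-synthesized AAAA answer."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v6 = ipaddress.IPv6Address(addr)
    if v6 not in prefix:
        raise ValueError(f"{v6} is not inside {prefix}")
    # With a /96 prefix, the IPv4 address is simply the low 32 bits.
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


def synthesize(v4: str, prefix: ipaddress.IPv6Network) -> ipaddress.IPv6Address:
    """What a DNS64 resolver does: OR the IPv4 address into the /96 prefix."""
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipaddress.IPv4Address(v4)))


if __name__ == "__main__":
    print(embedded_ipv4("64:ff9b::23f7:3475", WELL_KNOWN_PREFIX))
    # Assumed carrier prefix, inferred from the ping output only.
    carrier = ipaddress.IPv6Network("2607:7700:0:1c:0:1::/96")
    print(embedded_ipv4("2607:7700:0:1c:0:1:23f7:3475", carrier))
    print(synthesize("35.247.52.117", WELL_KNOWN_PREFIX))
```

Both addresses decode to 35.247.52.117, the same A record host printed, which is consistent with the phone's resolver doing DNS64.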
However, it's worth highlighting that I'm now in SF with pretty good cellular WAN (this originally came to my team's attention while running in more remote areas in CA, and possibly in Australia as well). After flipping the various toggles here a bunch of times, I still see the behavior, but much less often.
I do have a screencast I recorded in a more remote area yesterday though, where the behavior was deterministic, if that would be helpful.
Thanks! That's enough info. It's pretty clear what's happening now.
Great, thanks for taking a look so fast!
Just following up from Twitter:
I'm experiencing the same symptoms as @FiloSottile on a network that is a T-Mobile LTE modem -> Google WiFi router -> {macOS, iPhone}, without doing any kind of tethering.
Some links:
- https://sites.google.com/site/tmoipv6/464xlat
- https://dan.drown.org/android/clat/
- https://en.wikipedia.org/wiki/IPv6_transition_mechanism#DNS64 / https://developers.google.com/speed/public-dns/docs/dns64
I think what I'm seeing might not be the same issue (in which case, happy to open a separate one).
Today I was connected through my iOS hotspot. MagicDNS was active and the upstream was 100.74.42.19.
I switched "Use corporate DNS" on.
- dig example.com (as well as general connectivity) started timing out
- dig foo.filippo.io.beta.tailscale.net still worked
- dig @100.74.42.19 example.com timed out at first
- ping 100.74.42.19 worked
- dig @100.74.42.19 example.com worked after the ping
- dig example.com was still broken
- dig damogran.filippo.io.beta.tailscale.net still worked
Then I turned "Use corporate DNS" off, turned Magic DNS off, and turned "Use corporate DNS" back on.
As expected, dig example.com was now working, contacting 100.74.42.19.
It looks like the problem was specifically with the MagicDNS daemon contacting 100.74.42.19, but I can't explain why a direct query also timed out at first.
My hotspot DNS was NOT doing DNS64.
@FiloSottile I think the issue you were running into is https://github.com/tailscale/tailscale/issues/2224: during this timeframe, MagicDNS couldn't send DNS queries to 100.x.y.z addresses because of echo-killing rules on several platforms. That has been fixed; using Tailscale addresses as DNS servers works on all platforms as of 1.10.x.
Leaving the issue open as some of the earlier comments appear to be a different problem possibly involving 464XLAT.
https://github.com/tailscale/tailscale/issues/1634 and https://github.com/tailscale/tailscale/issues/1377 are somewhat similar in that they need a way for MagicDNS to detect that it is in an environment where the set of upstream DNS servers it has been configured to use cannot possibly work. We might consider solving this similarly to what browsers do (like Chrome's redirect204):
- periodically, when there is some other DNS lookup to be done, also lookup a name where we absolutely know what the answer is supposed to be
- if the answer comes back different, use that to figure out if we're in a captive portal environment or DNS64 environment
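A rough Python sketch of that canary check. The lookup callback is a hypothetical hook for querying the configured upstream, and ipv4only.arpa (whose only A records, per RFC 7050, are 192.0.0.170 and 192.0.0.171) is just one possible canary name; this is an illustration under those assumptions, not how MagicDNS does it. Only /96 DNS64 prefixes are recognized.

```python
import ipaddress
from typing import Callable, Iterable, Set

# Hypothetical hook: (name, "A" or "AAAA") -> addresses returned by the
# configured upstream. Timeouts are assumed to surface as OSError.
Lookup = Callable[[str, str], Iterable[str]]


def classify_upstream(lookup: Lookup, canary: str, expected_a: Set[str]) -> str:
    """Classify the upstream by resolving a canary name with a known answer.

    The canary is assumed to have A records only, so any AAAA answer was
    synthesized or injected somewhere along the path.
    """
    expected = {ipaddress.IPv4Address(x) for x in expected_a}
    try:
        a = {ipaddress.IPv4Address(x) for x in lookup(canary, "A")}
        aaaa = [ipaddress.IPv6Address(x) for x in lookup(canary, "AAAA")]
    except OSError:
        return "unreachable"  # the upstream cannot be used at all
    if a != expected:
        return "unexpected"  # rewritten answers, e.g. a captive portal
    # DNS64 with a /96 prefix puts the A record in the low 32 bits.
    embedded = {ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF) for v6 in aaaa}
    if aaaa and embedded <= expected:
        return "dns64"
    if aaaa:
        return "unexpected"
    return "ok"
```

Treating a timeout as "unreachable" without retrying is a simplification; piggybacking the canary on lookups that are already happening, as suggested above, would avoid adding extra query load.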
We face this problem too, and settled on the RFC 7050 well-known name, ipv4only.arpa, to figure out whether the underlying network relies on DNS64 to NAT v4 traffic over v6 (ref).
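For reference, a Python sketch of the RFC 7050 approach: take the AAAA answers for ipv4only.arpa and look for 192.0.0.170 or 192.0.0.171 at each position RFC 6052 allows for an embedded IPv4 address, which yields the NAT64 prefix and its length. The function name is hypothetical, the answers are passed in rather than queried, prefix lengths are tried from shortest to longest with the first match winning, and the validation steps RFC 7050 describes are skipped.

```python
import ipaddress
from typing import Iterable, List

# RFC 7050: ipv4only.arpa has only these two A records, so any AAAA record
# returned for it was synthesized by DNS64.
WELL_KNOWN_IPV4 = {bytes([192, 0, 0, 170]), bytes([192, 0, 0, 171])}

# RFC 6052: byte offsets of the embedded IPv4 address for each allowed
# prefix length. Byte 8 (bits 64-71, the "u" octet) is never used.
LAYOUTS = {
    32: (4, 5, 6, 7),
    40: (5, 6, 7, 9),
    48: (6, 7, 9, 10),
    56: (7, 9, 10, 11),
    64: (9, 10, 11, 12),
    96: (12, 13, 14, 15),
}


def discover_nat64_prefixes(aaaa_answers: Iterable[str]) -> List[ipaddress.IPv6Network]:
    """Return the NAT64 prefixes implied by AAAA answers for ipv4only.arpa."""
    found: List[ipaddress.IPv6Network] = []
    for answer in aaaa_answers:
        packed = ipaddress.IPv6Address(answer).packed
        for plen, offsets in LAYOUTS.items():  # shortest prefix first
            if bytes(packed[i] for i in offsets) in WELL_KNOWN_IPV4:
                # strict=False zeroes the host bits, leaving just the prefix.
                net = ipaddress.IPv6Network(f"{answer}/{plen}", strict=False)
                if net not in found:
                    found.append(net)
                break
    return found


if __name__ == "__main__":
    # Hypothetical answers from a resolver using the well-known prefix.
    print(discover_nat64_prefixes(["64:ff9b::c000:aa", "64:ff9b::c000:ab"]))
```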