headscale
Sporadic DNS Failures
Created per request on Discord. I have no clue where to start debugging an issue that occurs at intervals of more than a month. :-|
I have seen several cases (> 4) where devices on my headscale network that rely on the (Magic)DNS settings provided by headscale (with PiHole as the nameserver) lose DNS resolution entirely, leaving those machines with no outgoing DNS and, by extension, no internet access. The Headscale server itself keeps DNS access to the outside internet.
Potential fixes:
- Full system restart
- Restarting Headscale
- Restarting PiHole
Background information: I have been running PiHole on a different server for about 2 years and have had no connectivity issues that I am aware of. My hunch is that there may be a conflict/(???) that occasionally severs the DNS connection between Headscale and PiHole, and since Headscale doesn't seem to be aware of any other upstream DNS server, all Tailscale clients connected to it stop resolving DNS requests. The affected servers still have IP-based network connectivity and working inbound connections; they just fail to resolve DNS requests and so effectively have no outbound internet connection. (Symptom: docker pull results in 404 type errors and all sorts of other strange errors.)
Context info:
- Headscale v0.15.0 via docker-compose
- Nginx reverse proxy
- PiHole with 2-3 upstream DNS providers
- Servers: Multiple distros, multiple hosts, mostly Debian based
- Upstream DNS: Cloudflare and quad9
Time frame: I have been running headscale for about 2-3 months and have seen this occur perhaps 4 times. The last gap was at least 1-1.5 months between occurrences.
How I realize this is happening: People start messaging me asking if [service/server] is down. I log in and see it up but not functioning.
It seems that the Tailscale client overrides the system's DNS settings on macOS.
When I enabled "Use Tailscale DNS settings" and queried DNS with the dig command, I could clearly see that the nameserver set in Headscale was being used.
When I disabled "Use Tailscale DNS settings" and ran dig again, the query went to the nameserver set in System Preferences.
Enabled Use Tailscale DNS settings
~ » dig google.com
; <<>> DiG 9.10.6 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1497
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
+ google.com. 60 IN A 8.7.198.46 // the real ip respond from cloudflare
;; Query time: 13 msec
+ ;; SERVER: 1.1.1.1#53(1.1.1.1) // cloudflare server - preferred nameserver set in my headscale server
;; WHEN: Wed Jul 27 22:46:39 CST 2022
;; MSG SIZE rcvd: 54
Disabled Use Tailscale DNS settings
~ » dig google.com
; <<>> DiG 9.10.6 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42052
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
+ google.com. 1 IN A 198.18.1.105 // the fake ip respond from my homelab server
;; Query time: 10 msec
+ ;; SERVER: 10.0.0.2#53(10.0.0.2) // the nameserver set in my homelab server.
;; WHEN: Wed Jul 27 22:46:53 CST 2022
;; MSG SIZE rcvd: 55
Given this, the next time the problem occurs I suggest checking whether the devices running your other Tailscale clients can reach the preferred nameserver set in Headscale. I suspect the problem is caused by those devices being unable to connect to that nameserver.
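The reachability check suggested above could be scripted roughly as follows. This is only a sketch, assuming Python 3 is available on the affected client and that the nameserver answers plain DNS over UDP port 53 on IPv4. The helper names `build_query` and `probe` are made up for this example and are not part of Headscale or Tailscale.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str = "google.com", timeout: float = 3.0) -> str:
    """Send one UDP query directly to `server` and summarize the outcome."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    # A valid reply has at least a 12-byte header and echoes our query id.
    if len(reply) < 12 or reply[:2] != query[:2]:
        return "malformed reply"
    rcode = reply[3] & 0x0F  # low 4 bits of the second flags byte
    return "ok" if rcode == 0 else f"rcode {rcode}"
```

Calling `probe("1.1.1.1")` and `probe("10.0.0.2")` (the two servers from the dig output above; substitute the nameservers from your own Headscale config) while the problem is happening should show whether the client can reach the nameserver at all. A "timeout" on the Headscale-provided nameserver while another server answers "ok" would point at connectivity to that nameserver rather than at Headscale itself.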
@GrahamJenkins can you try with the latest v0.16.0?
I seem to have this problem and have been able to reliably recreate it, although the cause might not be exactly the same.
I've got my domain (gurucomputing.com.au) routed via headscale's split DNS through my router's tailscale tunnel. Whenever I reboot, I seem to hit a chicken-and-egg condition where tailscale tries to reach my headscale server via the tailscale tunnel and gets stuck in limbo. Tailscale's status even says "please restart the tailscale service".
Once I do restart the service, internet gets happy again. Can also confirm that commenting out the domain that includes the headscale server resolves the issue.
I don't think Tailscale expects the coordination server to sit behind the Tailscale service. The solution is probably to exclude the headscale server's own DNS name from the split DNS when it falls within the split DNS scope, if that's possible, or just to use a different domain for headscale than for the tunneled services.
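To spot this situation before a reboot triggers it, a small check along these lines might help. It is a sketch only: `in_split_dns_scope` is a hypothetical helper, and `headscale.gurucomputing.com.au` is an assumed hostname used for illustration, since the actual headscale hostname is not given in this thread.

```python
from typing import Iterable


def in_split_dns_scope(hostname: str, split_domains: Iterable[str]) -> bool:
    """Return True if `hostname` equals or is a subdomain of a split-DNS domain."""
    host = hostname.rstrip(".").lower()
    for domain in split_domains:
        d = domain.strip(".").lower()
        # Match the domain itself or any name ending in ".<domain>".
        if d and (host == d or host.endswith("." + d)):
            return True
    return False


# Hypothetical headscale hostname checked against the split domain from this thread.
print(in_split_dns_scope("headscale.gurucomputing.com.au", ["gurucomputing.com.au"]))
```

If this returns True for the name clients use to reach headscale, lookups for the coordination server would be sent through the tunnel that depends on it, which matches the chicken-and-egg behaviour described above.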
I will close this with two notes:
- Please try the new versions
- We cannot support connecting to headscale over the tunnel. (I believe this is not supported by tailscale either).