[Bug] Headscale serving subnet from offline node.
Is this a support request?
- [x] This is not a support request
Is there an existing issue for this?
- [x] I have searched the existing issues
Current Behavior
Headscale is serving routes from an offline node:
The nodes list command output is:
| ID | Hostname | Name | MachineKey | NodeKey | User | IP addresses | Ephemeral | Last seen | Expiration | Connected | Expired |
|----|----------|------|------------|---------|------|--------------|-----------|-----------|------------|-----------|---------|
| 3 | <redacted> | <redacted> | [0eCEd] | [/eEPF] | internal | 100.65.0.4, fd7a:115c:a1e1::4 | false | 2025-08-22 10:04:40 | N/A | online | no |
| 4 | <redacted> | <redacted> | [1T/pb] | [n9Srg] | cassa | 100.65.0.5, fd7a:115c:a1e1::5 | false | 2025-08-22 09:45:02 | N/A | online | no |
| 6 | <redacted> | <redacted> | [lrXNR] | [yhjhi] | cassa | 100.65.0.7, fd7a:115c:a1e1::7 | false | 2025-07-04 08:17:20 | 1970-01-01 00:02:03 | offline | yes |
| 7 | <redacted> | <redacted> | [Cpdpz] | [f6fqC] | cassa | 100.65.0.1, fd7a:115c:a1e1::1 | false | 2025-08-22 10:05:40 | N/A | offline | no |
| 8 | <redacted> | <redacted> | [vqbGX] | [N+zcw] | cassa | 100.65.0.6, fd7a:115c:a1e1::6 | false | 2025-08-22 10:03:27 | N/A | offline | no |
| 9 | <redacted> | <redacted> | [MNbbI] | [88f3J] | cassa | 100.65.0.8, fd7a:115c:a1e1::8 | false | 2025-08-22 10:12:26 | N/A | online | no |
| 10 | <redacted> | <redacted> | [WwX6Q] | [WEaUd] | cassa | 100.65.0.9, fd7a:115c:a1e1::9 | false | 2025-08-22 10:08:11 | N/A | online | no |
| 11 | <redacted> | <redacted> | [xkeLX] | [5XTVi] | cassa | 100.65.0.11, fd7a:115c:a1e1::b | false | 2025-08-22 10:10:01 | N/A | online | no |
| 12 | <redacted> | <redacted> | [Cv2I7] | [g5cTw] | cassa | 100.65.0.12, fd7a:115c:a1e1::c | false | 2025-08-22 10:13:49 | N/A | offline | no |
| 13 | <redacted> | <redacted> | [sbrlf] | [YQdPd] | cassa | 100.65.0.13, fd7a:115c:a1e1::d | false | 2025-08-22 10:13:05 | N/A | offline | no |
| 14 | <redacted> | <redacted> | [elQmK] | [2sMcI] | cassa | 100.65.0.16, fd7a:115c:a1e1::10 | false | 2025-08-22 09:24:10 | N/A | offline | no |
The routes list response is:
| ID | Hostname | Approved | Available | Serving (Primary) |
|----|----------|----------|-----------|-------------------|
| 4 | <redacted> | 192.168.200.80/32 | 192.168.200.80/32 | 192.168.200.80/32 |
| 6 | <redacted> | 192.168.2.82/32 | 192.168.2.82/32 | |
| 7 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 8 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 9 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 10 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 11 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 12 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 13 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | |
| 14 | <redacted> | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 | 192.168.3.44/32, 192.168.3.60/32, 192.168.3.63/32, 192.168.3.68/32 |
As you can see, node 14 is serving the primary route even though it is marked as "offline".
Expected Behavior
The serving (primary) route fails over to an online node.
Steps To Reproduce
- Approve the same route on multiple nodes.
- Put a node offline.
Environment
- OS: Debian
- Headscale version: 0.26.1
- Tailscale version: Latest
Runtime environment
- [x] Headscale is behind a (reverse) proxy
- [ ] Headscale runs in a container
Debug information
The policy allows all traffic. Headscale has the default configuration.
How did node 14 go offline (OS shutdown, stopping tailscaled, poweroff, …)? Do you see anything related in the logs for node 14? Does the switch to another node work if you "properly" stop tailscaled on node 14?
You might be hit by this: https://headscale.net/development/ref/routes/#high-availability
Hi,
The node was not properly shut down, but, judging from the `headscale nodes list` output, Headscale does seem to detect the node as offline. The other node that was not properly shut down took a few minutes before it started being detected as offline.
Can you check whether node switching is fast when the primary node is properly shut down? It should be fairly quick.
It seems to be related to: #2129
Do you have the opportunity to test main? The logic has changed a bit, and might have improved with latest changes.
@pupaxxo It'd be great if you could test with 0.27.0-beta.1.
Hi! Sorry for the delayed response; the system is actively being used, so it's not easy to test. I'll try to schedule a live test with the customer in the next few days.
It should work in 0.27.1. Let us know if you still find this issue in your environment.