Multi server support
Feature request: Multiple server support
A way to have multiple server nodes so that we can have a mesh and each user can connect to the nearest server.
I have servers in several different regions, so it would be better if I could have one node per region: latency would be low and the load would also be spread out.
Thanks!!
I have not tried it, but I suppose that if you have a central database, sharing it across control servers may be possible. The downside is that doing so bypasses any application-level locks, which may introduce race conditions around machine registration and IP address allocation.
But then again, since the control server should only provide information about a network, and not forward any traffic itself, I am not sure it is worth optimizing it for latency. If my interpretation is correct, "latency" in this sense would be "how quickly existing clients observe a new node joining the network".
In principle, the amount of traffic going between the control server (headscale) and the tailscale clients is not really affected by latency (unless it's so bad that requests time out, but then I suspect you have other issues).
For DERP relays, lower latency could make sense, and you can host those separately from headscale.
There are scenarios where running multiple servers, and allowing them to connect to each other, would make sense:
- Multiple headscale servers, allowing two companies/owners to share nodes between them
- Redundancy
- What consequences would nodes being shared across control servers have on ACLs and security in general? I think it can be done safely as long as the servers can keep track whence each tag/rule originates, but it sure sounds easier to screw up than a single central ACL.
- As for redundancy, I think moving all state (including any locks around db insertions) to the database server (how does gin handle transactions?) would allow the setup mentioned above, and it should be simpler than anything that would depend on control servers directly communicating.
Though even if the database could be shared, one also needs to ensure the ACLs and the DERP map remain consistent across control servers.
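To make the shared-database idea a bit more concrete, here is a minimal Go sketch (not headscale's actual code) of pushing the allocation lock into the database itself, assuming a Postgres backend shared by several control servers. The ip_pool table, its columns, and the allocateIP function are invented purely for illustration.

```go
package ipalloc

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // any Postgres driver works; lib/pq used here as an example
)

// allocateIP reserves the next free address inside a single database
// transaction, so two control servers sharing the same database cannot
// hand out the same IP. Table and column names are illustrative only.
func allocateIP(ctx context.Context, db *sql.DB, machineKey string) (string, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return "", err
	}
	defer tx.Rollback() // no-op if the transaction was already committed

	// Row-level lock: concurrent allocations on other servers either wait
	// or skip to the next free row instead of racing for the same one.
	var next string
	err = tx.QueryRowContext(ctx,
		`SELECT ip FROM ip_pool WHERE machine_key IS NULL
		 ORDER BY ip LIMIT 1 FOR UPDATE SKIP LOCKED`).Scan(&next)
	if err != nil {
		return "", fmt.Errorf("no free addresses: %w", err)
	}

	if _, err := tx.ExecContext(ctx,
		`UPDATE ip_pool SET machine_key = $1 WHERE ip = $2`,
		machineKey, next); err != nil {
		return "", err
	}

	return next, tx.Commit()
}
```

With the lock held by the database rather than by one headscale process, any number of control servers could call this concurrently without duplicate allocations, which is the crux of the redundancy setup described above.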
This may sound crazy, but how about removing hard dependencies on exact config files and datastores/schemas, and letting users write their own behaviour in some glue/scripting language? As long as APIs are provided to them, they can decide what ACLs exist (for updates, just ask their script again); they'll know whether the rules they wish to hand out are handcrafted, come from a config file or a database, or are generated on the fly. Same for nodes: instead of hardwiring the address allocation/node listing logic, call into their machine_register or machine_enumerate functions - this way they can share nodes, set up their own machine registration logic (which would allow any authentication machinery to be used, including external OIDC/SAML/Basic Auth/mTLS/Kerberos solutions, without the control server needing to care), and share users in any manner they wish.
The upside is, the control server becomes a lot simpler and a lot more flexible. The downside is, scripting one's own DB access and the like is easier to screw up than relying on something shipped with the control server, and now the control server really needs to get the admin-facing API right. I think the former can be balanced out to a high degree by providing high quality samples and docs, but it is still more work for the user.
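To illustrate what such hooks could look like, here is a rough Go sketch of the kind of interface an operator's glue code might implement. This is purely hypothetical, nothing headscale exposes today; every type and method name is invented.

```go
package hooks

import (
	"context"
	"net/netip"
)

// Machine is a minimal view of a node as the control server would see it.
type Machine struct {
	Hostname string
	NodeKey  string
	IP       netip.Addr
	Tags     []string
}

// RegistrationHook lets the operator decide whether and how a node joins:
// the implementation could call out to OIDC, SAML, Kerberos, a shell
// script, or another control server entirely.
type RegistrationHook interface {
	// MachineRegister is asked to admit a node and assign it an address.
	MachineRegister(ctx context.Context, nodeKey, hostname string) (Machine, error)
	// MachineEnumerate returns the nodes this control server should know
	// about, possibly merged from several sources.
	MachineEnumerate(ctx context.Context) ([]Machine, error)
}

// ACLHook regenerates the policy whenever the control server needs it,
// so rules can come from a file, a database, or be computed on the fly.
type ACLHook interface {
	CurrentPolicy(ctx context.Context) ([]byte, error)
}
```

The control server would only ever talk to these interfaces; whether the data behind them lives in one database, several, or another organisation's server would be the operator's concern.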
I definitely see the need for multisite/multiregion deployments for redundancy purposes (like nebula lighthouses).
One site dying shouldn't take down the communication for the rest of them.
I would welcome it as well. Our admin stack is struggling with outages and the headscale VM also keeps crashing, so high availability would be extremely desirable. We could have one headscale server in Canada and a second one in Switzerland, and if one of them goes down, for whatever reason, we can still continue to work. So far our master admin always has to fix the whole thing and work with a proxy that is not secured, which we would like to prevent. While headscale is down, clients that have restarted can't connect and work comes to a standstill.
Hi @0n1cOn3 :)
The main objective of Headscale is to provide a correct implementation of the Tailscale protocol & control server - for hobbyists and self-hosters. We might work on supporting HA setups in the future, but that's not a short-term goal.
For those kinds of requests I would recommend the official Tailscale.com SaaS + Tailnet Lock.
Or send us a PR :) PRs are always welcome!
Hi @juanfont
Thanks for your answer. Yes, we (n64.cc) do self-hosting and don't want to rely on other people's "computers".
Maybe I'm just asking too much 😂 Unfortunately I'm not able to program, otherwise I would very much like to implement this somehow and open a PR. But as a hobby system/cloud administrator, I'm mostly dependent on those who can program.
While we appreciate the suggestion, it is out of scope for this project and not something we will work on for now.
Thanks for the answer @kradalby
Too bad, because HA for Tailscale would certainly be a groundbreaking possibility. Our community unfortunately has the problem that the main server running Tailscale randomly goes down again and again, which is how this idea arose. I would welcome it if this idea were implemented at some later time.
Thank you very much.
Maybe there would be the possibility, if Tailscale is down, for a client to act as a standby and take over the authentication task for a temporary period, at least for the already logged-in clients. I do see some challenges in addressing this, though.
@0n1cOn3 What about using two VMs in different availability zones/datacenters, a floating/virtual IP for headscale, and a local Postgres master/slave setup? Use keepalived to control the failover.
Floating IPs only work within the same network. Anyway, since clients use an FQDN to connect, all you really need is:
- a health checker endpoint for headscale
- a cron job to sync the database and configuration over to your slave server
- the headscale domain hosted with a DNS provider that offers API access
- a script on both headscale servers to monitor the health of the other and change the A record of your headscale domain accordingly (a rough sketch of such a script follows below)
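A rough Go sketch of that last point, the mutual health monitor, could look like the following. The peer URL, the /health path, the check interval, and the updateDNS stub are all placeholders: adjust the health endpoint to whatever your setup exposes, and fill in updateDNS with your DNS provider's actual API, since those differ for every hoster.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

// peerURL is a placeholder: point it at the other headscale server's
// health endpoint, whatever path your setup provides.
const peerURL = "https://headscale-primary.example.com/health"

// updateDNS would call your DNS hoster's API to point the headscale
// domain's A record at this standby server. Left unimplemented on purpose,
// since every provider's API is different.
func updateDNS(ctx context.Context) error {
	// e.g. PATCH the A record via the provider's REST API
	return nil
}

func peerHealthy(ctx context.Context) bool {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, peerURL, nil)
	if err != nil {
		return false
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	failures := 0
	for range time.Tick(30 * time.Second) {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		if peerHealthy(ctx) {
			failures = 0
		} else {
			failures++
		}
		cancel()

		// Only fail over after several consecutive misses, to avoid
		// flapping the A record on a single dropped check.
		if failures >= 3 {
			if err := updateDNS(context.Background()); err != nil {
				log.Printf("failed to update DNS: %v", err)
				continue
			}
			log.Printf("peer unhealthy, switched A record to standby")
			failures = 0
		}
	}
}
```

Keep in mind that DNS TTLs and client caching mean the failover is not instant, so a low TTL on the headscale record helps.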
Has anyone implemented what @rallisf1 has outlined? What's your experience? Any pitfalls?
Not yet. I have to speak with our main admin to test that.
@0n1cOn3 Were you able to try this idea?
Not yet. Our main admin has to perform this setup.