Built-in load balancer
Feature Request
Description
Load balancers are ubiquitous in cloud environments, but in on-premise setups they are not standardized and require manual work. Hence letting Talos handle this requirement internally would relieve a significant burden in setting up on-premise clusters with Talos. See #2711 for a partial rationale.
Current solutions
Currently (v0.14) the available solutions in Talos are:
- Use an external dedicated load-balancer (service)
- Creates a cluster dependency on another system or provider.
- No hosted services are really available if you are not hosting all your nodes on the same cloud that provides the load balancer.
- Self-hosting (HAProxy, NGINX, F5, etc.) adds additional hardware, software, and maintenance requirements, and takes work to do right.
- DNS records
- Round-robin DNS assumes clients re-query periodically; most CNI components do not, and stick to the first IP during startup.
- Does not provide failover.
- Does not provide high availability.
- Layer 2 shared IP #3111
- Provides failover but no load balancing.
- High availability but (relatively) slow failovers.
- Requires promiscuous mode allowing gratuitous ARP broadcasts. Not always allowed or possible in stretched VPCs.
- On premise multi datacenter setups require custom network setups (and some hardware) to enable layer 2 spanning.
- Tentative: VIP over BGP
- Can be done with `MetalLB` or `kube-vip` and some dedication.
- No built-in option as of yet. #4334
- Tentative: IPVS
- For now only a mention in #2711
Suggested additional solution
A better solution would be to add Talos native load balancing on the node side. This would:
- Be a CNI-agnostic solution.
- Allow native built-in load balancing without any external software, hardware requirements, or other dependencies.
- Allow self hosting without any maintenance burden other than the already required Talos config.
- Enable faster failovers than the current solutions.
- Enable true high availability given at least three control plane nodes.
- Not be dependent on ARP or BGP and thus work:
- In every stretched VPC, including those not supporting/allowing promiscuous adapters.
- In any multi datacenter setup without a spanned layer 2 broadcast domain or BGP setup.
- Without any extra infrastructure components like `MetalLB` or `kube-vip`.
- Allow the kube API (and possibly also the load balancer) to only be exposed to localhost?
- Given KubeSpan, this would work in all of the above scenarios without publicly exposing the API endpoint, even before a CNI is online.
Possible implementation
- Use parts of the client side load balancing code to create a Talos node side load balancer for worker and control plane nodes.
- Bind the load balancer on every node on localhost and set the Kubernetes API server endpoint to `127.0.0.1:<someport>` (see the config sketch after this list).
- Optionally expose the load balancer port to the public on the control plane nodes as well?
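To make the shape of this concrete, here is a purely hypothetical machine config sketch; `apiServerBalancer` and all of its fields are invented names for illustration and do not exist in Talos (v0.14):

```yaml
# Hypothetical sketch only: none of these fields exist in Talos v0.14.
machine:
  apiServerBalancer:      # invented name for the node-local load balancer
    enabled: true
    port: 6444            # localhost port the balancer would listen on
    upstreams:            # the real control plane endpoints, as DNS names
      - kube1.mycluster.mydomain.com:6443
      - kube2.mycluster.mydomain.com:6443
      - kube3.mycluster.mydomain.com:6443
```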
Possible usage
Instead of running

```
talosctl gen config <cluster name> <cluster endpoint>
```

one would specify three cluster endpoints:

```
talosctl gen config <cluster name> <endpoint1>,<endpoint2>,<endpoint3>
```
This would result in the actual `cluster.controlPlane.endpoint` being set to `https://127.0.0.1:<someport>`, with a native local load balancer behind it balancing all requests across all three actual endpoints.
All three endpoints would of course still be DNS names so that no worker config would need to be changed if a control plane ever changes its IP.
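As a hedged sketch of the generated output, the relevant part of the machine config could then contain something like the following; only the `cluster.controlPlane.endpoint` value is taken from the proposal above, the rest is an assumption about layout:

```yaml
cluster:
  controlPlane:
    # every node talks to its own node-local balancer instead of a
    # single remote VIP or external load balancer
    endpoint: https://127.0.0.1:<someport>
    # the balancer forwards requests to the three real endpoints, all
    # DNS names, so a control plane IP change needs no worker config edit:
    #   <endpoint1>:6443, <endpoint2>:6443, <endpoint3>:6443
```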
Ecosystem
Unsurprisingly, more people have run into this speedbump while setting up an on-premise cluster. In fact, there is a large thread for just this (don't require a load balancer between cluster and control plane and still be HA) and some partial fixes (KEP-3037: client-go alternative services). However, as there are many moving parts, it will probably take a long time to get native support for multiple API endpoints. Furthermore, it will take even more years before all components of all CNIs support this. Hence it would be better to build it into Talos now and, as a feature of Talos, remove the load balancer requirement.
As an example solution there is Rancher's RKE implementation (a rough sketch follows after this list). Their solution by default:
- Points the `kubelet` to `127.0.0.1:6443`.
- Points `kube-proxy` to `127.0.0.1:6443`.
- Runs an `nginx-proxy` container (assuming a static pod) on port 6443.
- Ties the NGINX target health checks into their ecosystem.
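As a hedged sketch of that pattern, assuming nothing about RKE's actual files: the kubeconfig handed to the node components points at the node-local proxy rather than any single control plane address. The names below (`local`, `node`) are illustrative only.

```yaml
# Sketch of the RKE-style indirection, not RKE's literal config.
apiVersion: v1
kind: Config
clusters:
  - name: local
    cluster:
      # kubelet and kube-proxy connect here; the node-local nginx-proxy
      # forwards to all control plane nodes, so no single API server
      # address is baked into the node's config
      server: https://127.0.0.1:6443
contexts:
  - name: local
    context:
      cluster: local
      user: node
current-context: local
users:
  - name: node
    user: {}
```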
Some related thoughts
- Talos node-native load balancing could use the API target health checks to inform the boot process of API availability, possibly reducing the many "try, fail, and retry" errors on boot to a single "waiting for network connectivity to an API server" message.
- Talos could integrate the API health checks from multiple viewpoints into the existing health checks and logging framework and report better on transient API server failures. #4088
- Talos could intelligently spread request load. Say you have three sites, each with a single API server; it could slightly prefer to send API requests to the local API server (detected as lower latency in the load balancer logic).
- Maybe it's better to keep the API server variable set to an fqdn, but add that fqdn to the hosts file to resolve it to localhost in case that variable gets exposed to some external services? See the workaround solution below.
- Maybe we could choose not to expose the Kubernetes API by default but have the option of letting talosctl proxy that port on demand, much like we do with kubectl when we need to connect to a port on a pod. It would reduce the attack surface to just the Talos API at rest.
- This approach might also solve any caching issues? #4470
- Combining this approach with KubeSpan, one could imagine binding the actual API server to only localhost and the KubeSpan adapter, then relegating external API access to the port exposed by the load balancer and any possible firewall rules on that. #4421 #1417 #4898
- A load balancer might provide a level of stability and flexibility, removing the need for a VIP or BGP solution and their inherent complexities. #4604 #4334
Current workaround
For anybody who wants a similar solution right now, you can:
Instructions
- Determine your API server fqdn but set a different port than 6443. For example `https://kube.mycluster.mydomain.com:6444`.
- Add this fqdn to the hosts file. This way it always resolves to localhost on the worker and control plane nodes, but not on any external services that have somehow received that fqdn config:

```yaml
machine:
  network:
    extraHostEntries:
      - ip: 127.0.0.1
        aliases:
          - kube.mycluster.mydomain.com
```

- Add a static HAProxy pod to run a mini load balancer on every node:

```yaml
machine:
  files:
    - path: /etc/kubernetes/manifests/kubernetes-api-haproxy.yaml
      permissions: 0o666
      op: create
      content: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: kubernetes-api-haproxy
          namespace: kube-system
        spec:
          hostNetwork: true
          containers:
            - name: kubernetes-api-haproxy
              image: haproxy
              livenessProbe:
                httpGet:
                  host: localhost
                  path: /livez
                  port: 6445
                  scheme: HTTP
              volumeMounts:
                - name: kubernetes-api-haproxy-config
                  mountPath: /usr/local/etc/haproxy/haproxy.cfg
                  readOnly: true
          volumes:
            - name: kubernetes-api-haproxy-config
              hostPath:
                path: /etc/kubernetes/manifests/haproxy.cfg
                type: File
    - path: /etc/kubernetes/manifests/haproxy.cfg
      permissions: 0o666
      op: create
      content: |
        global
          log stdout format raw daemon

        defaults
          log global
          option tcplog
          option http-keep-alive
          timeout connect 3s
          timeout client 1h
          timeout server 1h
          timeout tunnel 1h
          timeout client-fin 1m
          timeout server-fin 1m
          retries 1
          email-alert mailers mailservers
          email-alert from [email protected]
          email-alert to [email protected]
          email-alert level notice

        mailers mailservers
          mailer yourdomaintld your.mailserver.com:25

        frontend kube
          mode tcp
          bind :6444
          default_backend kubes

        backend kubes
          mode tcp
          balance roundrobin
          option httpchk GET /readyz
          http-check expect status 200
          default-server verify none check check-ssl inter 2s fall 2 rise 2
          server kube1 kube1.mycluster.mydomain.com:6443
          server kube2 kube2.mycluster.mydomain.com:6443
          server kube3 kube3.mycluster.mydomain.com:6443

        frontend stats
          mode http
          bind :6445
          monitor-uri /livez
          default_backend stats

        backend stats
          mode http
          stats refresh 5s
          stats show-node
          stats show-legends
          stats show-modules
          stats hide-version
          stats uri /
```

Note that:
- As of v0.15/v1.0 there is now a better way to add a static pod to the Talos config.
- The haproxy.cfg file is placed in the wrong directory, resulting in an (otherwise harmless) error message on boot.
- You could define a different email-alert sender per node, or not.

- Check every node's port 6445 for its view on API availability.
- Do not forget to set the A records of the fqdn for external visitors to all three control plane nodes (or a load balancer running on Kubernetes if you need HA from an external client as well).
By the way, Talos runs the Kubernetes control plane components pointed at localhost:6443 for the API server endpoint, so they don't require the load balancer to be up.
I'm interested in this as well. Mostly for HA control plane without an external LB.
I just noticed the https://github.com/siderolabs/talos/releases/tag/v1.5.0-alpha.1 patch notes. Some exciting news in the Kubernetes API Server In-Cluster Load Balancer section. Looking forward to seeing how complete this built-in load balancer is going to be. Especially curious whether it can also function as a load balancer for external clients and whether monitoring can be included.
Good to see Talos OS growing. Exciting times!
This feature is in-cluster exclusively; it makes sure the cluster can run even if the external load balancer is down (and it might prefer local traffic if the external load balancer has higher latency).