Built-in load balancer
Feature Request
Description
Load balancers are ubiquitous in cloud environments, but in on-premise setups they are not standardized and require manual work. Hence letting Talos handle this requirement internally would relieve a significant burden in setting up on-premise clusters with Talos. See #2711 for a partial rationale.
Current solutions
Currently (v0.14) the available solutions in Talos are:
- Use an external dedicated load-balancer (service)
- Creates a cluster dependency on another system or provider.
- No hosted services are really available if you are not hosting all your nodes on the same cloud that provides the load balancer.
- Self-hosting (HAProxy, NGINX, F5, etc.) adds additional hardware, software, and maintenance requirements, and takes work to do right.
- DNS records
- Round-robin DNS assumes clients re-query periodically; most CNI components do not, and stick to the first IP during startup.
- Does not provide failover.
- Does not provide high availability.
- Layer 2 shared IP #3111
- Provides failover but no load balancing.
- High availability but (relatively) slow failovers.
- Requires promiscuous mode allowing gratuitous ARP broadcasts. Not always allowed or possible in stretched VPCs.
- On premise multi datacenter setups require custom network setups (and some hardware) to enable layer 2 spanning.
- Tentative: VIP over BGP
- Can be done with `MetalLB` or `kube-vip` and some dedication.
- No built-in option as of yet. #4334
- Tentative: IPVS
- For now only a mention in #2711
Suggested additional solution
A better solution would be to add Talos native load balancing on the node side. This would:
- Be a CNI-agnostic solution.
- Allow native built-in load balancing without any external software, hardware requirements, or other dependencies.
- Allow self hosting without any maintenance burden other than the already required Talos config.
- Enable faster failovers than the current solutions.
- Enable true high availability given at least three control plane nodes.
- Not be dependent on ARP or BGP and thus work:
- In every stretched VPC, including those not supporting/allowing promiscuous adapters.
- In any multi datacenter setup without a spanned layer 2 broadcast domain or BGP setup.
- Without any extra infrastructure components like `MetalLB` or `kube-vip`.
- Allow the kube API (and possibly also the load balancer) to only be exposed to localhost?
- Given KubeSpan, this would work in all of the above scenarios without publicly exposing the API endpoint, even before a CNI is online.
Possible implementation
- Use parts of the client side load balancing code to create a Talos node side load balancer for worker and control plane nodes.
- Bind the load balancer on every node on localhost and set the Kubernetes API server endpoint to `127.0.0.1:<someport>` (see the config sketch after this list).
- Optionally expose the load balancer port to the public on the control plane nodes as well?
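To make the shape of this concrete, here is a purely hypothetical machine config sketch; `apiServerBalancer` and all of its fields are invented names for illustration and do not exist in Talos (v0.14):

```yaml
# Hypothetical sketch only: none of these fields exist in Talos v0.14.
machine:
  apiServerBalancer:      # invented name for the node-local load balancer
    enabled: true
    port: 6444            # localhost port the balancer would listen on
    upstreams:            # the real control plane endpoints, as DNS names
      - kube1.mycluster.mydomain.com:6443
      - kube2.mycluster.mydomain.com:6443
      - kube3.mycluster.mydomain.com:6443
```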
Possible usage
Instead of running

```
talosctl gen config <cluster name> <cluster endpoint>
```

one would specify three cluster endpoints:

```
talosctl gen config <cluster name> <endpoint1>,<endpoint2>,<endpoint3>
```
This would result in the actual `cluster.controlPlane.endpoint` being set to `https://127.0.0.1:<someport>`, with a native local load balancer behind it balancing all requests across all three actual endpoints.
All three endpoints would of course still be DNS names so that no worker config would need to be changed if a control plane ever changes its IP.
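As a hedged sketch of the generated output, the relevant part of the machine config could then contain something like the following; only the `cluster.controlPlane.endpoint` value is taken from the proposal above, the rest is an assumption about layout:

```yaml
cluster:
  controlPlane:
    # every node talks to its own node-local balancer instead of a
    # single remote VIP or external load balancer
    endpoint: https://127.0.0.1:<someport>
    # the balancer forwards requests to the three real endpoints, all
    # DNS names, so a control plane IP change needs no worker config edit:
    #   <endpoint1>:6443, <endpoint2>:6443, <endpoint3>:6443
```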
Ecosystem
Unsurprisingly, more people have run into this speedbump while setting up an on-premise cluster. In fact, there is a large thread for just this (don't require a load balancer between cluster and control plane and still be HA) and some partial fixes (KEP-3037: client-go alternative services). However, as there are many moving parts, it will probably take a long time to get native support for multiple API endpoints. Furthermore, it will take even more years before all components of all CNIs support this. Hence it would be better to build it into Talos now and, as a feature of Talos, remove the load balancer requirement.
As an example solution there is Rancher's RKE implementation (a rough sketch follows after this list). Their solution by default:
- Points the `kubelet` to `127.0.0.1:6443`.
- Points `kube-proxy` to `127.0.0.1:6443`.
- Runs an `nginx-proxy` container (assuming a static pod) on port 6443.
- Ties the NGINX target health checks into their ecosystem.
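As a hedged sketch of that pattern, assuming nothing about RKE's actual files: the kubeconfig handed to the node components points at the node-local proxy rather than any single control plane address. The names below (`local`, `node`) are illustrative only.

```yaml
# Sketch of the RKE-style indirection, not RKE's literal config.
apiVersion: v1
kind: Config
clusters:
  - name: local
    cluster:
      # kubelet and kube-proxy connect here; the node-local nginx-proxy
      # forwards to all control plane nodes, so no single API server
      # address is baked into the node's config
      server: https://127.0.0.1:6443
contexts:
  - name: local
    context:
      cluster: local
      user: node
current-context: local
users:
  - name: node
    user: {}
```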
Some related thoughts
- Talos node-native load balancing could use the API target health checks to inform the boot process of API availability, possibly reducing the many "try, fail, and retry" errors on boot to a single "waiting for network connectivity to an API server" message.
- Talos could integrate the API health checks from multiple viewpoints into the existing health checks and logging framework and report better on transient API server failures. #4088
- Talos could intelligently spread request load. Say you have three sites, each with a single API server; it could slightly prefer to send API requests to the local API server (detected as lower latency in the load balancer logic).
- Maybe it's better to keep the API server variable set to an fqdn, but add that fqdn to the hosts file to resolve it to localhost in case that variable gets exposed to some external services? See the workaround solution below.
- Maybe we could choose not to expose the Kubernetes API by default but have the option of letting talosctl proxy that port on demand, much like we do with kubectl when we need to connect to a port on a pod. It would reduce the attack surface to just the Talos API at rest.
- This approach might also solve any caching issues? #4470
- Combining this approach with KubeSpan, one could imagine binding the actual API server to only localhost and the KubeSpan adapter, then relegating external API access to the port exposed by the load balancer and any possible firewall rules on that. #4421 #1417 #4898
- A load balancer might provide a level of stability and flexibility, removing the need for a VIP or BGP solution and their inherent complexities. #4604 #4334
Current workaround
For anybody who wants a similar solution right now, you can:
Instructions
- Determine your API server fqdn but set a different port than 6443. For example `https://kube.mycluster.mydomain.com:6444`.
- Add this fqdn to the hosts file. This way it always resolves to localhost on the worker and control plane nodes, but not on any external services that have somehow received that fqdn config:

```yaml
machine:
  network:
    extraHostEntries:
      - ip: 127.0.0.1
        aliases:
          - kube.mycluster.mydomain.com
```

- Add a static HAProxy pod to run a mini load balancer on every node:

```yaml
machine:
  files:
    - path: /etc/kubernetes/manifests/kubernetes-api-haproxy.yaml
      permissions: 0o666
      op: create
      content: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: kubernetes-api-haproxy
          namespace: kube-system
        spec:
          hostNetwork: true
          containers:
            - name: kubernetes-api-haproxy
              image: haproxy
              livenessProbe:
                httpGet:
                  host: localhost
                  path: /livez
                  port: 6445
                  scheme: HTTP
              volumeMounts:
                - name: kubernetes-api-haproxy-config
                  mountPath: /usr/local/etc/haproxy/haproxy.cfg
                  readOnly: true
          volumes:
            - name: kubernetes-api-haproxy-config
              hostPath:
                path: /etc/kubernetes/manifests/haproxy.cfg
                type: File
    - path: /etc/kubernetes/manifests/haproxy.cfg
      permissions: 0o666
      op: create
      content: |
        global
          log stdout format raw daemon

        defaults
          log global
          option tcplog
          option http-keep-alive
          timeout connect 3s
          timeout client 1h
          timeout server 1h
          timeout tunnel 1h
          timeout client-fin 1m
          timeout server-fin 1m
          retries 1
          email-alert mailers mailservers
          email-alert from [email protected]
          email-alert to [email protected]
          email-alert level notice

        mailers mailservers
          mailer yourdomaintld your.mailserver.com:25

        frontend kube
          mode tcp
          bind :6444
          default_backend kubes

        backend kubes
          mode tcp
          balance roundrobin
          option httpchk GET /readyz
          http-check expect status 200
          default-server verify none check check-ssl inter 2s fall 2 rise 2
          server kube1 kube1.mycluster.mydomain.com:6443
          server kube2 kube2.mycluster.mydomain.com:6443
          server kube3 kube3.mycluster.mydomain.com:6443

        frontend stats
          mode http
          bind :6445
          monitor-uri /livez
          default_backend stats

        backend stats
          mode http
          stats refresh 5s
          stats show-node
          stats show-legends
          stats show-modules
          stats hide-version
          stats uri /
```

Note that:
- As of v0.15/v1.0 there is now a better way to add a static pod to the Talos config.
- The haproxy.cfg file is placed in the wrong directory, resulting in an (otherwise harmless) error message on boot.
- You could define a different email-alert sender per node, or not.

- Check every node's port 6445 for its view on API availability.
- Do not forget to set the A records of the fqdn for external visitors to all three control plane nodes (or a load balancer running on Kubernetes if you need HA from an external client as well).
By the way, Talos runs the Kubernetes control plane components pointed at localhost:6443 for the API server endpoint, so they don't require the load balancer to be up.
I'm interested in this as well. Mostly for HA control plane without an external LB.
I just noticed the https://github.com/siderolabs/talos/releases/tag/v1.5.0-alpha.1 patch notes. Some exciting news in the Kubernetes API Server In-Cluster Load Balancer section. Looking forward to seeing how complete this built-in load balancer is going to be. Especially curious whether it can also function as a load balancer for external clients and whether monitoring can be included.
Good to see Talos OS growing. Exciting times!
This feature is in-cluster exclusively; it makes sure the cluster can run even if the external load balancer is down (and it might prefer local traffic if the external load balancer has higher latency).