
Meltdown protection if all nodes are not ready

Open maboehm opened this issue 6 months ago • 0 comments

Avoid setting the LoadBalancer's endpoints to an empty list if all nodes are reported as unhealthy.

This can help mitigate a scenario where the kubelets cannot reach the control plane to update their heartbeat, but the yawol components still can. This can happen if yawol manages the load balancers of a cluster it is not deployed in.

In this scenario, the kube-controller-manager will quickly mark all nodes as NotReady. This in turn triggers the yawol-cloud-controller to remove all lb.Spec.Endpoints. Once the yawolets pick up that change, no traffic reaches the cluster.

Instead, I propose keeping the list of (non-terminating) nodes as endpoints, even if they are considered not ready. That way the workload might still be available. This "meltdown protection" only kicks in if ALL nodes are unhealthy, so it should have no effect on most LBs most of the time.
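
To make the proposal concrete, here is a rough sketch of the selection logic I have in mind. It is not taken from the yawol code base: `Node`, `Endpoint` and `selectEndpoints` are hypothetical, simplified stand-ins for the real Kubernetes node objects and the entries in lb.Spec.Endpoints, and the actual implementation would likely look different.

```go
package main

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
type Node struct {
	Name        string
	Address     string
	Ready       bool // assumption: true if the node's Ready condition is True
	Terminating bool // assumption: true if the node is being deleted
}

// Endpoint is a hypothetical stand-in for one entry in lb.Spec.Endpoints.
type Endpoint struct {
	Name    string
	Address string
}

// selectEndpoints returns the endpoints of all ready, non-terminating nodes.
// If there is no such node, it falls back to all non-terminating nodes
// instead of returning an empty list (the "meltdown protection").
func selectEndpoints(nodes []Node) []Endpoint {
	ready := make([]Endpoint, 0, len(nodes))
	fallback := make([]Endpoint, 0, len(nodes))

	for _, n := range nodes {
		if n.Terminating {
			// Terminating nodes are never used, not even as a fallback.
			continue
		}
		ep := Endpoint{Name: n.Name, Address: n.Address}
		fallback = append(fallback, ep)
		if n.Ready {
			ready = append(ready, ep)
		}
	}

	if len(ready) == 0 {
		// Every remaining node is not ready: keep them all rather than
		// cutting off traffic completely.
		return fallback
	}
	return ready
}
```

With at least one ready node the result is the same as today. Only when every non-terminating node is not ready does the fallback list get used, and if all nodes are terminating the result is still empty.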

maboehm, Aug 12 '24 16:08