
Feature request : high-availability

Open camille-rodriguez opened this issue 3 years ago • 3 comments

When a Kubernetes node goes down, it would be great if the Kubeflow pods evacuated and respawned on another node. At the moment, pods get stuck in "Terminating" and the failover never completes, which defeats the natural HA capability of Kubernetes. How to test: shut down a node hosting Kubeflow pods. What currently works: if you reboot the node, the pods eventually come back once the node is online again.
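A minimal reproduction sketch of the above, assuming a `kubeflow` namespace and access to power the node off out-of-band (node names here are placeholders):

```shell
# Find which nodes are hosting Kubeflow pods
kubectl get pods -n kubeflow -o wide

# Power off one of those nodes out-of-band (hypervisor, IPMI, etc.),
# then watch the pods: the node goes NotReady, and the pods that were
# on it sit in "Terminating" without being rescheduled elsewhere.
kubectl get nodes
kubectl get pods -n kubeflow -w
```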

camille-rodriguez avatar Mar 17 '23 18:03 camille-rodriguez

@camille-rodriguez I agree with you, although I think the problem is mainly the high availability of the Juju controller. On my cluster with 3 MicroK8s nodes, the controller only runs on the first node; if that node goes down, there is no Juju controller left on the other two! Thanks.

moula avatar Mar 22 '23 14:03 moula

Hi @moula, agreed, this is also a problem. The Juju bug is tracked at https://bugs.launchpad.net/juju/+bug/1849030

camille-rodriguez avatar Mar 23 '23 12:03 camille-rodriguez

Hey @camille-rodriguez, was taking a look at this issue.

The behaviour you describe is rooted in the fact that the Charms run as a K8s StatefulSet under the hood. I'll assume that by "a node goes down" you mean something happened to it, i.e. it crashed, and not that an admin explicitly removed the Node object from the K8s API server.

Some references from the K8s docs and issues:

  • https://github.com/kubernetes/kubernetes/issues/54368#issuecomment-339378164
  • https://github.com/kubernetes/kubernetes/issues/74689#issuecomment-501086545
  • https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#delete-pods

So in this case what's happening is the expected behaviour, in K8s terms. By K8s best practices, if a node is unexpectedly down and we expect it to stay down rather than come back, then an admin should delete the Node object from the API server. Otherwise, K8s expects the node to come back up and the stateful workloads to "re-appear" with it.
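Concretely, the admin actions described above look something like this (node and pod names are placeholders; the force-delete flags come from the force-delete docs linked above):

```shell
# If the node is gone for good, remove its Node object so the StatefulSet
# controller is allowed to recreate the missing pods on another node:
kubectl delete node <dead-node-name>

# Alternatively, force-delete an individual stuck pod. Use with care:
# after a force delete, K8s can no longer guarantee that at most one
# copy of the pod is running somewhere.
kubectl delete pod <pod-name> -n kubeflow --grace-period=0 --force
```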

That raises the question: are all the Charms actually stateful, so that they need to run as a StatefulSet rather than as a Deployment (in which case a replacement Pod would get created on another node)?

I expect not, and maybe it makes sense to add an option to the Charms to make this configurable. But I'd confirm with the Juju team.

kimwnasptd avatar Nov 27 '23 09:11 kimwnasptd