kuberay
If the head node dies, the cluster is never restored [Bug]
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
After a Ray cluster is created and started, manually kill the head node pod. As expected, the operator restores the head node. The issue is that the worker nodes never reconnect to the restored head node, so you end up with a single-node cluster consisting of only the head node.
As far as I can see, the only ways to fix this are to restart all the worker nodes after the head node has restarted, or to restart the whole cluster.
Reproduction script
Manually delete the head node pod.
Anything else
This happens every time the head node pod is deleted.
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!