kuberay
KubeRay operator Deployment strategy type should be Recreate
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
We should change the KubeRay operator deployment's `strategy.type` from the default (`RollingUpdate`) to `Recreate` to avoid potential problems when a new operator pod is deployed during an upgrade.
Ensuring that only one operator pod runs at any given time is a better way to avoid inconsistencies, and it does not require leader election.
See also: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
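For illustration, the proposed change could look roughly like the fragment below. This is a minimal sketch of the relevant part of a Deployment spec, not the actual KubeRay chart template; all other fields are omitted.

```yaml
# Fragment of the operator Deployment spec (other fields omitted).
spec:
  strategy:
    # Recreate: existing pods are terminated before new ones are created,
    # instead of the default RollingUpdate behaviour.
    type: Recreate
```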
BTW, we should also remove `replicaCount: 1` from `values.yaml` and hard-code the replica count as `1` in `deployment.yaml`.
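A sketch of what the hard-coded value might look like in the chart's `deployment.yaml`, assuming the template currently reads the replica count from `.Values.replicaCount` (an assumption about the chart layout, not verified here):

```yaml
# deployment.yaml (fragment, other fields omitted)
spec:
  # Hard-coded to a single replica; no longer configurable through values.yaml.
  replicas: 1
  strategy:
    type: Recreate
```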
Reproduction script
No need
Anything else
No response
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
@akanso @Jeffwan do you have thoughts on the deployment strategy for the operator? Recreate does sound safer, possibly at the expense of a bit more downtime.
Removing the replica count field sounds like a good idea until we make leader election possible: https://github.com/ray-project/kuberay/issues/474
Another option is to expose both strategy and replica fields, default them to Recreate and 1 respectively, and give a warning not to mess with the fields unless you know what you're doing.
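As a rough sketch of that alternative, `values.yaml` could carry both fields with safe defaults and a warning. The `strategy` key is a hypothetical name chosen for illustration; only `replicaCount` is mentioned in this issue.

```yaml
# values.yaml (fragment; the "strategy" key is hypothetical)

# WARNING: the operator does not support leader election yet.
# Do not change these two settings unless you know what you are doing.
replicaCount: 1
strategy:
  type: Recreate
```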
I agree, since the operator affects the K8s cluster state, we do not want two active operator pods running at the same time.
Since the startup time is negligible for the operator, we can afford a `Recreate` strategy, or a rolling update with `maxSurge: 0`.
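For comparison, the `maxSurge` variant could be sketched as follows. Kubernetes does not allow `maxUnavailable` to be 0 when `maxSurge` is 0, so it is set to 1 here. Whether a still-terminating old pod can briefly overlap with the new one under this setting is worth verifying before relying on it.

```yaml
# Fragment of the operator Deployment spec (other fields omitted).
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # never create a pod above the desired replica count
      maxUnavailable: 1  # must be non-zero when maxSurge is 0
```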