Setting `md.spec.paused` to `true` should fully pause the MachineDeployment
What steps did you take and what happened?
- Create a legacy cluster with 1 MD
- Set `.spec.paused` of the MD to `true`
- Edit the MD to change `md.spec.replicas`
The MachineDeployment scales and creates new Machines.
What did you expect to happen?
Pausing the MachineDeployment should fully pause its reconciliation.
Looking at the MachineDeployment reconciler code, the following is the piece of code responsible for this behavior: we still sync the MD even if it is paused.
`/internal/controllers/machinedeployment/machinedeployment_controller.go#L266-L268`

```go
if md.Spec.Paused {
	return ctrl.Result{}, r.sync(ctx, md, msList)
}
```
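Because `r.sync` is still called, a paused MD keeps reconciling replica changes. A minimal sketch of the expected behavior — returning early without syncing, and treating the spec field and the annotation the same way — using simplified stand-in types (not the actual CAPI or controller-runtime types):

```go
package main

import "fmt"

// pausedAnnotation mirrors the cluster.x-k8s.io/paused annotation key.
const pausedAnnotation = "cluster.x-k8s.io/paused"

// MachineDeployment is a simplified stand-in carrying only the
// fields relevant to pause handling.
type MachineDeployment struct {
	Paused      bool
	Annotations map[string]string
}

// isPaused reports whether reconciliation should stop entirely,
// giving the spec field and the annotation identical semantics.
func isPaused(md MachineDeployment) bool {
	if md.Paused {
		return true
	}
	_, ok := md.Annotations[pausedAnnotation]
	return ok
}

// reconcile sketches the proposed control flow: return early without
// syncing when the object is paused either way.
func reconcile(md MachineDeployment) string {
	if isPaused(md) {
		return "skipped"
	}
	return "synced"
}

func main() {
	fmt.Println(reconcile(MachineDeployment{Paused: true}))                                         // skipped
	fmt.Println(reconcile(MachineDeployment{Annotations: map[string]string{pausedAnnotation: ""}})) // skipped
	fmt.Println(reconcile(MachineDeployment{}))                                                     // synced
}
```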
Additional notes:
- Setting `md.spec.paused` to `true` will not create a new MS but will still reconcile changes that do not need a rollout.
- Setting the `cluster.x-k8s.io/paused` annotation will pause the MD completely.
It is better to have consistent behavior and fully pause MD reconciliation if `md.spec.paused` is set to `true`.
Cluster API version
main
Kubernetes version
No response
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
I wonder if we should deprecate the field instead.
It's not clear to me why we need two ways to achieve the same.
Do we have a guideline which objects need a paused field in addition to considering the annotation? (most CRDs don't have a paused field. MD has one, MachinePool doesn't)
> Do we have a guideline which objects need a paused field in addition to considering the annotation? (most CRDs don't have a paused field. MD has one, MachinePool doesn't)
Not sure what the guidelines are. Only Cluster and MachineDeployment seem to have the .spec.paused field, at least in Core CAPI. For MD the only other place, besides the controller, the .spec.paused field is used is in clusterctl alpha rollout pause/resume/restart.
If we deprecate this field then we should ideally change the logic in these commands to use the `cluster.x-k8s.io/paused` annotation instead. However, that can break compatibility with old MDs. Consider the case where a user paused the MD with an old clusterctl using `clusterctl alpha rollout pause` and then upgraded to the new clusterctl version. They will not be able to unpause it using `clusterctl alpha rollout resume`.
I think we can implement resume in a way that it tries both. It's also an alpha command
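A hedged sketch of what "tries both" could look like, using the same simplified stand-in types rather than the actual clusterctl code: resume clears the spec field and removes the annotation, so it works regardless of which mechanism paused the MD.

```go
package main

import "fmt"

// pausedAnnotation mirrors the cluster.x-k8s.io/paused annotation key.
const pausedAnnotation = "cluster.x-k8s.io/paused"

// MachineDeployment is a simplified stand-in for the CAPI type.
type MachineDeployment struct {
	Paused      bool
	Annotations map[string]string
}

// resume clears both pause mechanisms, so it unpauses MDs paused by
// an old clusterctl (spec field) as well as a new one (annotation).
func resume(md *MachineDeployment) {
	md.Paused = false
	delete(md.Annotations, pausedAnnotation)
}

func main() {
	md := MachineDeployment{Paused: true, Annotations: map[string]string{pausedAnnotation: ""}}
	resume(&md)
	fmt.Println(md.Paused, len(md.Annotations)) // false 0
}
```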
Shall I close this and open a new issue to mark the field as deprecated?
Let's wait to get a few more opinions. Maybe also bring it up at the office hours.
(@fabriziopandini WDYT?)
It seems to me that this behavior dates back to when we started "cloning" the K8s Deployment code. might be @enxebre or @vincepri have more context. +1 to bring this up during the office hours
I will add it to this week's agenda.
/triage accepted
Discussed in the May 17th office hours. It dates back to when we cloned the K8s Deployment code. There is general agreement on deprecating; this should follow policy (wait for the next API version). It could help to ask for feedback on the mailing list.
@ykakarap What is your plan regarding "It could help to ask for feedback on the mailing list."?
It's not clear to me why we would prefer to use an annotation (which is unversionable and has no schema) over a field that can be declared, versioned, checked using a concrete schema within the resource?
Reviewing this, I'm not sure why we wouldn't just fix the behaviour of the pause field? Yes it might be considered breaking to stop reconciling the replicas, but we could document it as a change in a v1betaX transition of the API
This seems to first be introduced here https://github.com/kubernetes-sigs/cluster-api/pull/143 as purely a carry over from core kube Deployments https://github.com/kubernetes/kubernetes/blob/af99df6a6bead927d61746dc431fc1b526af55a0/pkg/controller/deployment/deployment_controller.go#L645-L647
I'm with @JoelSpeed. If we agree on pausing being a wanted feature, I think we should rather revisit the current field behaviour, consider it a bug, and fix forward by keeping it as an API-driven feature.
I created this a while ago to capture that https://github.com/kubernetes-sigs/cluster-api/issues/6966
cc @mjlshen you might be interested in picking https://github.com/kubernetes-sigs/cluster-api/issues/6966 to implement what ever is the outcome of this discussion
> It's not clear to me why we would prefer to use an annotation (which is unversionable and has no schema) over a field that can be declared, versioned, checked using a concrete schema within the resource?
I agree with @JoelSpeed's comments, but I don't have full context on why the current implementation uses an annotation. However, I would like to bring some practical considerations to the discussion as well:
- Using an annotation is the current state for all the objects except the Cluster (by design) and the MD (which seems a leftover); it might not be ideal but at least it is consistent + supported by utils like e.g. predicates.
- From what I got from the discussion during office hours, this issue was about dropping a MD feature that doesn't seem to have a specific use; it also seems to me that this feature is orthogonal to the pause annotation, or related to it more by coincidence than by a design decision.
- Improving the pausing API is definitely a bigger goal: we should plan a course to make this change across the board for all the CAPI objects, possibly also provider-specific objects, plan for deprecating the current annotation, etc.

So IMO, given the two issues on the table, "drop the unused feature in MD" and "pausing API improvement", we should decide whether to keep them separated and try to get the first one moving, or to link the cleanup with the bigger API change.
/priority important-longterm
/assign

To re-assess and possibly merge/dedup with https://github.com/kubernetes-sigs/cluster-api/issues/6966
Maybe a good first step would be just to make sure that annotation and field behave exactly the same. Then we can still consider deprecating the field if that's what we want to do
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/triage accepted
/help
@chrischdi: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:

> /help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.