🌱 clusterctl: add flag to skip lagging provider check in ApplyCustomPlan
What this PR does / why we need it:
Clusterctl runs a pre-check to see whether any other providers are lagging behind the target contract before creating an upgrade plan. In the current implementation of cluster-api-operator, a separate controller reconciles each provider type. None of these controllers has knowledge of the other providers, so none of them can pass enough information to clusterctl to complete this check successfully. This PR adds a flag and an UpgradeOption that allow the caller to skip this pre-check and upgrade the provider successfully.
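For illustration, here is a minimal sketch of how such an option could be threaded through the upgrade flow. The names below (the `Provider` type, `UpgradeOptions.SkipLaggingProviderCheck`, and the `applyCustomPlan` signature) are hypothetical stand-ins, not the actual clusterctl code:

```go
package main

import "fmt"

// Provider is a stand-in for a managed provider and the contract it
// currently implements.
type Provider struct {
	Name     string
	Contract string
}

// UpgradeOptions carries the knobs for building an upgrade plan.
type UpgradeOptions struct {
	// SkipLaggingProviderCheck disables the pre-check that verifies no
	// other provider is lagging behind the target contract.
	SkipLaggingProviderCheck bool
}

// checkNoProviderLagging fails if any provider is not yet on the target contract.
func checkNoProviderLagging(providers []Provider, targetContract string) error {
	for _, p := range providers {
		if p.Contract != targetContract {
			return fmt.Errorf("provider %s is lagging behind contract %s (current: %s)",
				p.Name, targetContract, p.Contract)
		}
	}
	return nil
}

// applyCustomPlan runs the pre-check unless the caller opted out, then
// proceeds with the upgrade.
func applyCustomPlan(opts UpgradeOptions, providers []Provider, targetContract string) error {
	if !opts.SkipLaggingProviderCheck {
		if err := checkNoProviderLagging(providers, targetContract); err != nil {
			return err
		}
	}
	fmt.Printf("upgrading %d provider(s) to contract %s\n", len(providers), targetContract)
	return nil
}

func main() {
	providers := []Provider{{Name: "infrastructure-azure", Contract: "v1alpha4"}}
	// A single-provider controller cannot see its peers, so it opts out of
	// the cross-provider check.
	opts := UpgradeOptions{SkipLaggingProviderCheck: true}
	if err := applyCustomPlan(opts, providers, "v1beta1"); err != nil {
		fmt.Println("upgrade blocked:", err)
	}
}
```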
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes kubernetes-sigs/cluster-api-operator#570
/area clusterctl
@JoelSpeed @Jont828 Please take a look when you are available 🙏
@JoelSpeed @Jont828 Are you able to take a look? Let me know if you need more context on anything.
Question: Could adding this and using it in the cluster-api-operator lead to issues?
Could it then be possible to have providers running different contract versions, which could lead to issues?
Upgrading using clusterctl upgrades all providers at the same time, instead of each one individually (where some could still be running an old contract while others are already upgraded).
I don't think this should be an issue. We talked about it in the cluster-api-operator office hours and determined that adding a flag in clusterctl to skip this check was probably the best way forward. We have different CRs for each provider, and when users upgrade their providers they typically move all versions at the same time. There could potentially be a delay between reconciliations for each provider, but we haven't noticed any issues running this as a fork and upgrading the Azure CAPI/CAPBK/KCP providers.
Definitely open to better approaches though! I can stop by the CAPI office hours to discuss this issue we are having in more detail.
I personally have some concerns about disabling this check, considering that the value clusterctl adds is ensuring the health of the management cluster as a whole.
TBH, I think that if someone asks the operator to upgrade a single provider, this operation must be put on hold if it could lead to an invalid cluster (leaning on "when users upgrade their providers they typically move all versions at the same time" seems weak).
The upgrade operation for the providers involved should unblock itself once the user is upgrading enough providers to reach a valid state.
The issue seems to be that "each one of these controllers doesn't have knowledge of the other providers, and doesn't pass in enough information to clusterctl to be able to complete this check successfully", but I think there are ways around this, since AFAIK each provider has a CR with a desired state/target version.
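For illustration, a minimal sketch of that alternative under entirely hypothetical types (`ProviderCR` and `upgradeCanProceed` are illustrative, not the real cluster-api-operator API): the operator could list the provider CRs it manages and only unblock an upgrade once every provider's desired state converges on the same contract.

```go
package main

import "fmt"

// ProviderCR is a stand-in for a provider custom resource: the contract the
// provider currently runs and the one the user has asked for.
type ProviderCR struct {
	Name            string
	CurrentContract string
	DesiredContract string
}

// upgradeCanProceed reports whether upgrading to targetContract leaves the
// management cluster in a valid state: every provider must either already be
// on the target contract or have a desired state that moves it there.
func upgradeCanProceed(crs []ProviderCR, targetContract string) bool {
	for _, cr := range crs {
		if cr.CurrentContract != targetContract && cr.DesiredContract != targetContract {
			// This provider would be left lagging; hold the upgrade.
			return false
		}
	}
	return true
}

func main() {
	crs := []ProviderCR{
		{Name: "cluster-api", CurrentContract: "v1alpha4", DesiredContract: "v1beta1"},
		{Name: "infrastructure-azure", CurrentContract: "v1alpha4", DesiredContract: "v1alpha4"},
	}
	if upgradeCanProceed(crs, "v1beta1") {
		fmt.Println("all providers converge on v1beta1; proceed")
	} else {
		fmt.Println("hold: at least one provider would be left on an old contract")
	}
}
```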
Hey @fabriziopandini, sorry for the delayed response, and thank you for providing more context on this check. We don't want users to be able to break their cluster if they have a misconfiguration, so I think a PR should be made in the CAPI operator instead of CAPI to get this check to pass. I will go ahead and close this PR.