kops
Add a flag to rolling update to fail immediately on IG error
Fixes #14176
Add a flag to `kops rolling-update cluster` that exits the rolling update the first time it encounters an error with an instance group that is normally tried in serial (either `APIServer` or `Node`).
I have added a unit test which should fail if `ExitOnFirstError` is set to `false`, but please let me know if there is additional documentation or testing that I should add.
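To make the intended behavior concrete, here is a minimal sketch of the serial loop with and without the flag. It is not the code in this PR: `rollGroups`, `failOn`, and the group names are hypothetical stand-ins, and only the `ExitOnFirstError` idea comes from the description above.

```go
package main

import "fmt"

// failOn returns a hypothetical update function that fails for one
// instance group name and succeeds for every other one.
func failOn(name string) func(string) error {
	return func(group string) error {
		if group == name {
			return fmt.Errorf("simulated failure")
		}
		return nil
	}
}

// rollGroups is a hypothetical stand-in for the serial part of a rolling
// update. It calls update once per instance group, in order, and returns
// the groups it attempted plus any errors it collected.
func rollGroups(groups []string, exitOnFirstError bool, update func(string) error) ([]string, []error) {
	var attempted []string
	var errs []error
	for _, group := range groups {
		attempted = append(attempted, group)
		if err := update(group); err != nil {
			errs = append(errs, fmt.Errorf("instance group %q: %w", group, err))
			if exitOnFirstError {
				// Stop here instead of moving on to the next group.
				return attempted, errs
			}
		}
	}
	return attempted, errs
}

func main() {
	groups := []string{"nodes-a", "nodes-b", "nodes-c"}

	attempted, errs := rollGroups(groups, true, failOn("nodes-b"))
	fmt.Println("exit on first error:", attempted, errs)

	attempted, errs = rollGroups(groups, false, failOn("nodes-b"))
	fmt.Println("keep going:", attempted, errs)
}
```

With the flag set, the loop never reaches the third group; without it, every group is attempted and the error is reported at the end.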
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: jandersen-plaid (40caf71d9d8dee8061aa13b5ee3dc9ac47b4114b)
Hi @jandersen-plaid. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johngmyers for approval by writing `/assign @johngmyers` in a comment. For more information see: The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment
I'd rather not add a flag for this. I think it is enough to inspect the returned error and return directly if it does not make sense to continue.
@jandersen-plaid: PR needs rebase.
I think I'd prefer this be the default or only option.
/hold
/kind office-hours
I believe the history was that the update on the IG would previously wait forever, not fail.
If a control plane IG fails, we already directly return an error. That is by far the most important behavior. If another IG fails, it makes sense for kOps to keep going to the next IG, and the one after that, and to continue gracefully draining and terminating nodes.
But in the case of a validation error, it doesn't make sense to keep going as kOps won't succeed with the next IG either.
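A rough sketch of what inspecting the returned error could look like, instead of a flag. The `ValidationError` type, `errDrain`, and `shouldAbort` are hypothetical names used only for illustration; the error types kOps actually returns may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidationError is a hypothetical error type marking a cluster
// validation failure. It is an assumption, not a kOps type.
type ValidationError struct {
	Reason string
}

func (e *ValidationError) Error() string {
	return "cluster did not validate: " + e.Reason
}

// errDrain is a hypothetical error that only affects a single group.
var errDrain = errors.New("failed to drain node")

// shouldAbort reports whether the rolling update should return directly
// instead of moving on to the next instance group.
func shouldAbort(isControlPlane bool, err error) bool {
	if err == nil {
		return false
	}
	if isControlPlane {
		// A control plane failure always stops the update.
		return true
	}
	// A validation failure will also fail for the next group, so stop.
	var validationErr *ValidationError
	return errors.As(err, &validationErr)
}

func main() {
	wrapped := fmt.Errorf("rolling nodes-a: %w", &ValidationError{Reason: "node not ready"})

	fmt.Println(shouldAbort(false, errDrain)) // node IG, non-validation error
	fmt.Println(shouldAbort(false, wrapped))  // node IG, wrapped validation error
	fmt.Println(shouldAbort(true, errDrain))  // control plane IG, any error
}
```

Using `errors.As` means the check still works when the validation error has been wrapped with extra context on its way up.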
/ok-to-test
/retest
/retest
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: olemarkus
The pull request process is described here
- ~~OWNERS~~ [olemarkus]
@johngmyers you still want to hold this one?
/hold cancel
/retest