cluster-api-provider-openstack
cluster-api-provider-openstack copied to clipboard
🐛 openstackmachine: do not set transient error message and reason
What this PR does / why we need it:
With this PR we set failureReason & failureMessage of the OpenStackMachine for the following three cases:
- Instance not found
- Instance in ERROR state
- as pointed out in https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1116#issuecomment-1029792653, unsure if we should really stop reconciliation. However, keeping this to remain with the same behavior.
- json marshalling or unmarshalling error in the network status
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1116
Special notes for your reviewer:
- Introduced separate functions for setting the
failureReason&failureMessageto have a choice when setting thefailureReasoncompared to just setting UpdateMachineError for everything. - openstackcluster will pe part of a separate PR
- Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
- [x] squashed commits
- if necessary:
- [ ] includes documentation
- [ ] adds unit tests
/hold Sean Schneeweiss [email protected], Mercedes-Benz Tech Innovation GmbH, Provider Information
Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
| Name | Link |
|---|---|
| Latest commit | b9689c61933b5bd5947f6518a1ef345d75645ce9 |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/636a0f0ad771bd000923fea3 |
| Deploy Preview | https://deploy-preview-1301--kubernetes-sigs-cluster-api-openstack.netlify.app |
| Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site settings.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: seanschneeweiss
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [seanschneeweiss]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
WDYT about catching authentication errors for all cases and reconcile them again? @bavarianbidi
Authentication failed relates to error code 401.
https://github.com/gophercloud/gophercloud/blob/0226ea5963656bcd435ab17af475e2345e523fcb/errors.go#L181-L183
It seems that only error codes >= 500 are considered retryable errors (as of now). https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/505b5d27c834e5929e5f076cdfe47ff6b97de381/pkg/utils/errors/errors.go#L26-L33
I'd say that IsRetryable should be extended to also include some of the errors from https://github.com/gophercloud/gophercloud/blob/master/errors.go.
E.g.
- 408: The server timed out waiting for the request
- 429: Too many requests have been sent in a given amount of time. Pause requests, wait up to one minute, and try again.
- and potentially 401: Authentication failed (as per your recommendation).
I think it is better to make the OpenStack calls via gophercloud retryable then filtering when setting the failureMessage/failureReason.
I'm no expert about OpenStack error codes and how gophercloud handles them - hopefully someone else can support me in this.
I'd say that
IsRetryableshould be extended to also include some of the errors from gophercloud/gophercloud@master/errors.go.E.g.
- 408: The server timed out waiting for the request
- 429: Too many requests have been sent in a given amount of time. Pause requests, wait up to one minute, and try again.
- and potentially 401: Authentication failed (as per your recommendation).
I think it is better to make the OpenStack calls via gophercloud retryable then filtering when setting the failureMessage/failureReason.
I'm no expert about OpenStack error codes and how gophercloud handles them - hopefully someone else can support me in this.
sorry @mdbooth / @jichenjc for direct mentioning.
I'm fine with the proposed solution from @seanschneeweiss regarding extending IsRetryable. But i'm not sure if we oversee something. WDYT?
(FYI: this issue is currently the only one which is marked for milestone 0.6.4)
@bavarianbidi Let's discuss the modification of IsRetryable in a separate issue/PR. Would you agree?
/lgtm /hold
The /hold is in case you want to address the review comments. Feel free to remove it.
I just want to say thank you for working on this @seanschneeweiss and reviewers! We are really looking forward to this feature.
@mdbooth thank you for the great review. All suggestions were added. Would be great if you could check again.
@bavarianbidi Let's discuss the modification of
IsRetryablein a separate issue/PR. Would you agree?
@seanschneeweiss sure, i'm totally fine with that.
/hold cancel
/retest
Edit: I think there is an issue with Prow currently

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: lentzi90, mdbooth, seanschneeweiss
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [mdbooth,seanschneeweiss]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest
/retest
/lgtm
I think we should cherry pick this since we have it in the v0.6.4 milestone. /cherry-pick release-0.6
@lentzi90: #1301 failed to apply on top of branch "release-0.6":
Applying: openstackmachine: do not set transient error message and reason
Using index info to reconstruct a base tree...
A api/v1alpha6/openstackmachine_types.go
M controllers/openstackmachine_controller.go
M controllers/openstackmachine_controller_test.go
Falling back to patching base and 3-way merge...
Auto-merging controllers/openstackmachine_controller_test.go
Auto-merging controllers/openstackmachine_controller.go
CONFLICT (content): Merge conflict in controllers/openstackmachine_controller.go
Auto-merging api/v1alpha5/openstackmachine_types.go
CONFLICT (content): Merge conflict in api/v1alpha5/openstackmachine_types.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 openstackmachine: do not set transient error message and reason
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
In response to this:
I think we should cherry pick this since we have it in the v0.6.4 milestone. /cherry-pick release-0.6
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Oh that was not so straight forward