cloud-provider-openstack
cloud-provider-openstack copied to clipboard
cinder-csi-plugin: Use os-volume_attachments API to poll detach
What this PR does / why we need it:
As set out in the issue this is the correct OpenStack API to use when polling volume attachment and detachment from an instance.
Which issue this PR fixes(if applicable): fixes #1645
Special notes for reviewers:
I couldn't see any unit tests for this code, existing functional tests should hopefully already cover the use case.
Release note:
NONE
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: lyarwood / name: Lee Yarwood (cdd3f0539984dd742b0e2fe14b321792c521ddf7)
Welcome @lyarwood!
It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @lyarwood. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/test openstack-cloud-csi-cinder-e2e-test
looks like devstack has flaky problem
/lgtm
/retest
@lyarwood looks like this time the test real executed and 2 failed cases , not sure it's related but let's give another try
/test openstack-cloud-csi-cinder-e2e-test
/test openstack-cloud-csi-cinder-e2e-test
still some test case failed, I doubt the change trigger the error ...
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {kubelet k3s-master} FailedMount: Unable to attach or mount volumes: unmounted volumes=[volume1], unattached volumes=[volume1 kube-api-access-swxsd]: volume volume1 has volumeMode Block, but is specified in volumeMounts
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] failed to get device path of attached volume : can not get device path of volume pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3, its status is reserved
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:04 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 329c9594-e71b-4de5-ac8b-2a4859886356 volume to c54ae2b9-1de2-451c-b9b3-3a8925281d3c compute: Bad request with: [POST http://10.0.2.15/compute/v2.1/servers/c54ae2b9-1de2-451c-b9b3-3a8925281d3c/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid volume: volume 329c9594-e71b-4de5-ac8b-2a4859886356 is already attached to instances: c54ae2b9-1de2-451c-b9b3-3a8925281d3c"}}
looks like this is the root cause , but not sure why it's happening how about add some logs in the added code and see how the logic/detach is decided ?
New changes are detected. LGTM label has been removed.
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {kubelet k3s-master} FailedMount: Unable to attach or mount volumes: unmounted volumes=[volume1], unattached volumes=[volume1 kube-api-access-swxsd]: volume volume1 has volumeMode Block, but is specified in volumeMounts Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] failed to get device path of attached volume : can not get device path of volume pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3, its status is reserved Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:04 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 329c9594-e71b-4de5-ac8b-2a4859886356 volume to c54ae2b9-1de2-451c-b9b3-3a8925281d3c compute: Bad request with: [POST http://10.0.2.15/compute/v2.1/servers/c54ae2b9-1de2-451c-b9b3-3a8925281d3c/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid volume: volume 329c9594-e71b-4de5-ac8b-2a4859886356 is already attached to instances: c54ae2b9-1de2-451c-b9b3-3a8925281d3c"}}looks like this is the root cause , but not sure why it's happening how about add some logs in the added code and see how the logic/detach is decided ?
Yeah I think the issue was that when hitting an error my previous change would return false assuming that the upper layers would handle the error. They don't so I've switched it back to true now so the retry logic should at least kick in on detach.
/retest
/test openstack-cloud-csi-cinder-e2e-test
@jichenjc any idea where I could find the broken down logs or do we only get a single test log with prow?
@lyarwood we were supposed to have logs if run to error or success
https://github.com/kubernetes/cloud-provider-openstack/blob/master/tests/playbooks/roles/install-csi-cinder/tasks/main.yaml#L151
in your case I don't know why it stuck for 3 hours then quit without any info ,need check more about it ..
/test openstack-cloud-csi-cinder-e2e-test
/test openstack-cloud-csi-cinder-e2e-test
/retest
I tested this code but does not work for me. If I try to unattach a volume after this version of diskIsAttached returns true I get this error:
Invalid volume: Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach
I tested this code but does not work for me. If I try to unattach a volume after this version of diskIsAttached returns true I get this error:
Invalid volume: Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach
not sure this code is still under work now.. maybe you can provide your fix in case you can? Thanks @drencrom
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jichenjc by writing /assign @jichenjc in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@lyarwood: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| openstack-cloud-csi-manila-e2e-test | 748f6c1da65b98dc55d3fb439cc48cf8b9970dbe | link | true | /test openstack-cloud-csi-manila-e2e-test |
| openstack-cloud-csi-cinder-e2e-test | 742c3d8545f82e8a0eaf861aa5dc2976e3b2509a | link | true | /test openstack-cloud-csi-cinder-e2e-test |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale - Mark this issue or PR as rotten with
/lifecycle rotten - Close this issue or PR with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle rotten - Close this issue or PR with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closed
You can:
- Reopen this PR with
/reopen - Mark this PR as fresh with
/remove-lifecycle rotten - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied- After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied- After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closedYou can:
- Reopen this PR with
/reopen- Mark this PR as fresh with
/remove-lifecycle rotten- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.