cloud-provider-openstack icon indicating copy to clipboard operation
cloud-provider-openstack copied to clipboard

cinder-csi-plugin: Use os-volume_attachments API to poll detach

Open lyarwood opened this issue 3 years ago • 25 comments

What this PR does / why we need it:

As set out in the issue this is the correct OpenStack API to use when polling volume attachment and detachment from an instance.

Which issue this PR fixes(if applicable): fixes #1645

Special notes for reviewers:

I couldn't see any unit tests for this code, existing functional tests should hopefully already cover the use case.

Release note:

NONE

lyarwood avatar Apr 13 '22 19:04 lyarwood

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: lyarwood / name: Lee Yarwood (cdd3f0539984dd742b0e2fe14b321792c521ddf7)

Welcome @lyarwood!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Apr 13 '22 19:04 k8s-ci-robot

Hi @lyarwood. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 13 '22 19:04 k8s-ci-robot

/ok-to-test

lingxiankong avatar Apr 13 '22 20:04 lingxiankong

/test openstack-cloud-csi-cinder-e2e-test

jichenjc avatar Apr 14 '22 01:04 jichenjc

looks like devstack has flaky problem

jichenjc avatar Apr 14 '22 01:04 jichenjc

/lgtm

jichenjc avatar Apr 14 '22 01:04 jichenjc

/retest

lyarwood avatar Apr 14 '22 08:04 lyarwood

@lyarwood looks like this time the test real executed and 2 failed cases , not sure it's related but let's give another try

/test openstack-cloud-csi-cinder-e2e-test

jichenjc avatar Apr 14 '22 12:04 jichenjc

/test openstack-cloud-csi-cinder-e2e-test

jichenjc avatar Apr 14 '22 12:04 jichenjc

still some test case failed, I doubt the change trigger the error ...

jichenjc avatar Apr 15 '22 01:04 jichenjc

Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {kubelet k3s-master} FailedMount: Unable to attach or mount volumes: unmounted volumes=[volume1], unattached volumes=[volume1 kube-api-access-swxsd]: volume volume1 has volumeMode Block, but is specified in volumeMounts
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] failed to get device path of attached volume : can not get device path of volume pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3, its status is reserved 
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:04 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 329c9594-e71b-4de5-ac8b-2a4859886356 volume to c54ae2b9-1de2-451c-b9b3-3a8925281d3c compute: Bad request with: [POST http://10.0.2.15/compute/v2.1/servers/c54ae2b9-1de2-451c-b9b3-3a8925281d3c/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid volume: volume 329c9594-e71b-4de5-ac8b-2a4859886356 is already attached to instances: c54ae2b9-1de2-451c-b9b3-3a8925281d3c"}}

looks like this is the root cause , but not sure why it's happening how about add some logs in the added code and see how the logic/detach is decided ?

jichenjc avatar Apr 15 '22 01:04 jichenjc

New changes are detected. LGTM label has been removed.

k8s-ci-robot avatar Apr 20 '22 13:04 k8s-ci-robot

Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {kubelet k3s-master} FailedMount: Unable to attach or mount volumes: unmounted volumes=[volume1], unattached volumes=[volume1 kube-api-access-swxsd]: volume volume1 has volumeMode Block, but is specified in volumeMounts
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:03 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] failed to get device path of attached volume : can not get device path of volume pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3, its status is reserved 
Apr 14 14:12:07.273: INFO: At 2022-04-14 14:07:04 +0000 UTC - event for pod-b24d99fc-971c-4e94-8df6-e8b27b7953a9: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-7205016b-7c10-4c62-a438-dd4bb2b33eb3" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 329c9594-e71b-4de5-ac8b-2a4859886356 volume to c54ae2b9-1de2-451c-b9b3-3a8925281d3c compute: Bad request with: [POST http://10.0.2.15/compute/v2.1/servers/c54ae2b9-1de2-451c-b9b3-3a8925281d3c/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid volume: volume 329c9594-e71b-4de5-ac8b-2a4859886356 is already attached to instances: c54ae2b9-1de2-451c-b9b3-3a8925281d3c"}}

looks like this is the root cause , but not sure why it's happening how about add some logs in the added code and see how the logic/detach is decided ?

Yeah I think the issue was that when hitting an error my previous change would return false assuming that the upper layers would handle the error. They don't so I've switched it back to true now so the retry logic should at least kick in on detach.

lyarwood avatar Apr 20 '22 13:04 lyarwood

/retest

lyarwood avatar Apr 20 '22 17:04 lyarwood

/test openstack-cloud-csi-cinder-e2e-test

jichenjc avatar Apr 21 '22 00:04 jichenjc

@jichenjc any idea where I could find the broken down logs or do we only get a single test log with prow?

lyarwood avatar Apr 21 '22 09:04 lyarwood

@lyarwood we were supposed to have logs if run to error or success

https://github.com/kubernetes/cloud-provider-openstack/blob/master/tests/playbooks/roles/install-csi-cinder/tasks/main.yaml#L151

in your case I don't know why it stuck for 3 hours then quit without any info ,need check more about it ..

jichenjc avatar Apr 21 '22 10:04 jichenjc

/test openstack-cloud-csi-cinder-e2e-test

jichenjc avatar Apr 21 '22 10:04 jichenjc

/test openstack-cloud-csi-cinder-e2e-test

ramineni avatar May 02 '22 09:05 ramineni

/retest

chrigl avatar Jul 05 '22 11:07 chrigl

I tested this code but does not work for me. If I try to unattach a volume after this version of diskIsAttached returns true I get this error: Invalid volume: Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach

drencrom avatar Aug 16 '22 16:08 drencrom

I tested this code but does not work for me. If I try to unattach a volume after this version of diskIsAttached returns true I get this error: Invalid volume: Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach

not sure this code is still under work now.. maybe you can provide your fix in case you can? Thanks @drencrom

jichenjc avatar Aug 17 '22 00:08 jichenjc

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from jichenjc by writing /assign @jichenjc in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Aug 31 '22 16:08 k8s-ci-robot

@lyarwood: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
openstack-cloud-csi-manila-e2e-test 748f6c1da65b98dc55d3fb439cc48cf8b9970dbe link true /test openstack-cloud-csi-manila-e2e-test
openstack-cloud-csi-cinder-e2e-test 742c3d8545f82e8a0eaf861aa5dc2976e3b2509a link true /test openstack-cloud-csi-cinder-e2e-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Aug 31 '22 18:08 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 29 '22 19:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 29 '22 19:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Jan 28 '23 20:01 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 28 '23 20:01 k8s-ci-robot