aws-ebs-csi-driver Volume Goes from Detaching to Busy

/kind bug

What happened? Restarting a pod on one node ("A") caused it to be rescheduled onto a different node ("B"). The volume did not detach cleanly from node A, so the pod cannot start on node B.

Logs:

I0825 17:13:17.140132       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=detaching, desired=detached
I0825 17:13:18.252237       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=detaching, desired=detached
I0825 17:13:20.172262       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=detaching, desired=detached
I0825 17:13:23.511057       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=detaching, desired=detached
I0825 17:13:29.460476       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:32.397379       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:33.507502       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:35.412143       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:38.764186       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
W0825 17:13:39.958444       1 cloud.go:542] Ignoring error from describe volume for volume "vol-07f25ddd58f940a9f"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
I0825 17:13:44.692900       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:47.074830       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:48.196219       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:50.088750       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:13:53.440074       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
W0825 17:13:55.192032       1 cloud.go:542] Ignoring error from describe volume for volume "vol-07f25ddd58f940a9f"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
W0825 17:13:58.855252       1 cloud.go:542] Ignoring error from describe volume for volume "vol-07f25ddd58f940a9f"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
I0825 17:13:59.412653       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:14:02.192914       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:14:03.318235       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:14:05.223991       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
I0825 17:14:08.576946       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
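
To make the transition easier to spot in a longer log, here is a minimal sketch that extracts the points where the reported attachment state changes. It assumes the line format shown above; the names `WAIT_LINE`, `state_transitions`, and `sample` are made up for illustration and are not part of the driver.

```python
import re

# Matches the "Waiting for volume" lines in the format shown above.
# The pattern is an assumption based on these log lines only.
WAIT_LINE = re.compile(
    r'^[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ \S+\] '
    r'Waiting for volume "(?P<volume>vol-[0-9a-f]+)" state: '
    r'actual=(?P<actual>\w+), desired=(?P<desired>\w+)$'
)


def state_transitions(lines):
    """Return (time, volume, state) each time a volume's reported state changes."""
    last = {}
    changes = []
    for line in lines:
        m = WAIT_LINE.match(line.strip())
        if m is None:
            continue  # skips other lines, e.g. the describe-volume warnings
        volume, actual = m["volume"], m["actual"]
        if last.get(volume) != actual:
            changes.append((m["time"], volume, actual))
            last[volume] = actual
    return changes


# Three lines copied from the logs above (raw string keeps the literal \n).
sample = r'''
I0825 17:13:23.511057       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=detaching, desired=detached
I0825 17:13:29.460476       1 cloud.go:601] Waiting for volume "vol-07f25ddd58f940a9f" state: actual=busy, desired=detached
W0825 17:13:39.958444       1 cloud.go:542] Ignoring error from describe volume for volume "vol-07f25ddd58f940a9f"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
'''.splitlines()

changes = state_transitions(sample)
for time, volume, state in changes:
    print(time, volume, state)
```

Run against the full log, this would report the first `detaching` entry and then the switch to `busy` at 17:13:29, after which the state never reaches `detached`.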

What you expected to happen? The volume detaches without requiring a force detach.

How to reproduce it (as minimally and precisely as possible)? Using:

k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.2.0 / aws-ebs-csi-driver-2.0.4 via helm
RHEL 7.9 on EC2
Kubernetes 1.20.2 (NOT EKS!)

Anything else we need to know?:

Environment

Kubernetes version (use kubectl version): 1.20.2
Driver version: 1.2.0

Aug 25 '21 17:08 codyharris-h2o-ai

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Nov 23 '21 17:11 k8s-triage-robot

/remove-lifecycle stale

Dec 21 '21 10:12 k1rk

/lifecycle stale

Apr 03 '22 13:04 k8s-triage-robot

/remove-lifecycle stale

Apr 04 '22 15:04 k1rk

/lifecycle stale

Jul 03 '22 15:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Aug 02 '22 16:08 k8s-triage-robot

/remove-lifecycle rotten

Aug 31 '22 17:08 k1rk

/lifecycle stale

Nov 29 '22 18:11 k8s-triage-robot

/lifecycle rotten

Dec 29 '22 18:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Jan 28 '23 19:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Jan 28 '23 19:01 k8s-ci-robot
