aws-ebs-csi-driver
Node can't access volumeattachments resource
/kind bug
What happened? When a pod with a persistent volume is deleted, the replacement pod fails to attach/mount the storage with the following error:

```
MountVolume.WaitForAttach failed for volume "<pvc_name>" : volume <volume_name> has GET error for volume attachment csi-4b2c0c56...: volumeattachments.storage.k8s.io "csi-4b2c0c56..." is forbidden: User "system:node:<node_name>" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node "<node_name>" and this object
```
What you expected to happen? The volume should move to the new pod and mount successfully.
How to reproduce it (as minimally and precisely as possible)?
- Create a deployment with a pod that mounts a PVC provisioned by AWS EBS CSI Driver
- Delete the pod
- Describe the new pod and observe the message above. It usually appears immediately after a "Multi-Attach error", which is expected while the original pod is still being deleted. (A minimal repro manifest is sketched below.)
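For reference, a minimal sketch of a manifest that exercises this path, assuming dynamic provisioning through the driver. The names `ebs-sc`, `ebs-claim`, and `ebs-app` are illustrative, not taken from the original report:

```yaml
# Hypothetical repro manifest: names and sizes are assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ebs-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ebs-app
  template:
    metadata:
      labels:
        app: ebs-app
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ebs-claim
```

Then delete the running pod (`kubectl delete pod -l app=ebs-app`) and describe its replacement to look for the error.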
Anything else we need to know?:
- This error is intermittent and has been seen with both new volumes and "migrated" ones.
- This error occurs in a cluster with the AWS cloud provider running out-of-tree (which doesn't include volume provisioning logic).
Environment
- Kubernetes version (use kubectl version): 1.15.1
- Driver version: commit 2aed4b5
What's the spec.nodeName of the volumeattachment object "csi-4b2c0c56..." and does it match the actual "<node_name>"? Authorization to get volumeattachments is handled by the node authorizer: https://github.com/kubernetes/kubernetes/pull/58360/.
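For anyone checking this, a sketch of one way to inspect the object (substitute the real attachment name from your error message):

```sh
# The attachment name below is a placeholder from the error; use your own.
kubectl get volumeattachment csi-4b2c0c56... -o jsonpath='{.spec.nodeName}{"\n"}'

# Compare against the node named in the "system:node:<node_name>" error.
kubectl get nodes
```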
In this case, the volumeattachment object doesn't exist. From the digging I did into the node authorizer, it looks like it returns "not authorized" when the attachment isn't found.
Kubernetes is the one that creates/deletes volumeattachments, so there might be something in the logs if you turn the controller-manager logging verbosity up to 4, e.g. "detacher deleted ok VolumeAttachment.ID": https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_attacher.go#L448
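A sketch of how one might surface that log line on a kubeadm-style control plane; the manifest path and pod label below are kubeadm defaults, not details from this thread:

```sh
# Bump kube-controller-manager verbosity by editing its static pod manifest
# (kubeadm default path; adjust for your distribution) and setting --v=4.
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml

# Then watch for the attach/detach controller's VolumeAttachment messages.
kubectl -n kube-system logs -l component=kube-controller-manager --tail=-1 \
  | grep -i volumeattachment
```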
Oh, that is good to know. I will take a look at that if I see this error again. Thanks!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
I'm seeing this pretty often now.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Was there ever any solution to this? We ran into it just today.
> I'm seeing this pretty often now.
Also with the latest CSI EBS drivers? We just updated.
Are there any updates? We also see the same problem.
Hey @christianhuening @fzyzcjy @vlerenc, can you provide environment details and specify the driver version being used?
As previously mentioned, authorization to get volumeattachments is handled by the node authorizer, so we'll also need to take a look at the controller logs. The volumeattachment object should be present, and its spec.nodeName should equal the "<node_name>" in the error message logged.
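A quick way to cross-check every attachment against its node at once (a sketch; the column paths are standard VolumeAttachment fields):

```sh
# List all VolumeAttachments with the PV they bind and the node they target.
kubectl get volumeattachments \
  -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached'
```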
Is there a workaround for this? Deleting the affected pod didn't help. The volumeattachment object doesn't exist in my case.
> Also with the latest CSI EBS drivers? We just updated.
We are hitting this as well...
@christianhuening - did upgrading fix the issue for you? We are currently running chart v2.6.8 and were thinking of upgrading to 2.10.1 to pick up the fixes from 2.6.9 onward, while avoiding the CSIDriver shuffle introduced in 2.11.0 for now.
We haven't seen the issue since, so I'd say yes.
This does appear to fix the issue for us (so far). We upgraded to chart version 2.12.1 and app version 1.12.1.
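For reference, an upgrade along those lines might look like the following; this is a sketch assuming the chart is installed under the release name `aws-ebs-csi-driver` in `kube-system`, so adjust both to your installation:

```sh
# Release name and namespace are assumptions; the repo URL is the official one.
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo update
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system --version 2.12.1
```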
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/close
This issue has been resolved. If you run into it again, please report it and specify your K8s and driver versions, thanks.
I'm running into this issue with v1.10 of the driver. From reading the messages above it seems like upgrading will likely fix the issue. I don't see anything in the changelog that's obviously related to this issue though. Can anyone point me to the relevant change?