aws-ebs-csi-driver

Node can't access volumeattachments resource

[Open] shanesiebken opened this issue 6 years ago • 19 comments

/kind bug

What happened? When a pod with a persistent volume is deleted, the new pod fails to attach / mount the storage with the following error:

    MountVolume.WaitForAttach failed for volume "<pvc_name>" : volume <volume_name> has GET error for volume attachment csi-4b2c0c56...: volumeattachments.storage.k8s.io "csi-4b2c0c56..." is forbidden: User "system:node:<node_name>" cannot get resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: no relationship found between node "<node_name>" and this object

What you expected to happen? The volume should attach to the new pod and mount successfully.

How to reproduce it (as minimally and precisely as possible)?

  1. Create a deployment with a pod that mounts a PVC provisioned by AWS EBS CSI Driver
  2. Delete the pod
  3. Describe the new pod and see the message quoted above. It is usually the next message after "Multi-attach failure", which is an expected message while the original pod is being deleted. (A minimal repro is sketched after this list.)
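
A minimal reproduction along those lines might look like the sketch below. All names (ebs-claim, ebs-app, ebs-sc), the volume size, and the image are illustrative assumptions, and the sketch assumes the EBS CSI driver is already installed with a working StorageClass:

    # Hypothetical repro sketch: every name and size here is a placeholder.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ebs-claim
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ebs-sc      # assumes an EBS CSI StorageClass exists
      resources:
        requests:
          storage: 4Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ebs-app
    spec:
      replicas: 1
      selector:
        matchLabels: {app: ebs-app}
      template:
        metadata:
          labels: {app: ebs-app}
        spec:
          containers:
          - name: app
            image: busybox
            command: ["sh", "-c", "sleep 3600"]
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: ebs-claim
    EOF

    # Delete the running pod; the Deployment creates a replacement.
    kubectl delete pod -l app=ebs-app

    # Inspect the replacement pod's events for the WaitForAttach / forbidden error.
    kubectl describe pod -l app=ebs-app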

Anything else we need to know?:

  • This error has been intermittent, and seen with both new volumes and "migrated" ones.
  • This error occurs in a cluster where the AWS cloud provider runs out-of-tree (which does not include volume provisioning logic).

Environment

  • Kubernetes version (use kubectl version): 1.15.1
  • Driver version: commit 2aed4b5

shanesiebken avatar Aug 08 '19 17:08 shanesiebken

What's the spec.nodename of the volumeattachment object "csi-4b2c0c56..." and does it match the actual "<node_name>"? Authorization to get volumeattachments is handled by the node authorizer https://github.com/kubernetes/kubernetes/pull/58360/.
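
One way to check that (a sketch; the attachment name below is just the placeholder from the error message) is to pull spec.nodeName off the VolumeAttachment and compare it with the node named in the error:

    # Show which node the VolumeAttachment says the volume should attach to.
    kubectl get volumeattachment csi-4b2c0c56... -o jsonpath='{.spec.nodeName}{"\n"}'

    # Or list all attachments with their node and attach status for comparison.
    kubectl get volumeattachments \
      -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached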

wongma7 avatar Aug 09 '19 16:08 wongma7

In this case, the volumeattachment object doesn't exist. From the digging I did into the node authorizer, it looks like it's returning a "not authorized" if the attachment isn't found.

shanesiebken avatar Aug 09 '19 23:08 shanesiebken

Kubernetes is the one that creates/deletes volumeattachments, so there might be something in the controller-manager logs if you turn its logging verbosity up to 4. Look for a message like "detacher deleted ok VolumeAttachment.ID": https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_attacher.go#L448
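
A rough sketch of how to chase that down on a self-managed control plane; the manifest path and pod name below assume a kubeadm-style setup and are illustrative only:

    # Raise kube-controller-manager verbosity by adding --v=4 to its static pod
    # manifest (commonly /etc/kubernetes/manifests/kube-controller-manager.yaml),
    # then wait for the pod to restart.

    # Look for the attach/detach controller's VolumeAttachment messages.
    kubectl -n kube-system logs kube-controller-manager-<control-plane-node> \
      | grep -i -e 'volumeattachment' -e 'detacher deleted ok'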

wongma7 avatar Aug 09 '19 23:08 wongma7

Oh, that is good to know. I will take a look at that if I see this error again. Thanks!

shanesiebken avatar Aug 15 '19 15:08 shanesiebken

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 17 '19 06:11 fejta-bot

/remove-lifecycle stale

leakingtapan avatar Nov 19 '19 00:11 leakingtapan

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Apr 14 '20 05:04 fejta-bot

/remove-lifecycle stale

leakingtapan avatar May 09 '20 17:05 leakingtapan

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Aug 07 '20 17:08 fejta-bot

I'm seeing this pretty often now.

ArseniiPetrovich avatar Aug 20 '20 11:08 ArseniiPetrovich

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot avatar Sep 19 '20 11:09 fejta-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

fejta-bot avatar Oct 19 '20 12:10 fejta-bot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 19 '20 12:10 k8s-ci-robot

Was there ever any solution to this? We ran into it just today.

christianhuening avatar Dec 01 '21 13:12 christianhuening

I'm seeing this pretty often now.

vlerenc avatar Mar 28 '22 09:03 vlerenc

also with latest csi ebs drivers? we just updated.

christianhuening avatar Mar 28 '22 09:03 christianhuening

Are there any updates? We are also seeing the same problem.

fzyzcjy avatar Jul 01 '22 22:07 fzyzcjy

Hey @christianhuening @fzyzcjy @vlerenc, can you provide environment details and specify the driver version being used?

As previously mentioned, authorization to get volumeattachments is handled by the node authorizer, so we'll also need to take a look at the controller logs. The volumeattachment object should be present, and its spec.nodeName should match the "<node_name>" in the logged error message.
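
A sketch of how that information could be gathered; the ebs-csi-controller, ebs-plugin, and csi-attacher names below are the defaults for the Helm/kustomize install in kube-system, so adjust them to your deployment:

    # Driver version: image tag of the controller's ebs-plugin container.
    kubectl -n kube-system get deployment ebs-csi-controller \
      -o jsonpath='{.spec.template.spec.containers[?(@.name=="ebs-plugin")].image}{"\n"}'

    # Controller-side logs, including the external csi-attacher sidecar.
    kubectl -n kube-system logs deployment/ebs-csi-controller -c ebs-plugin
    kubectl -n kube-system logs deployment/ebs-csi-controller -c csi-attacher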

torredil avatar Jul 05 '22 15:07 torredil

Is there a workaround for this? Deleting the affected pod didn't help. The volumeattachment object doesn't exist in my case.

danports avatar Sep 15 '22 00:09 danports

also with latest csi ebs drivers? we just updated.

We are hitting this as well...

@christianhuening - did upgrading fix the issue for you? We are currently running chart v2.6.8 and were thinking of upgrading to 2.10.1 to pick up the fixes from 2.6.9 onward, while avoiding the CSIDriver shuffle introduced in 2.11.0 for now.

brian-provenzano avatar Oct 19 '22 17:10 brian-provenzano

We haven't seen the issue since, so I'd say yes.

christianhuening avatar Oct 19 '22 17:10 christianhuening

This does appear to fix the issue for us (so far). We upgraded to chart version 2.12.1 and app version 1.12.1.
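
For reference, the upgrade itself is just a chart version bump. A sketch, assuming the driver was installed from the kubernetes-sigs Helm repo under the release name aws-ebs-csi-driver in kube-system (adjust names to your install):

    # Assumed repo alias, release name, and namespace; pin the chart version
    # that was reported to resolve this.
    helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
    helm repo update
    helm upgrade aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
      --namespace kube-system --version 2.12.1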

brian-provenzano avatar Oct 24 '22 21:10 brian-provenzano

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 22 '23 21:01 k8s-triage-robot

/close

This issue has been resolved. If you run into this, please report it and specify the K8s / driver versions. Thanks.

torredil avatar Feb 01 '23 19:02 torredil

I'm running into this issue with v1.10 of the driver. From reading the messages above it seems like upgrading will likely fix the issue. I don't see anything in the changelog that's obviously related to this issue though. Can anyone point me to the relevant change?

davidmauskop avatar Feb 03 '23 02:02 davidmauskop