
Installing the Amazon EFS CSI driver via manifest fails to create application pods; pods report "attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers"

gowthams316 opened this issue 2 years ago · 11 comments

/kind bug

What happened? Installing the EFS CSI driver via the manifest per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver with the commands below fails to create the application pod and produces the errors shown below.

Used the command below to install the driver, per the EKS documentation:

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=release-1.3" > driver.yaml

kubectl apply -f driver.yaml

Also tried the command below, from the EFS CSI driver GitHub page:

kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.3"

Pod events:

  Warning  FailedMount       2s (x6 over 18s)   kubelet            MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers
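
This error indicates that the kubelet on the node where the pod was scheduled has no CSI plugin registered under the name efs.csi.aws.com. A hedged way to check the registration state (these commands are not from the original report; CSIDriver and CSINode are standard Kubernetes objects):

kubectl get csidriver efs.csi.aws.com
kubectl get csinode -o custom-columns=NODE:.metadata.name,DRIVERS:.spec.drivers[*].name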

What you expected to happen?

  • It should have created the application pod using EFS via PV/PVC.
  • Also note that the install works fine with the Helm method described at https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, but fails with the manifest method.

How to reproduce it (as minimally and precisely as possible)?

  • Install the EFS CSI driver using the manifest method per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, then deploy the sample application pods from https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods/specs. The pods get stuck in ContainerCreating with the error above (a sketch of the example's objects follows below).
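
For reference, a condensed sketch of the static-provisioning objects used by that example (the volumeHandle below is a placeholder for your EFS file system ID; the exact manifests live in the linked specs directory):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx   # placeholder: your EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi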

Anything else we need to know?:

  • On reviewing the resources, I noted a few things. When installing the EFS CSI driver via the manifest method, the efs-csi-node DaemonSet does not deploy any pods and reports the error below.
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system 
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   0         0         0       0            0           beta.kubernetes.io/os=linux   114s
gowthams:~/environment $ 
====
Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  47s (x15 over 2m9s)  daemonset-controller  Error creating: pods "efs-csi-node-" is forbidden: error looking up service account kube-system/efs-csi-node-sa: serviceaccount "efs-csi-node-sa" not found
  • On checking, the efs-csi-node-sa service account was not created (a possible manual workaround is sketched after the output below).
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system                                                                             
Error from server (NotFound): serviceaccounts "efs-csi-node-sa" not found
gowthams:~/environment $ 
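
As a possible workaround (an illustrative sketch, not from the original report; the proper fix is a manifest revision that ships the ServiceAccount), the missing ServiceAccount can be created by hand so the DaemonSet controller can proceed:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-csi-node-sa   # name expected by the efs-csi-node DaemonSet
  namespace: kube-system

After applying it (kubectl apply -f efs-csi-node-sa.yaml), the DaemonSet controller should create the efs-csi-node pods on its next retry.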
  • However, when we install the EFS CSI driver with the Helm chart, the service account is created and the application pods run fine using EFS via PV/PVC.
gowthams:~/environment $ helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver     --namespace kube-system     --set image.repository=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efs-csi-driver     --set controller.serviceAccount.create=false     --set controller.serviceAccount.name=efs-csi-controller-sa
Release "aws-efs-csi-driver" does not exist. Installing it now.
NAME: aws-efs-csi-driver
LAST DEPLOYED: Fri Jul 23 04:34:41 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To verify that aws-efs-csi-driver has started, run:

    kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get pods -n kube-system | grep efs 
efs-csi-controller-84bf48878c-ldsrw             3/3     Running   0          18s
efs-csi-controller-84bf48878c-lr5nf             3/3     Running   0          18s
efs-csi-node-h7592                              3/3     Running   0          18s
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system                                                                                   
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   1         1         1       1            1           beta.kubernetes.io/os=linux   46s
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system                                                                 
NAME              SECRETS   AGE
efs-csi-node-sa   1         35s
gowthams:~/environment $ 

The application pod runs fine when the Helm chart is used to install the EFS CSI driver:

gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl get pod 
NAME   READY   STATUS    RESTARTS   AGE
app1   1/1     Running   0          38s
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl exec -ti app1 -- tail /data/out1.txt
Fri Jul 23 03:52:58 UTC 2021
Fri Jul 23 03:53:03 UTC 2021
Fri Jul 23 03:53:08 UTC 2021
Fri Jul 23 03:53:13 UTC 2021
Fri Jul 23 03:53:18 UTC 2021
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
  • So it looks like the manifest files need to be looked at: they do not create the efs-csi-node-sa service account, so the efs-csi-node DaemonSet pods are never created, and application pods end up stuck in ContainerCreating with the error MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers. (A quick check of which ServiceAccount the DaemonSet expects is sketched below.)
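
A hedged way to confirm the mismatch (not part of the original report) is to compare the ServiceAccount name the DaemonSet references with the ServiceAccounts the manifest actually created:

kubectl get ds efs-csi-node -n kube-system -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
kubectl get sa -n kube-system | grep efs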

Environment

  • EKS 1.20
  • Kubernetes version (use kubectl version): Client Version: v1.20.4-eks-6b7464
  • Driver version: EFS CSI Driver v1.3.2

gowthams316 avatar Jul 23 '21 05:07 gowthams316

I think they changed the name of the ServiceAccount mentioned in the AWS EFS CSI driver deployment here, from efs-csi-node-sa to efs-csi-controller-sa (the rendered manifest can be checked as sketched below).
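
A hedged way to see which ServiceAccounts a given overlay actually renders (the ref below matches the one from the original report; adjust it for the release you install):

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=release-1.3" \
    | grep -A 3 "kind: ServiceAccount"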

dchien234 avatar Jul 31 '21 02:07 dchien234

Facing the same issue on version 1.21 as well.

naveen-armedia avatar Aug 06 '21 09:08 naveen-armedia

I got the same issue - EKS 1.21

➜ k version                                     
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}


➜ kgpon kube-system G efs
efs-csi-controller-6fcd876856-2q528   3/3     Running   0          6h6m
efs-csi-controller-6fcd876856-gkrc9   3/3     Running   0          6h6m
efs-csi-node-8wrhn                    3/3     Running   0          5h51m
efs-csi-node-jfx5h                    3/3     Running   0          5h51m
efs-csi-node-wgtzd                    3/3     Running   0          5h51m

➜ kgpo                   
NAME                              READY   STATUS              RESTARTS   AGE
etcd-0                            1/1     Running             0          13m
etcd-1                            1/1     Running             0          14m
etcd-2                            1/1     Running             0          39m
etcd-snapshotter-27230520-2xjws   0/1     ContainerCreating   0          5m21s

➜ kdpo etcd-snapshotter-27230520-2xjws

....

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    5m48s                default-scheduler  Successfully assigned etcd/etcd-snapshotter-27230520-2xjws to ip-10-120-0-192.us-west-2.compute.internal
  Warning  FailedMount  88s (x2 over 3m45s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[snapshot-volume], unattached volumes=[snapshot-volume kube-api-access-wzvn2]: timed out waiting for the condition
  Warning  FailedMount  27s (x3 over 4m31s)  kubelet            MountVolume.MountDevice failed for volume "pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers

➜ kd pvc etcd-snapshotter                                        
Name:          etcd-snapshotter
Namespace:     etcd
StorageClass:  efs
Status:        Bound
Volume:        pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels:        app.kubernetes.io/instance=etcd
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=etcd
               argocd.argoproj.io/instance=etcd
               helm.sh/chart=etcd-6.8.4
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      32Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       etcd-0
               etcd-1
               etcd-2
               etcd-snapshotter-27230520-2xjws
Events:
  Type    Reason                 Age                From                                                                                             Message
  ----    ------                 ----               ----                                                                                             -------
  Normal  Provisioning           18m                efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e  External provisioner is provisioning volume for claim "etcd/etcd-snapshotter"
  Normal  ExternalProvisioning   18m (x2 over 18m)  persistentvolume-controller                                                                      waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator
  Normal  ProvisioningSucceeded  18m                efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e  Successfully provisioned volume pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280

✗  kd pv pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280                                        
Name:            pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: efs.csi.aws.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    efs
Status:          Bound
Claim:           etcd/etcd-snapshotter
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        32Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            efs.csi.aws.com
    FSType:            
    VolumeHandle:      fs-e169c9e7::fsap-0dbc026c2749ac05a
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1633811277796-8081-efs.csi.aws.com
Events:                <none>

Used helm chart

➜ helm search repo efs                                                    
NAME                                 	CHART VERSION	APP VERSION	DESCRIPTION                        
aws-efs-csi-driver/aws-efs-csi-driver	2.2.0        	1.3.4      	A Helm chart for AWS EFS CSI Driver

values.yaml

storageClasses: 
  - name: efs
    parameters:
      provisioningMode: efs-ap
      fileSystemId: fs-e169c9e7
      directoryPerms: "700"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
node:
  nodeSelector:
    etcd: "true"

dmitry-mightydevops avatar Oct 10 '21 02:10 dmitry-mightydevops

The interesting thing is that if I remove

node:
  nodeSelector:
    etcd: "true"

from the values, then a lot more pods get added to other nodes (where I don't need the NFS PVC at all), and then the error mentioned above no longer appears (see the node-labeling sketch below).
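
For context (an illustrative note, not from the thread): with node.nodeSelector set, the efs-csi-node DaemonSet only runs on nodes that carry the matching label, so any pod mounting the EFS PVC has to land on one of those nodes. A minimal sketch of labeling such a node (node-1 is a placeholder name):

kubectl label nodes node-1 etcd=true
kubectl get nodes -l etcd=true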

dmitry-mightydevops avatar Oct 10 '21 03:10 dmitry-mightydevops

I got a same issue on EKS on Fargate (v1.21.2-eks-0389ca3)

MasatoshiTada8888 avatar Oct 27 '21 06:10 MasatoshiTada8888

I've got the same issue too.

jjcallis avatar Dec 14 '21 14:12 jjcallis

Has there been any resolution to this? I am experiencing the same issue.

tropicallydiv avatar Jan 26 '22 13:01 tropicallydiv

Be sure NOT to set node.nodeSelector. It's a DaemonSet and should be left to deploy to all nodes, or you'll get the above error whenever your pod lands on a node without an efs-csi-node-xxxx pod, UNLESS you are specifically trying to allow EFS access only from specific nodes. A quick check is sketched below.
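
A hedged way to verify coverage (the label selector may differ between driver versions) is to compare the nodes running efs-csi-node pods with the node your workload pod was scheduled on:

kubectl get pods -n kube-system -l app=efs-csi-node -o wide
kubectl get pod <your-pod> -o wide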

LevonBecker avatar Mar 21 '22 22:03 LevonBecker

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 19 '22 23:06 k8s-triage-robot

/remove-lifecycle stale

Ghost---Shadow avatar Jun 29 '22 17:06 Ghost---Shadow

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 27 '22 18:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 27 '22 18:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Nov 26 '22 19:11 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 26 '22 19:11 k8s-ci-robot