
Installing the Amazon EFS CSI driver via manifest fails to create application pods; pods report "attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers"

gowthams316 opened this issue 2 years ago · 11 comments

/kind bug

What happened? Installing the EFS CSI driver via the manifest per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver with the commands below fails to create the application pod and produces the errors shown below.

Used the command below to install the driver, per the EKS documentation:

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=release-1.3" > driver.yaml

kubectl apply -f driver.yaml

Also tried the command below, from the EFS CSI driver GitHub page:

kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.3"

Pod events:

  Warning  FailedMount       2s (x6 over 18s)   kubelet            MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers
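
This error indicates that the kubelet on the node where the pod was scheduled has no CSI plugin registered under the name efs.csi.aws.com. A hedged way to check the registration state (these commands are not from the original report; CSIDriver and CSINode are standard Kubernetes objects):

kubectl get csidriver efs.csi.aws.com
kubectl get csinode -o custom-columns=NODE:.metadata.name,DRIVERS:.spec.drivers[*].name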

What you expected to happen?

  • It should have created the application pod using EFS via PV/PVC.
  • Also note that the install works fine with the Helm method described at https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, but fails with the manifest method.

How to reproduce it (as minimally and precisely as possible)?

  • Install the EFS CSI driver using the manifest method per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, then deploy the sample application pods from https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods/specs. The pods get stuck in ContainerCreating with the error above (a sketch of the example's objects follows below).
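
For reference, a condensed sketch of the static-provisioning objects used by that example (the volumeHandle below is a placeholder for your EFS file system ID; the exact manifests live in the linked specs directory):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx   # placeholder: your EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi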

Anything else we need to know?:

  • On reviewing the resources, I noted a few things. When installing the EFS CSI driver via the manifest method, the efs-csi-node DaemonSet does not deploy any pods and reports the error below.
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system 
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   0         0         0       0            0           beta.kubernetes.io/os=linux   114s
gowthams:~/environment $ 
====
Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  47s (x15 over 2m9s)  daemonset-controller  Error creating: pods "efs-csi-node-" is forbidden: error looking up service account kube-system/efs-csi-node-sa: serviceaccount "efs-csi-node-sa" not found
  • On checking, the efs-csi-node-sa service account was not created (a possible manual workaround is sketched after the output below).
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system                                                                             
Error from server (NotFound): serviceaccounts "efs-csi-node-sa" not found
gowthams:~/environment $ 
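
As a possible workaround (an illustrative sketch, not from the original report; the proper fix is a manifest revision that ships the ServiceAccount), the missing ServiceAccount can be created by hand so the DaemonSet controller can proceed:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-csi-node-sa   # name expected by the efs-csi-node DaemonSet
  namespace: kube-system

After applying it (kubectl apply -f efs-csi-node-sa.yaml), the DaemonSet controller should create the efs-csi-node pods on its next retry.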
  • However, when we install the EFS CSI driver with the Helm chart, the service account is created and the application pods run fine using EFS via PV/PVC.
gowthams:~/environment $ helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver     --namespace kube-system     --set image.repository=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efs-csi-driver     --set controller.serviceAccount.create=false     --set controller.serviceAccount.name=efs-csi-controller-sa
Release "aws-efs-csi-driver" does not exist. Installing it now.
NAME: aws-efs-csi-driver
LAST DEPLOYED: Fri Jul 23 04:34:41 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To verify that aws-efs-csi-driver has started, run:

    kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get pods -n kube-system | grep efs 
efs-csi-controller-84bf48878c-ldsrw             3/3     Running   0          18s
efs-csi-controller-84bf48878c-lr5nf             3/3     Running   0          18s
efs-csi-node-h7592                              3/3     Running   0          18s
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system                                                                                   
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   1         1         1       1            1           beta.kubernetes.io/os=linux   46s
gowthams:~/environment $ 
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system                                                                 
NAME              SECRETS   AGE
efs-csi-node-sa   1         35s
gowthams:~/environment $ 

The application pod runs fine when the Helm chart is used to install the EFS CSI driver:

gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl get pod 
NAME   READY   STATUS    RESTARTS   AGE
app1   1/1     Running   0          38s
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl exec -ti app1 -- tail /data/out1.txt
Fri Jul 23 03:52:58 UTC 2021
Fri Jul 23 03:53:03 UTC 2021
Fri Jul 23 03:53:08 UTC 2021
Fri Jul 23 03:53:13 UTC 2021
Fri Jul 23 03:53:18 UTC 2021
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ 
  • So it looks like the manifest files need to be looked at: they do not create the efs-csi-node-sa service account, so the efs-csi-node DaemonSet pods are never created, and application pods end up stuck in ContainerCreating with the error MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers. (A quick check of which ServiceAccount the DaemonSet expects is sketched below.)
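
A hedged way to confirm the mismatch (not part of the original report) is to compare the ServiceAccount name the DaemonSet references with the ServiceAccounts the manifest actually created:

kubectl get ds efs-csi-node -n kube-system -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
kubectl get sa -n kube-system | grep efs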

Environment

  • EKS 1.20
  • Kubernetes version (use kubectl version): Client Version: v1.20.4-eks-6b7464
  • Driver version: EFS CSI Driver v1.3.2

gowthams316 avatar Jul 23 '21 05:07 gowthams316

I think they changed the name of the ServiceAccount mentioned in the AWS EFS CSI driver deployment here, from efs-csi-node-sa to efs-csi-controller-sa (the rendered manifest can be checked as sketched below).
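
A hedged way to see which ServiceAccounts a given overlay actually renders (the ref below matches the one from the original report; adjust it for the release you install):

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=release-1.3" \
    | grep -A 3 "kind: ServiceAccount"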

dchien234 avatar Jul 31 '21 02:07 dchien234

Facing the same issue on version 1.21 as well.

naveen-armedia avatar Aug 06 '21 09:08 naveen-armedia

I got the same issue - EKS 1.21

➜ k version                                     
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}


➜ kgpon kube-system G efs
efs-csi-controller-6fcd876856-2q528   3/3     Running   0          6h6m
efs-csi-controller-6fcd876856-gkrc9   3/3     Running   0          6h6m
efs-csi-node-8wrhn                    3/3     Running   0          5h51m
efs-csi-node-jfx5h                    3/3     Running   0          5h51m
efs-csi-node-wgtzd                    3/3     Running   0          5h51m

➜ kgpo                   
NAME                              READY   STATUS              RESTARTS   AGE
etcd-0                            1/1     Running             0          13m
etcd-1                            1/1     Running             0          14m
etcd-2                            1/1     Running             0          39m
etcd-snapshotter-27230520-2xjws   0/1     ContainerCreating   0          5m21s

➜ kdpo etcd-snapshotter-27230520-2xjws

....

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    5m48s                default-scheduler  Successfully assigned etcd/etcd-snapshotter-27230520-2xjws to ip-10-120-0-192.us-west-2.compute.internal
  Warning  FailedMount  88s (x2 over 3m45s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[snapshot-volume], unattached volumes=[snapshot-volume kube-api-access-wzvn2]: timed out waiting for the condition
  Warning  FailedMount  27s (x3 over 4m31s)  kubelet            MountVolume.MountDevice failed for volume "pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers

➜ kd pvc etcd-snapshotter                                        
Name:          etcd-snapshotter
Namespace:     etcd
StorageClass:  efs
Status:        Bound
Volume:        pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels:        app.kubernetes.io/instance=etcd
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=etcd
               argocd.argoproj.io/instance=etcd
               helm.sh/chart=etcd-6.8.4
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      32Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       etcd-0
               etcd-1
               etcd-2
               etcd-snapshotter-27230520-2xjws
Events:
  Type    Reason                 Age                From                                                                                             Message
  ----    ------                 ----               ----                                                                                             -------
  Normal  Provisioning           18m                efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e  External provisioner is provisioning volume for claim "etcd/etcd-snapshotter"
  Normal  ExternalProvisioning   18m (x2 over 18m)  persistentvolume-controller                                                                      waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator
  Normal  ProvisioningSucceeded  18m                efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e  Successfully provisioned volume pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280

✗  kd pv pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280                                        
Name:            pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: efs.csi.aws.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    efs
Status:          Bound
Claim:           etcd/etcd-snapshotter
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        32Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            efs.csi.aws.com
    FSType:            
    VolumeHandle:      fs-e169c9e7::fsap-0dbc026c2749ac05a
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1633811277796-8081-efs.csi.aws.com
Events:                <none>

Used helm chart

➜ helm search repo efs                                                    
NAME                                 	CHART VERSION	APP VERSION	DESCRIPTION                        
aws-efs-csi-driver/aws-efs-csi-driver	2.2.0        	1.3.4      	A Helm chart for AWS EFS CSI Driver

values.yaml

storageClasses: 
  - name: efs
    parameters:
      provisioningMode: efs-ap
      fileSystemId: fs-e169c9e7
      directoryPerms: "700"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
node:
  nodeSelector:
    etcd: "true"

dmitry-mightydevops avatar Oct 10 '21 02:10 dmitry-mightydevops

The interesting thing is that if I remove

node:
  nodeSelector:
    etcd: "true"

from the values, then a lot more pods get added to other nodes (where I don't need the NFS PVC at all), and then the error mentioned above no longer appears (see the node-labeling sketch below).
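
For context (an illustrative note, not from the thread): with node.nodeSelector set, the efs-csi-node DaemonSet only runs on nodes that carry the matching label, so any pod mounting the EFS PVC has to land on one of those nodes. A minimal sketch of labeling such a node (node-1 is a placeholder name):

kubectl label nodes node-1 etcd=true
kubectl get nodes -l etcd=true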

dmitry-mightydevops avatar Oct 10 '21 03:10 dmitry-mightydevops

I got a same issue on EKS on Fargate (v1.21.2-eks-0389ca3)

MasatoshiTada8888 avatar Oct 27 '21 06:10 MasatoshiTada8888

I've got the same issue too.

jjcallis avatar Dec 14 '21 14:12 jjcallis

Has there been any resolution to this? I am experiencing the same issue.

tropicallydiv avatar Jan 26 '22 13:01 tropicallydiv

Be sure NOT to set node.nodeSelector. It's a DaemonSet and should be left to deploy to all nodes, or you'll get the above error whenever your pod lands on a node without an efs-csi-node-xxxx pod, UNLESS you are specifically trying to allow EFS access only from specific nodes. A quick check is sketched below.
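
A hedged way to verify coverage (the label selector may differ between driver versions) is to compare the nodes running efs-csi-node pods with the node your workload pod was scheduled on:

kubectl get pods -n kube-system -l app=efs-csi-node -o wide
kubectl get pod <your-pod> -o wide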

LevonBecker avatar Mar 21 '22 22:03 LevonBecker

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 19 '22 23:06 k8s-triage-robot

/remove-lifecycle stale

Ghost---Shadow avatar Jun 29 '22 17:06 Ghost---Shadow

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 27 '22 18:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 27 '22 18:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Nov 26 '22 19:11 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 26 '22 19:11 k8s-ci-robot