aws-efs-csi-driver
Installing the Amazon EFS CSI driver via manifest fails to create application pods, with the error "attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers"
/kind bug
What happened? When installing the EFS CSI driver via the YAML manifest per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver using the commands below, the application pod fails to be created, with the errors shown further down.
I used the command below, from the EKS documentation, to install the driver:
kubectl kustomize \
"github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=release-1.3" > driver.yaml
kubectl apply -f driver.yaml
I also tried the command below, from the EFS CSI driver GitHub page:
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.3"
Pod events:
Warning FailedMount 2s (x6 over 18s) kubelet MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers
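For context, this error means the kubelet on the node where the pod was scheduled has no efs.csi.aws.com plugin registered; registration is done by the efs-csi-node DaemonSet pod on each node. When registration has succeeded, kubectl get csinode <node-name> -o yaml lists the driver under spec.drivers, roughly like this (illustrative output; the node ID is a placeholder):
spec:
  drivers:
  - name: efs.csi.aws.com
    nodeID: i-0123456789abcdef0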
What you expected to happen?
- It should have created the application pod using EFS via PV/PVC.
- Note that installation works fine with the Helm commands per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, but fails with the manifest method.
How to reproduce it (as minimally and precisely as possible)?
- Install the EFS CSI driver using the manifest files per https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver and https://github.com/kubernetes-sigs/aws-efs-csi-driver#deploy-the-driver, then deploy the sample application pods per https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods/specs. The pods get stuck in ContainerCreating with the error above.
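For reference, the example uses a statically provisioned volume roughly along these lines (a sketch of the example's pv.yaml; the volumeHandle is a placeholder for your own EFS file system ID):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678   # placeholder: your EFS file system ID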
Anything else we need to know?:
- On reviewing the resources, I noticed a few things. When the EFS CSI driver is installed via the manifest method, the efs-csi-node DaemonSet is not deployed and reports the error below.
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   0         0         0       0            0           beta.kubernetes.io/os=linux   114s
gowthams:~/environment $
====
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 47s (x15 over 2m9s) daemonset-controller Error creating: pods "efs-csi-node-" is forbidden: error looking up service account kube-system/efs-csi-node-sa: serviceaccount "efs-csi-node-sa" not found
- On checking, I could see that the efs-csi-node-sa service account was not created.
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system
Error from server (NotFound): serviceaccounts "efs-csi-node-sa" not found
gowthams:~/environment $
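A plausible workaround until the manifest is fixed is to create the missing service account manually so the DaemonSet controller can proceed; a minimal sketch (assumes no IAM role annotation is needed for the node plugin in your setup):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-csi-node-sa
  namespace: kube-system
After applying this, the daemonset-controller should stop reporting the FailedCreate event and schedule the efs-csi-node pods.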
- However, when installing the EFS CSI driver with the Helm chart, the service account is created and the application pods run fine using EFS via PV/PVC.
gowthams:~/environment $ helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver --namespace kube-system --set image.repository=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efs-csi-driver --set controller.serviceAccount.create=false --set controller.serviceAccount.name=efs-csi-controller-sa
Release "aws-efs-csi-driver" does not exist. Installing it now.
NAME: aws-efs-csi-driver
LAST DEPLOYED: Fri Jul 23 04:34:41 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To verify that aws-efs-csi-driver has started, run:
kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
gowthams:~/environment $
gowthams:~/environment $ kubectl get pods -n kube-system | grep efs
efs-csi-controller-84bf48878c-ldsrw   3/3   Running   0   18s
efs-csi-controller-84bf48878c-lr5nf   3/3   Running   0   18s
efs-csi-node-h7592                    3/3   Running   0   18s
gowthams:~/environment $
gowthams:~/environment $ kubectl get ds efs-csi-node -n kube-system
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
efs-csi-node   1         1         1       1            1           beta.kubernetes.io/os=linux   46s
gowthams:~/environment $
gowthams:~/environment $ kubectl get sa efs-csi-node-sa -n kube-system
NAME              SECRETS   AGE
efs-csi-node-sa   1         35s
gowthams:~/environment $
The application pod runs fine when the EFS CSI driver is installed via the Helm chart.
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl get pod
NAME   READY   STATUS    RESTARTS   AGE
app1   1/1     Running   0          38s
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $ kubectl exec -ti app1 -- tail /data/out1.txt
Fri Jul 23 03:52:58 UTC 2021
Fri Jul 23 03:53:03 UTC 2021
Fri Jul 23 03:53:08 UTC 2021
Fri Jul 23 03:53:13 UTC 2021
Fri Jul 23 03:53:18 UTC 2021
gowthams:~/environment/aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs (master) $
- So it looks like the manifest files need to be fixed: they do not create the efs-csi-node-sa service account, so the efs-csi-node pods of the DaemonSet are never created, and application pods end up stuck in ContainerCreating with the error MountVolume.MountDevice failed for volume "efs-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers.
Environment
- EKS 1.20
- Kubernetes version (kubectl version): Client Version: v1.20.4-eks-6b7464
- Driver version: EFS CSI Driver v1.3.2
I think they changed the name of the ServiceAccount referenced in the AWS EFS CSI driver deployment here, from efs-csi-node-sa to efs-csi-controller-sa.
Facing the same issue on version 1.21 as well.
I got the same issue - EKS 1.21
➜ k version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
➜ kgpon kube-system G efs
efs-csi-controller-6fcd876856-2q528   3/3   Running   0   6h6m
efs-csi-controller-6fcd876856-gkrc9   3/3   Running   0   6h6m
efs-csi-node-8wrhn                    3/3   Running   0   5h51m
efs-csi-node-jfx5h                    3/3   Running   0   5h51m
efs-csi-node-wgtzd                    3/3   Running   0   5h51m
➜ kgpo
NAME                              READY   STATUS              RESTARTS   AGE
etcd-0                            1/1     Running             0          13m
etcd-1                            1/1     Running             0          14m
etcd-2                            1/1     Running             0          39m
etcd-snapshotter-27230520-2xjws   0/1     ContainerCreating   0          5m21s
➜ kdpo etcd-snapshotter-27230520-2xjws
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m48s default-scheduler Successfully assigned etcd/etcd-snapshotter-27230520-2xjws to ip-10-120-0-192.us-west-2.compute.internal
Warning FailedMount 88s (x2 over 3m45s) kubelet Unable to attach or mount volumes: unmounted volumes=[snapshot-volume], unattached volumes=[snapshot-volume kube-api-access-wzvn2]: timed out waiting for the condition
Warning FailedMount 27s (x3 over 4m31s) kubelet MountVolume.MountDevice failed for volume "pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name efs.csi.aws.com not found in the list of registered CSI drivers
➜ kd pvc etcd-snapshotter
Name: etcd-snapshotter
Namespace: etcd
StorageClass: efs
Status: Bound
Volume: pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels: app.kubernetes.io/instance=etcd
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=etcd
argocd.argoproj.io/instance=etcd
helm.sh/chart=etcd-6.8.4
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 32Gi
Access Modes: RWX
VolumeMode: Filesystem
Used By: etcd-0
etcd-1
etcd-2
etcd-snapshotter-27230520-2xjws
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 18m efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e External provisioner is provisioning volume for claim "etcd/etcd-snapshotter"
Normal ExternalProvisioning 18m (x2 over 18m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator
Normal ProvisioningSucceeded 18m efs.csi.aws.com_ip-10-120-0-192.us-west-2.compute.internal_44761869-f592-468c-8e1c-e989bcbc303e Successfully provisioned volume pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
✗ kd pv pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Name: pvc-9e8343f2-97a4-41de-8d58-59a9cc4ee280
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: efs.csi.aws.com
Finalizers: [kubernetes.io/pv-protection]
StorageClass: efs
Status: Bound
Claim: etcd/etcd-snapshotter
Reclaim Policy: Delete
Access Modes: RWX
VolumeMode: Filesystem
Capacity: 32Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: efs.csi.aws.com
FSType:
VolumeHandle: fs-e169c9e7::fsap-0dbc026c2749ac05a
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1633811277796-8081-efs.csi.aws.com
Events: <none>
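(Worth noting: dynamic provisioning succeeds here because it is handled by the efs-csi-controller deployment, whereas the failing MountVolume.MountDevice step requires the efs-csi-node plugin to be registered with the kubelet on the specific node the pod was scheduled to, so the two can fail independently.)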
I used the Helm chart:
➜ helm search repo efs
NAME                                    CHART VERSION   APP VERSION   DESCRIPTION
aws-efs-csi-driver/aws-efs-csi-driver   2.2.0           1.3.4         A Helm chart for AWS EFS CSI Driver
values.yaml:
storageClasses:
- name: efs
  parameters:
    provisioningMode: efs-ap
    fileSystemId: fs-e169c9e7
    directoryPerms: "700"
    gidRangeStart: "1000"
    gidRangeEnd: "2000"
    basePath: "/snapshots"
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
node:
  nodeSelector:
    etcd: "true"
The interesting thing is that if I remove the node.nodeSelector (etcd: "true") setting from the values, then a lot more efs-csi-node pods get added, landing on nodes where I don't need the EFS PVC at all, and the error above no longer appears.
I got the same issue on EKS on Fargate (v1.21.2-eks-0389ca3).
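(A likely explanation for the Fargate case, per the EKS documentation: Fargate does not run DaemonSets, so efs-csi-node pods can never be scheduled there; EFS on EKS Fargate instead relies on built-in CSI support rather than the self-managed driver.)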
I've got the same issue too.
Has there been any resolution to this? I am experiencing the same issue.
Be sure NOT to set node.nodeSelector. efs-csi-node is a DaemonSet and should be left to deploy to all nodes; otherwise you'll get the above error whenever a pod lands on a node without an efs-csi-node-xxxx pod. The exception is when you are deliberately trying to allow EFS access only from specific nodes.
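If you do restrict the DaemonSet deliberately, every pod that mounts an EFS volume must itself be constrained to the same nodes. A minimal sketch (pod name, image, and claim name are illustrative; the selector must match the node.nodeSelector passed to the chart):
apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  nodeSelector:
    etcd: "true"   # must match the chart's node.nodeSelector
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: efs-volume
      mountPath: /data
  volumes:
  - name: efs-volume
    persistentVolumeClaim:
      claimName: efs-claim   # illustrative claim name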
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.