
Dynamic provisioning not working for AWS EFS CSI Driver

Open RK-GITHUB opened this issue 3 years ago • 25 comments

/kind bug

What happened? I am trying to test dynamic provisioning with the AWS EFS CSI driver, but it is not working.

What you expected to happen? A PV should be created for the pod's PVC.

How to reproduce it (as minimally and precisely as possible)? I followed the steps listed at https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

Anything else we need to know?:

> kubectl get pods          
NAME      READY   STATUS    RESTARTS   AGE
efs-app   0/1     Pending   0          7s

> kubectl get pvc
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
efs-claim   Pending                                      efs-dummy       7s

> kubectl describe pvc efs-claim
Name:          efs-claim
Namespace:     default
StorageClass:  efs-dummy
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       efs-app
Events:
  Type    Reason                Age                    From                         Message
  ----    ------                ----                   ----                         -------
  Normal  ExternalProvisioning  2m23s (x302 over 77m)  persistentvolume-controller  waiting for a volume to be created, either  by external provisioner "efs.csi.aws.com" or manually created by system administrator

> kubectl get sc
NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
efs-dummy           efs.csi.aws.com         Delete          WaitForFirstConsumer   false                  10s
efs-sc              efs.csi.aws.com         Delete          Immediate              false                  109s

Environment:

  • Single-node EKS cluster
  • All resources and nodes are in the same zone
  • I also tried different chart versions: 1.2.3/1.2.0/1.0.0

 > kubectl version --short
 Client Version: v1.19.0
 Server Version: v1.19.8-eks-96780e

Additional comments:

  • I tested Static Provisioning and it's working properly
  • Is any additional configuration required on the EKS cluster for Dynamic Provisioning?
  • All logs look good
  • This could possibly be a bug, or something may be missing in the documentation

RK-GITHUB avatar Jun 19 '21 20:06 RK-GITHUB

I can confirm this issue, but I got an error event on the PVC. My version was 1.3.0.

Warning ProvisioningFailed 48s efs.csi.aws.com_ip-xx-xx-xx-xx.eu-central-1.compute.internal_xxxxxxxxxx failed to provision volume with StorageClass "efs-sc": rpc error: code = Internal desc = Failed to create Access point in File System fs-xxxxxxxxx : Failed to create access point: ValidationException: status code: 400, request id: xxxxxxxxxxxxxxxxxxxx

Controller log:

I0621 15:40:01.735018 1 controller.go:1332] provision "default/efs-testclaim" class "efs-sc": started
I0621 15:40:01.735305 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-testclaim", UID:"654fd821-c7ff-4385-9982-8f42626cb3f2", APIVersion:"v1", ResourceVersion:"5803470", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/efs-testclaim"
E0621 15:40:01.740664 1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"efs-testclaim.168aa2eb0cdc555a", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"5803472", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-testclaim", UID:"654fd821-c7ff-4385-9982-8f42626cb3f2", APIVersion:"v1", ResourceVersion:"5803470", FieldPath:""}, Reason:"Provisioning", Message:"External provisioner is provisioning volume for claim \"default/efs-testclaim\"", Source:v1.EventSource{Component:"efs.csi.aws.com_ip-xxx-xx-xx-xx.eu-central-1.compute.internal_xxxxxxx", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63759886546, loc:(*time.Location)(0x26270e0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc02c4a946bd02998, ext:934003510752212, loc:(*time.Location)(0x26270e0)}}, Count:10, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "efs-testclaim.168aa2eb0cdc555a" is forbidden: User "system:serviceaccount:infrastructure:efs-csi-driver" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
I0621 15:40:01.815434 1 controller.go:1099] Final error received, removing PVC 654fd821-c7ff-4385-9982-8f42626cb3f2 from claims in progress
W0621 15:40:01.815457 1 controller.go:958] Retrying syncing claim "654fd821-c7ff-4385-9982-8f42626cb3f2", failure 9
E0621 15:40:01.815481 1 controller.go:981] error syncing claim "654fd821-c7ff-4385-9982-8f42626cb3f2": failed to provision volume with StorageClass "efs-sc": rpc error: code = Internal desc = Failed to create Access point in File System fs-f148e5aa : Failed to create access point: ValidationException: status code: 400, request id: xxxxxxxxxxx
I0621 15:40:01.815506 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-testclaim", UID:"654fd821-c7ff-4385-9982-8f42626cb3f2", APIVersion:"v1", ResourceVersion:"5803470", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "efs-sc": rpc error: code = Internal desc = Failed to create Access point in File System fs-xxxxx : Failed to create access point: ValidationException: status code: 400, request id: xxxxxxxxxxxx

djakielski avatar Jun 21 '21 15:06 djakielski

@djakielskiadesso Can you provide logs from the efs-plugin container on the efs-csi-controller pod?

kbasv avatar Jun 21 '21 20:06 kbasv

@RK-GITHUB Can you describe your SC efs-dummy? Also, can you provide efs-plugin and csi-provisioner logs from your efs-csi-controller pod?

kbasv avatar Jun 21 '21 21:06 kbasv

I'm using version 1.3.1 and am having the same issue with dynamic provisioning.

Specs

storageclass.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs
provisioner: efs.csi.aws.com
mountOptions:
- tls
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-XXXXXXXX
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/"

Helm Chart template persistentvolumeclaim.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: {{ .Release.Namespace | quote }}
  name: efs
  labels:
    app: {{ .Values.name | quote }}
    app.kubernetes.io/name: {{ .Values.name | quote }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
    app.kubernetes.io/component: "persistant-volume-claim"
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "efs"
  resources:
    requests:
      storage: 5Gi

kubectl describes

kubectl describe sa -n kube-system efs-csi-controller-sa

Name:                efs-csi-controller-sa
Namespace:           kube-system
Labels:              app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=aws-efs-csi-driver
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/nugeo-eks-efs-csi-driver-admin
                    meta.helm.sh/release-name: aws-efs-csi-driver
                    meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   efs-csi-controller-sa-token-tm9j7
Tokens:              efs-csi-controller-sa-token-tm9j7
Events:              <none>

kubectl describe sc efs:

Name:            efs
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"efs"},"mountOptions":["tls"],"parameters":{"basePath":"/","directoryPerms":"700","fileSystemId":"fs-XXXXXXXX","gidRangeEnd":"2000","gidRangeStart":"1000","provisioningMode":"efs-ap"},"provisioner":"efs.csi.aws.com"}

Provisioner:           efs.csi.aws.com
Parameters:            basePath=/,directoryPerms=700,fileSystemId=fs-XXXXXXXX,gidRangeEnd=2000,gidRangeStart=1000,provisioningMode=efs-ap
AllowVolumeExpansion:  <unset>
MountOptions:
tls
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>

kubectl describe pvc -n mapserver efs:

Name:          efs
Namespace:     mapserver
StorageClass:  efs
Status:        Pending
Volume:
Labels:        app=mapserver
            app.kubernetes.io/component=persistant-volume-claim
            app.kubernetes.io/instance=mapserver
            app.kubernetes.io/managed-by=Helm
            app.kubernetes.io/name=mapserver
Annotations:   meta.helm.sh/release-name: mapserver
            meta.helm.sh/release-namespace: mapserver
            volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
Type     Reason              Age    From                                                                                               Message
----     ------              ----   ----                                                                                               -------
Normal   Provisioning        2m35s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  External provisioner is provisioning volume for claim "mapserver/efs"
Warning  ProvisioningFailed  2m34s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 0a435613-fe4a-4167-9e58-ab1f1bbddbf4
Warning  ProvisioningFailed  2m33s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 97facbfa-3b08-45dc-80f3-facf849dfe6a
Warning  ProvisioningFailed  2m31s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 4e582c92-d292-4078-919e-166c1c080a06
Warning  ProvisioningFailed  2m27s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 1025734f-daef-4eba-a21a-679547a335c7
Warning  ProvisioningFailed  2m19s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 745c25a4-bcbf-40fc-960f-7bbbdfebde37
Warning  ProvisioningFailed  2m3s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: d7a6a9fd-f747-4b0c-a7d2-0865637af60c
Warning  ProvisioningFailed  91s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 896dbeda-2f83-4ab1-b5f7-b741546d61ff
Warning  ProvisioningFailed  27s  efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef  failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: 1b51c77f-23cf-4bf7-9521-ab2363c36470
Normal   ExternalProvisioning  10s (x12 over 2m35s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

Logs

kubectl logs -n kube-system efs-csi-controller-8478bf8f8-wzr9k efs-plugin:

...
E0623 04:20:42.384546       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: d7a6a9fd-f747-4b0c-a7d2-0865637af60c

kubectl logs -n kube-system efs-csi-controller-8478bf8f8-wzr9k csi-provisioner:

...
I0623 04:20:42.333429       1 controller.go:1332] provision "mapserver/efs" class "efs": started
I0623 04:20:42.333997       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"mapserver", Name:"efs", UID:"a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad", APIVersion:"v1", ResourceVersion:"6568188", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "mapserver/efs"
E0623 04:20:42.344065       1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"efs.168b1b364ff548c9", GenerateName:"", Namespace:"mapserver", SelfLink:"", UID:"", ResourceVersion:"6568190", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"mapserver", Name:"efs", UID:"a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad", APIVersion:"v1", ResourceVersion:"6568188", FieldPath:""}, Reason:"Provisioning", Message:"External provisioner is provisioning volume for claim \"mapserver/efs\"", Source:v1.EventSource{Component:"efs.csi.aws.com_ip-XX-XX-XX-XX.ca-central-1.compute.internal_94d26c83-0fe2-4d98-a16f-44db8451b7ef", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63760018810, loc:(*time.Location)(0x26270e0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc02ccb8693e04d46, ext:19947674069063, loc:(*time.Location)(0x26270e0)}}, Count:6, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "efs.168b1b364ff548c9" is forbidden: User "system:serviceaccount:kube-system:efs-csi-controller-sa" cannot patch resource "events" in API group "" in the namespace "mapserver"' (will not retry!)
I0623 04:20:42.384811       1 controller.go:1099] Final error received, removing PVC a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad from claims in progress
W0623 04:20:42.384833       1 controller.go:958] Retrying syncing claim "a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad", failure 5
E0623 04:20:42.384873       1 controller.go:981] error syncing claim "a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad": failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: d7a6a9fd-f747-4b0c-a7d2-0865637af60c
I0623 04:20:42.385138       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"mapserver", Name:"efs", UID:"a1d9b6d6-a6e2-4d3d-815b-feff20d7a2ad", APIVersion:"v1", ResourceVersion:"6568188", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-XXXXXXXX : Failed to create access point: ValidationException:
        status code: 400, request id: d7a6a9fd-f747-4b0c-a7d2-0865637af60c

sergebasso avatar Jun 23 '21 04:06 sergebasso

@kbasv: Please find the logs below.

> kubectl logs -f efs-csi-controller-6f6dd8bcbf-qvthr -n kube-system -c efs-plugin     
I0624 03:30:29.006715       1 config_dir.go:62] Mounted directories do not exist, creating directory at '/etc/amazon/efs'
I0624 03:30:29.018338       1 mount_linux.go:173] Cannot run systemd-run, assuming non-systemd OS
I0624 03:30:29.018356       1 driver.go:140] Did not find any input tags.
I0624 03:30:29.018635       1 driver.go:113] Registering Node Server
I0624 03:30:29.018652       1 driver.go:115] Registering Controller Server
I0624 03:30:29.018674       1 driver.go:118] Starting watchdog
I0624 03:30:29.018764       1 efs_watch_dog.go:209] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
I0624 03:30:29.018863       1 efs_watch_dog.go:209] Copying /etc/amazon/efs/efs-utils.crt since it doesn't exist
I0624 03:30:29.023876       1 driver.go:124] Staring subreaper
I0624 03:30:29.023894       1 driver.go:127] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
E0624 03:36:22.972595       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context canceled
E0624 03:36:33.972617       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context deadline exceeded
E0624 03:36:43.972949       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context deadline exceeded
E0624 03:36:53.974147       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context canceled
E0624 03:37:03.974025       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context deadline exceeded
E0624 03:37:13.974293       1 driver.go:103] GRPC error: rpc error: code = Internal desc = Failed to fetch File System info: Describe File System failed: RequestCanceled: request context canceled
caused by: context canceled

> kubectl logs -f efs-csi-controller-6f6dd8bcbf-qvthr -n kube-system -c csi-provisioner
W0624 03:30:29.275116       1 feature_gate.go:235] Setting GA feature gate Topology=true. It will be removed in a future release.
I0624 03:30:29.275520       1 feature_gate.go:243] feature gates: &{map[Topology:true]}
I0624 03:30:29.275662       1 csi-provisioner.go:132] Version: v2.1.1-0-g353098c90
I0624 03:30:29.275754       1 csi-provisioner.go:155] Building kube configs for running in cluster...
I0624 03:30:29.287479       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0624 03:30:29.288637       1 common.go:111] Probing CSI driver for readiness
I0624 03:30:29.291097       1 csi-provisioner.go:202] Detected CSI driver efs.csi.aws.com
I0624 03:30:29.292250       1 csi-provisioner.go:244] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0624 03:30:29.293253       1 controller.go:756] Using saving PVs to API server in background
I0624 03:30:29.300941       1 leaderelection.go:243] attempting to acquire leader lease kube-system/efs-csi-aws-com...
I0624 03:30:29.322545       1 leaderelection.go:253] successfully acquired lease kube-system/efs-csi-aws-com
I0624 03:30:29.322773       1 leader_election.go:205] became leader, starting
I0624 03:30:29.323734       1 reflector.go:219] Starting reflector *v1.StorageClass (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0624 03:30:29.323964       1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (15m0s) from k8s.io/client-go/informers/factory.go:134
I0624 03:30:29.422900       1 controller.go:835] Starting provisioner controller efs.csi.aws.com_ip-172-31-128-86.us-west-2.compute.internal_5c6ec46c-df62-438b-ae6b-09e348f6cde0!
I0624 03:30:29.422949       1 volume_store.go:97] Starting save volume queue
I0624 03:30:29.423211       1 reflector.go:219] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:872
I0624 03:30:29.423253       1 reflector.go:219] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:869
I0624 03:30:29.523717       1 controller.go:884] Started provisioner controller efs.csi.aws.com_ip-172-31-128-86.us-west-2.compute.internal_5c6ec46c-df62-438b-ae6b-09e348f6cde0!
I0624 03:36:12.971067       1 controller.go:1332] provision "default/efs-claim" class "efs-sc": started
I0624 03:36:12.971779       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"2b02a987-0903-4728-9297-8658e5440fbf", APIVersion:"v1", ResourceVersion:"193316", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/efs-claim"
I0624 03:36:22.971390       1 controller.go:1106] Temporary error received, adding PVC 2b02a987-0903-4728-9297-8658e5440fbf to claims in progress
W0624 03:36:22.971428       1 controller.go:958] Retrying syncing claim "2b02a987-0903-4728-9297-8658e5440fbf", failure 0
E0624 03:36:22.971450       1 controller.go:981] error syncing claim "2b02a987-0903-4728-9297-8658e5440fbf": failed to provision volume with StorageClass "efs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0624 03:36:22.971486       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"2b02a987-0903-4728-9297-8658e5440fbf", APIVersion:"v1", ResourceVersion:"193316", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "efs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0624 03:36:23.971653       1 controller.go:1332] provision "default/efs-claim" class "efs-sc": started
I0624 03:36:23.971883       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"2b02a987-0903-4728-9297-8658e5440fbf", APIVersion:"v1", ResourceVersion:"193316", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/efs-claim"
I0624 03:36:33.971890       1 controller.go:1106] Temporary error received, adding PVC 2b02a987-0903-4728-9297-8658e5440fbf to claims in progress
W0624 03:36:33.971918       1 controller.go:958] Retrying syncing claim "2b02a987-0903-4728-9297-8658e5440fbf", failure 1
E0624 03:36:33.971938       1 controller.go:981] error syncing claim "2b02a987-0903-4728-9297-8658e5440fbf": failed to provision volume with StorageClass "efs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0624 03:36:33.971971       1 controller.go:1332] provision "default/efs-claim" class "efs-sc": started
I0624 03:36:33.972111       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"2b02a987-0903-4728-9297-8658e5440fbf", APIVersion:"v1", ResourceVersion:"193316", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/efs-claim"


> kubectl get sc -n kube-system
NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
efs-sc              efs.csi.aws.com         Delete          Immediate              false                  6h42m
gp2 (default)       kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   13h

> kubectl describe sc efs-sc -n kube-system
Name:            efs-sc
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"efs-sc"},"parameters":{"basePath":"/dynamic_provisioning","directoryPerms":"700","fileSystemId":"fs-92107410","gidRangeEnd":"2000","gidRangeStart":"1000","provisioningMode":"efs-ap"},"provisioner":"efs.csi.aws.com"}

Provisioner:           efs.csi.aws.com
Parameters:            basePath=/dynamic_provisioning,directoryPerms=700,fileSystemId=fs-92107410,gidRangeEnd=2000,gidRangeStart=1000,provisioningMode=efs-ap
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

RK-GITHUB avatar Jun 24 '21 15:06 RK-GITHUB

@RK-GITHUB Can you make sure your storage class has your own file system ID? The file system ID in your storage class is the same as the example file system ID that we provided here.

The CSI driver is trying to perform DescribeFileSystem on fs-92107410, which wouldn't exist in your account.
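One quick way to check this (hypothetical troubleshooting step, not from the thread; requires AWS CLI credentials for the same account/region the cluster runs in):

```shell
# Confirm the ID from the StorageClass actually exists in this account/region.
# fs-92107410 is the docs example ID quoted above; substitute your own.
aws efs describe-file-systems --file-system-id fs-92107410
# A wrong or foreign ID fails with a FileSystemNotFound error, which matches
# the "Failed to fetch File System info" errors in the controller log.
```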

kbasv avatar Jun 25 '21 17:06 kbasv

@sergebasso @djakielskiadesso Do you have the right IAM permissions attached to your controller's service account to create an access point? An example IAM policy which grants the required create permissions is here.
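For reference, a sketch of the kind of policy the linked example grants (check the example in the repo for the authoritative version; the tag-based conditions here reflect how the driver tags the access points it creates):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:CreateAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringLike": { "aws:RequestTag/efs.csi.aws.com/cluster": "true" }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:ResourceTag/efs.csi.aws.com/cluster": "true" }
      }
    }
  ]
}
```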

kbasv avatar Jun 28 '21 16:06 kbasv

@kbasv: Yes, I made sure the file system ID in my AWS account matches the file system ID in my storage class.

Yes, I am using the same policy JSON file.

RK-GITHUB avatar Jun 28 '21 21:06 RK-GITHUB

@kbasv

I confirm that I'm also using the same policy as the example.

sergebasso avatar Jun 29 '21 21:06 sergebasso

I actually had the same issue, but it turns out I did not specify basePath correctly.

I had this:

parameters:
  basePath: /

However, this doc states that:

Amazon EFS creates a root directory only if you have provided the CreationInfo: OwnUid, OwnGID, and permissions for the directory. If you do not provide this information, Amazon EFS does not create the root directory. If the root directory does not exist, attempts to mount using the access point will fail.

Based on that, I checked the dynamic provisioning example again and noticed that basePath is not set to /.

I modified my storage class's basePath accordingly and it started to work fine.
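For reference, this is the shape of a StorageClass with a non-root basePath (the path name and placeholder file system ID here are illustrative, mirroring the dynamic provisioning example):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-XXXXXXXX
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"  # non-root path; an explicit "/" triggered the failure
```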

MatteoMori avatar Jul 01 '21 11:07 MatteoMori

@MatteoMori

Thanks for pointing this out. There is a bug in the code which fails access point validation. By default, if you do not provide the basePath parameter, the controller assumes basePath to be the root, i.e. /. If you additionally provide the basePath parameter to the storage class explicitly as /, it gets appended to the default basePath, and basePath becomes something like //, which isn't valid. I'll add a fix to validate // in the controller code.

@sergebasso @djakielskiadesso As a workaround, if you plan to use the file system root as your base path, avoid passing the basePath parameter.
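A minimal sketch of the failure mode described above (hypothetical illustration, not the driver's actual Go code): naive string concatenation of an explicit "/" basePath with the per-volume directory yields a "//" prefix, which EFS rejects, while a proper path join does not.

```python
import posixpath


def naive_join(base_path: str, subdir: str) -> str:
    # Reproduces the reported bug: a basePath of "/" yields "//<subdir>",
    # which is not a valid access point root directory.
    return base_path + "/" + subdir


def safe_join(base_path: str, subdir: str) -> str:
    # posixpath.join does not duplicate the separator.
    return posixpath.join(base_path, subdir)


print(naive_join("/", "pvc-1234"))    # //pvc-1234  (invalid)
print(safe_join("/", "pvc-1234"))     # /pvc-1234
print(safe_join("/dyn", "pvc-1234"))  # /dyn/pvc-1234
```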

kbasv avatar Jul 01 '21 14:07 kbasv

@RK-GITHUB From the controller logs you have posted here, it appears you are either missing the IAM permission to call the DescribeFileSystem EFS API, or the file system ID in your storage class is incorrect and does not exist.

Another possibility is that you are attempting a cross-account mount. Are your file system and your k8s cluster in different AWS accounts?

kbasv avatar Jul 01 '21 14:07 kbasv

@kbasv: The attached IAM policy has the DescribeFileSystem permission, and the file system does exist.

Both the file system and the K8s cluster are in the same AWS account.

I tested the basePath changes suggested above, and they did not work.

RK-GITHUB avatar Jul 02 '21 04:07 RK-GITHUB

@RK-GITHUB Can you describe your storage class and post efs-plugin and csi-provisioner logs from the controller?

kbasv avatar Jul 02 '21 13:07 kbasv

@kbasv.

As a workaround, if you plan to use the file system root as your base path, avoid passing the basePath parameter.

It worked. This is my StorageClass now, without basePath being set:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs
provisioner: efs.csi.aws.com
mountOptions:
- tls
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-XXXXXXXX
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"

sergebasso avatar Jul 02 '21 15:07 sergebasso

@RK-GITHUB Is your cluster private? If so, a VPC endpoint for "com.amazonaws.ap-northeast-1.elasticfilesystem" (adjust the region to yours) is required.
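For reference, such an interface endpoint can be created with the AWS CLI; the VPC, subnet, and security group IDs below are placeholders to substitute with your own:

```shell
# Create an EFS interface endpoint in the cluster's VPC so the controller
# can reach the EFS API without internet access (region name is an example).
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.ap-northeast-1.elasticfilesystem \
  --subnet-ids subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx
```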

hansh1029 avatar Jul 12 '21 09:07 hansh1029

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 19 '21 11:10 k8s-triage-robot

I got a similar issue. I'm using ARM nodes and eksctl v0.73.0. Here are my logs:

storage.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
    name: artfia
provisioner: efs.csi.aws.com
mountOptions:
    - tls
volumeBindingMode: WaitForFirstConsumer
parameters:
    provisioningMode: efs-ap
    fileSystemId: fs-XXXXXXXX
    directoryPerms: "755"
    gidRangeStart: "1000"
    gidRangeEnd: "2000"

kubectl describe pod php-fpm-864ddc9455-9zjg7

Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    5m56s                 default-scheduler  Successfully assigned default/php-fpm-864ddc9455-9zjg7 to ip-192-168-185-127.ap-east-1.compute.internal
  Warning  FailedMount  3m53s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[persistent-storage kube-api-access-jq55n]: timed out waiting for the condition
  Warning  FailedMount  115s (x2 over 3m56s)  kubelet            MountVolume.SetUp failed for volume "pvc-b8174cbf-7afa-478a-80ad-e5081abda55f" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount  97s                   kubelet            Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[kube-api-access-jq55n persistent-storage]: timed out waiting for the condition

kubectl logs efs-csi-controller-6d9d875b-6qrs6 -n kube-system -c csi-provisioner

W1112 13:12:20.745980       1 feature_gate.go:235] Setting GA feature gate Topology=true. It will be removed in a future release.
I1112 13:12:20.746053       1 feature_gate.go:243] feature gates: &{map[Topology:true]}
I1112 13:12:20.746079       1 csi-provisioner.go:132] Version: v2.1.1-0-g353098c90
I1112 13:12:20.746104       1 csi-provisioner.go:155] Building kube configs for running in cluster...
I1112 13:12:20.765265       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I1112 13:12:20.765848       1 common.go:111] Probing CSI driver for readiness
I1112 13:12:20.768674       1 csi-provisioner.go:202] Detected CSI driver efs.csi.aws.com
I1112 13:12:20.770897       1 csi-provisioner.go:244] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I1112 13:12:20.771795       1 controller.go:756] Using saving PVs to API server in background
I1112 13:12:20.772813       1 leaderelection.go:243] attempting to acquire leader lease kube-system/efs-csi-aws-com...
I1112 13:12:20.785205       1 leaderelection.go:253] successfully acquired lease kube-system/efs-csi-aws-com
I1112 13:12:20.785354       1 leader_election.go:205] became leader, starting
I1112 13:12:20.785971       1 reflector.go:219] Starting reflector *v1.StorageClass (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I1112 13:12:20.786309       1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (15m0s) from k8s.io/client-go/informers/factory.go:134
I1112 13:12:20.885481       1 controller.go:835] Starting provisioner controller efs.csi.aws.com_ip-192-168-185-127.ap-east-1.compute.internal_76a0e0ce-f59e-4b0d-9e13-41dadc42f128!
I1112 13:12:20.885534       1 volume_store.go:97] Starting save volume queue
I1112 13:12:20.885699       1 reflector.go:219] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:869
I1112 13:12:20.886053       1 reflector.go:219] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:872
I1112 13:12:20.985950       1 controller.go:884] Started provisioner controller efs.csi.aws.com_ip-192-168-185-127.ap-east-1.compute.internal_76a0e0ce-f59e-4b0d-9e13-41dadc42f128!
I1112 13:56:52.436051       1 controller.go:1332] provision "default/efs-claim" class "artfia": started
I1112 13:56:52.441322       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"b8174cbf-7afa-478a-80ad-e5081abda55f", APIVersion:"v1", ResourceVersion:"63285", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/efs-claim"
I1112 13:56:52.538556       1 controller.go:838] successfully created PV pvc-b8174cbf-7afa-478a-80ad-e5081abda55f for PVC efs-claim and csi volume name fs-3786d6fa::fsap-0a295d35933994d59
I1112 13:56:52.538592       1 controller.go:1439] provision "default/efs-claim" class "artfia": volume "pvc-b8174cbf-7afa-478a-80ad-e5081abda55f" provisioned
I1112 13:56:52.538610       1 controller.go:1456] provision "default/efs-claim" class "artfia": succeeded
I1112 13:56:52.558089       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"efs-claim", UID:"b8174cbf-7afa-478a-80ad-e5081abda55f", APIVersion:"v1", ResourceVersion:"63285", FieldPa

kubectl logs efs-csi-controller-6d9d875b-6qrs6 -n kube-system -c efs-plugin

I1112 13:12:20.413106       1 config_dir.go:62] Mounted directories do not exist, creating directory at '/etc/amazon/efs'
I1112 13:12:20.421635       1 mount_linux.go:173] Cannot run systemd-run, assuming non-systemd OS
I1112 13:12:20.421651       1 driver.go:140] Did not find any input tags.
I1112 13:12:20.421778       1 driver.go:113] Registering Node Server
I1112 13:12:20.421787       1 driver.go:115] Registering Controller Server
I1112 13:12:20.421797       1 driver.go:118] Starting efs-utils watchdog
I1112 13:12:20.421857       1 efs_watch_dog.go:209] Copying /etc/amazon/efs/efs-utils.conf since it doesn't exist
I1112 13:12:20.421932       1 efs_watch_dog.go:209] Copying /etc/amazon/efs/efs-utils.crt since it doesn't exist
I1112 13:12:20.424120       1 driver.go:124] Starting reaper
I1112 13:12:20.424130       1 driver.go:127] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}

xicond avatar Nov 12 '21 14:11 xicond

The error "Failed to fetch File System info: Describe File System failed" when trying to provision dynamically on private EKS is fixed by my PR #585, which I submitted a week ago. It would be great if someone could review/merge.

holmesb avatar Nov 18 '21 14:11 holmesb

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 18 '21 14:12 k8s-triage-robot

Hey @kbasv

I tested it with the latest version, 1.5.0. The bug with basePath still exists! When I remove basePath from the StorageClass, everything works as expected.

/remove-lifecycle rotten

djakielski avatar Dec 26 '21 00:12 djakielski

@MatteoMori

Thanks for pointing this out. There is a bug in the code that fails access point validation. By default, if you do not provide the basePath parameter, the controller assumes basePath to be the root, i.e. /. However, if you explicitly set the basePath parameter in the storage class to /, it gets appended to the default basePath, and basePath becomes something like //, which isn't valid. I'll add a fix to validate // in the controller code.

@sergebasso @djakielskiadesso As a workaround, if you plan to use the file system root as your base path, avoid passing the basePath parameter.

Goodness me. This cost me almost a week of trying to figure out why the PV[C] wasn't being created. Thank you for this tip. I moved past this and have now hit the following issue from the described pod:

Warning  FailedMount  31s (x20 over 25m)  kubelet  MountVolume.SetUp failed for volume "pvc-69cb5470-2943-4696-8aaf-c74e0b93dcfa" : rpc error: code = Internal desc = Could not mount "fs-05575cefa20ad5077:/" at "/var/lib/kubelet/pods/8f27d8f0-0284-4fb8-b0ea-8b06f69ef61e/volumes/kubernetes.io~csi/pvc-69cb5470-2943-4696-8aaf-c74e0b93dcfa/mount": mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t efs -o accesspoint=fsap-0a69fa23949a4c283,tls fs-05575cefa20ad5077:/ /var/lib/kubelet/pods/8f27d8f0-0284-4fb8-b0ea-8b06f69ef61e/volumes/kubernetes.io~csi/pvc-69cb5470-2943-4696-8aaf-c74e0b93dcfa/mount
Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
b'mount.nfs4: access denied by server while mounting 127.0.0.1:/'

DarkStar1 avatar Feb 18 '22 17:02 DarkStar1

I installed the EFS CSI driver to mount EFS on EKS, I followed Amazon EFS CSI driver.

I've faced the below error while deploying PersistentVolumeClaim.

Error from server (Forbidden): error when creating "claim.yml": persistentvolumeclaims "efs-claim" is forbidden: may only update PVC status

StorageClass.yaml -->

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-sc
    provisioner: efs.csi.aws.com
    mountOptions:
      - tls

pv.yaml -->

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: efs-pv
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: efs-sc
      csi:
        driver: efs.csi.aws.com
        volumeHandle: fs-xxxxxxxxxxx 

pvclaim.yaml -->

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: efs-claim
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: efs-sc
      resources:
        requests:
          storage: 5Gi
      selector:
        matchLabels:
          name: production-environment
          role: prod 
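
Note that a PVC with a selector binds only to a PersistentVolume carrying matching labels, and the pv.yaml above declares none. A sketch of the metadata that would satisfy this selector (the labels are taken from the claim above; whether this relates to the reported error is uncertain):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
  labels:                        # labels the PVC selector matches on
    name: production-environment
    role: prod
```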

Kindly help me to resolve this

jawad846 avatar Mar 22 '22 14:03 jawad846

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 21 '22 04:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 21 '22 04:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 20 '22 05:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 20 '22 05:08 k8s-ci-robot

Please reopen this issue. This error is persisting with dynamic provisioning with or without basePath.

The command below needs to change from

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-" > public-ecr-driver.yaml

to

kubectl kustomize \
    "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master" > public-ecr-driver.yaml

However, static provisioning is working with EFS.

PS: Static provisioning works only when ref is changed to master.

ackris avatar Sep 18 '22 04:09 ackris