aws-efs-csi-driver

EKS Addon install missing AWS_DEFAULT_REGION

Open philnichol opened this issue 4 months ago • 0 comments

/kind bug

Thanks in advance for looking into this, and thanks for maintaining this great project :)

What happened? I installed the EKS Addon (tested via both Terraform and the AWS console) with deleteAccessPointRootDir = true and IRSA configured, on a cluster where pod access to IMDS is restricted. When I delete a PVC, I see the following errors in the controller logs, and the PVC never gets deleted:

E1015 08:53:11.829540       1 mount_linux.go:231] Mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o tls,iam fs-XXXXXXXXXXXXXXXXXXX /var/lib/csi/pv/fsap-XXXXXXXXXXXXX
Output: Error retrieving region. Please set the "region" parameter in the efs-utils configuration file.

What you expected to happen? I expect the EKS Addon to work out of the box.

How to reproduce it (as minimally and precisely as possible)? This assumes you've restricted access to IMDS from your pods (by setting a hop limit). Docs here.

  • Install the efs-csi-driver EKS Addon on a cluster with deleteAccessPointRootDir = true and an IRSA service account

  • Tail the logs in a separate terminal: kubectl logs deployment/efs-csi-controller -f -n kube-system

  • Create a StorageClass, PVC and pod (dynamic provisioning):

# test.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test
parameters:
  basePath: /test
  directoryPerms: "775"
  ensureUniqueDirectory: "false"
  fileSystemId: fs-XXXXXXX
  gid: "65534"
  provisioningMode: efs-ap
  subPathPattern: /
  uid: "65534"
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
- tls
- iam
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: test
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: test
    spec:
      containers:
      - image: registry.k8s.io/pause:3.9
        name: test
        resources:
          requests:
            cpu: 20m
            memory: 2Mi
        volumeMounts:
        - mountPath: /test
          name: test
      volumes:
      - name: test
        persistentVolumeClaim:
          claimName: test

  • kubectl apply -f test.yaml
  • kubectl delete -f test.yaml
  • see the logs for efs-csi-controller

Anything else we need to know?: This happens because, when the driver is installed via the EKS Addon, the efs-plugin container already has the AWS_REGION environment variable set:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-csi-controller
  namespace: kube-system
  resourceVersion: "8596255"
  uid: 09438d06-c1b8-4765-89f6-e696c648d19f
spec:
  template:
    spec:
      containers:
      - name: efs-plugin
        env:
        - name: CSI_ENDPOINT
          value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
        - name: AWS_REGION
          value: ap-southeast-2
        - name: CSI_NODE_NAME

With how IRSA works, if a container already has an AWS_REGION variable, AWS_DEFAULT_REGION is not injected, and that is the variable the container needs to determine its region without calling out to IMDS. At a glance, this doesn't look like it would affect people installing via Helm or kustomize. This should be simple to fix, with either of the following:

  • Remove that environment variable from the container. IRSA adds it back anyway, although I guess this could break things for people not using IRSA?
  • Add the AWS_DEFAULT_REGION variable explicitly as well.
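
For illustration, the second option might look roughly like this in the efs-plugin container spec. This is only a sketch based on the env list shown above: the region value is the one from my cluster, and it assumes the addon can template the same value into both variables.

```yaml
# Sketch only: env section of the efs-plugin container with both region variables set.
# The region value is copied from the Deployment above; the addon would have to
# template it per cluster (assumption).
containers:
- name: efs-plugin
  env:
  - name: CSI_ENDPOINT
    value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
  - name: AWS_REGION
    value: ap-southeast-2
  # Added explicitly so the region can be resolved without calling IMDS.
  - name: AWS_DEFAULT_REGION
    value: ap-southeast-2
```

Setting both variables keeps the existing behaviour for people not using IRSA, which is the concern with the first option.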

Could possibly relate to:

  • https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/1111#issuecomment-1999592464

Environment

  • Kubernetes version (use kubectl version):
kubectl version
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4-eks-a737599 
  • Driver version: v2.0.7-eksbuild.1

Please also attach debug logs to help us better diagnose

  • Instructions to gather debug logs can be found here

philnichol, Oct 15 '24 09:10