aws-efs-csi-driver EFS Driver default permissions are WRONG

/kind bug

What happened? After following the official AWS documentation for installing the EFS CSI driver, the default permissions it recommends do not work.

What you expected to happen? The default permissions should work out of the box.

How to reproduce it (as minimally and precisely as possible)? Follow the steps in the AWS documentation. Instead of disabling creation of the service account in the Helm deployment, create it with the following attributes (in Terraform):

resource "helm_release" "aws_efs_csi_driver" {
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  [...]
  set {
    name  = "controller.serviceAccount.create"
    value = "true"
  }
  set {
    name  = "controller.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = "${module.efs_driver_role[0].iam_role_arn}"
    type  = "string"
  }
  set {
    name  = "controller.serviceAccount.name"
    value = "efs-csi-controller-sa"
  }
  [...]
  set {
    name  = "storageClasses[0].parameters.provisioningMode"
    value = "efs-ap"
  }
  set {
    name  = "storageClasses[0].parameters.basePath"
    value = "/dynamic_provisioning"
  }
}

Anything else we need to know?:

Environment

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.2-eks-c12679a", GitCommit:"002c6bc4e142b1f48b9405140e559194a094dcff", GitTreeState:"clean", BuildDate:"2023-05-22T20:23:13Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}

Driver version: v1.5.7

Please also attach debug logs to help us better diagnose

The logs in my efs-controller pods were:

Normal   Provisioning          28s (x7 over 92s)  efs.csi.aws.com_xxx.compute.internal...  External provisioner is provisioning volume for claim "namespace/pvcname"
  Warning  ProvisioningFailed    28s (x7 over 92s)  efs.csi.aws.comxxx...  failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied
  Normal   ExternalProvisioning  1s (x9 over 92s)   persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

With the role permissions taken from here, access is denied. As a workaround, you can allow all elasticfilesystem:* permissions, but I want the default values to work. We should change the default example.json policy.

Jul 06 '23 11:07 blueprismo

Sorry for the delayed response. I usually see that error, failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied, when the trust policy of the IAM Role doesn't allow the Service Account to assume it.

However, given that this problem went away when you modified your policy to allow all elasticfilesystem:* permissions, that might not be it.

Can you post your trust policy of the IAM Role, the definition of the IAM Policy that is tied to the Role, and the Kubernetes Manifest of the CSI Controller Service Account?

Additionally, we now have a new managed policy, AmazonEFSCSIDriverPolicy, that you can apply to your IAM Role, and a managed add-on is now available. Our EFS CSI Driver page has updated documentation.

Jul 28 '23 13:07 RyanStan

Thanks a lot for your answer, @RyanStan, and don't worry about the delay :) Sure thing, here's the trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::yyyy:oidc-provider/oidc.eks.region.amazonaws.com/id/xxxx"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.region.amazonaws.com/id/xxxx:aud": "sts.amazonaws.com",
                    "oidc.eks.region.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
                }
            }
        }
    ]
}
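As a quick sanity check of the trust policy above, here is a minimal Python sketch that tests whether the policy lets a given service account assume the role. The function name `allows_service_account` is hypothetical, the policy is copied from this comment with its redacted placeholders (`yyyy`, `xxxx`, `region`) left as-is, and the check only looks at `StringEquals` conditions on a key ending in `:sub`. It is a simplification and not how IAM evaluates policies in full.

```python
import json

# Trust policy as posted in this thread, redacted placeholders kept verbatim.
TRUST_POLICY = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::yyyy:oidc-provider/oidc.eks.region.amazonaws.com/id/xxxx"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.region.amazonaws.com/id/xxxx:aud": "sts.amazonaws.com",
                    "oidc.eks.region.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
                }
            }
        }
    ]
}
""")


def allows_service_account(policy, namespace, name):
    """Hypothetical helper: True if an Allow statement for
    sts:AssumeRoleWithWebIdentity pins the ':sub' claim to this service account.
    Only StringEquals conditions are inspected (an assumption of this sketch)."""
    expected = f"system:serviceaccount:{namespace}:{name}"
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        conditions = stmt.get("Condition", {}).get("StringEquals", {})
        for key, value in conditions.items():
            values = [value] if isinstance(value, str) else value
            if key.endswith(":sub") and expected in values:
                return True
    return False


if __name__ == "__main__":
    print(allows_service_account(TRUST_POLICY, "kube-system", "efs-csi-controller-sa"))
```

Under these assumptions the subject in the trust policy matches the namespace and name of the service account used by the Helm release, which is consistent with the trust policy not being the cause here.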

And here are the default permissions the driver documentation recommends (if you follow the official doc, it leads you to this README, and since I use IRSA it corresponds to this exact policy):

This one is the default policy (not working out of the box, let's call it policy1)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeMountTargets",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateAccessPoint"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:TagResource"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    }
  ]
}

I've extended the policy to add the following actions and now it's working (let's call this policy2):

{
    "Statement": [
        {
            "Action": [
                "elasticfilesystem:DescribeMountTargets",
                "elasticfilesystem:DescribeAccessPoints",
                "elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite",
                "elasticfilesystem:CreateTags",
                "elasticfilesystem:CreateMountTarget",
                "elasticfilesystem:DeleteMountTarget",
                "elasticfilesystem:DeleteTags",
                "elasticfilesystem:TagResource",
                "elasticfilesystem:UntagResource"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
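To make the difference between the two policies easier to see, here is a small Python sketch that compares their action lists. The action names are copied from policy1 and policy2 above; the dictionary layout and variable names are just for illustration, and the boolean only records whether the statement carrying the action has a `Condition` block. It does not evaluate IAM semantics.

```python
# action -> True if the statement that grants it carries a Condition block
POLICY1 = {
    "elasticfilesystem:DescribeAccessPoints": False,
    "elasticfilesystem:DescribeFileSystems": False,
    "elasticfilesystem:DescribeMountTargets": False,
    "ec2:DescribeAvailabilityZones": False,
    "elasticfilesystem:CreateAccessPoint": True,   # aws:RequestTag condition
    "elasticfilesystem:TagResource": True,         # aws:ResourceTag condition
    "elasticfilesystem:DeleteAccessPoint": True,   # aws:ResourceTag condition
}

# policy2 has a single statement with no conditions at all
POLICY2 = {
    action: False
    for action in [
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:CreateTags",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DeleteTags",
        "elasticfilesystem:TagResource",
        "elasticfilesystem:UntagResource",
    ]
}

# Actions that exist in only one of the two policies
only_in_policy2 = sorted(set(POLICY2) - set(POLICY1))
only_in_policy1 = sorted(set(POLICY1) - set(POLICY2))

# Actions granted by both, but conditioned in policy1 and unconditional in policy2
condition_dropped = sorted(
    action
    for action in set(POLICY1) & set(POLICY2)
    if POLICY1[action] and not POLICY2[action]
)

if __name__ == "__main__":
    print("only in policy2:", only_in_policy2)
    print("only in policy1:", only_in_policy1)
    print("condition dropped in policy2:", condition_dropped)
```

Besides the extra actions, the comparison shows that elasticfilesystem:TagResource is restricted by a resource-tag condition in policy1 but unconditional in policy2. Whether that difference explains the Access Denied error is not established in this thread.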

And the Kubernetes SA that consumes that is:

Name:                efs-csi-controller-sa
Namespace:           kube-system
Labels:              app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-efs-csi-driver
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::accountID:role/EFSDriverRole
                     meta.helm.sh/release-name: aws-efs-csi-driver
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

I've already seen the AmazonEFSCSIDriverPolicy managed policy and tried it as well, but no luck (it has the same permissions as policy1).

I'm happy to provide any further information or lend a hand, with your guidance.

Jul 31 '23 09:07 blueprismo

This is definitely odd; I'm unable to recreate it on my end. Can you follow our troubleshooting guide to:

Enable debug logging for the csi driver
Enable debug logging for efs-utils
Recreate the failure
Run the log collector script on the Controller Pod with the error attached in your initial message.

Thanks for helping us debug this.

Aug 04 '23 16:08 RyanStan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jan 25 '24 20:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Feb 24 '24 21:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Mar 25 '24 22:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Mar 25 '24 22:03 k8s-ci-robot
