aws-efs-csi-driver

EFS Driver default permissions are WRONG

Open blueprismo opened this issue 2 years ago • 5 comments

/kind bug

What happened? The default IAM permissions given in the official AWS documentation for installing the EFS CSI driver do not work: dynamic provisioning fails with Access Denied.

What you expected to happen? The default permissions should work out of the box.

How to reproduce it (as minimally and precisely as possible)? Follow the steps in the AWS documentation. Instead of disabling creation of the service account in the Helm deployment, create it with the following attributes (in Terraform):

resource "helm_release" "aws_efs_csi_driver" {
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
  [...]
  set {
    name  = "controller.serviceAccount.create"
    value = "true"
  }
  set {
    name  = "controller.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = "${module.efs_driver_role[0].iam_role_arn}"
    type  = "string"
  }
  set {
    name  = "controller.serviceAccount.name"
    value = "efs-csi-controller-sa"
  }
  [...]
  set {
    name  = "storageClasses[0].parameters.provisioningMode"
    value = "efs-ap"
  }
  set {
    name  = "storageClasses[0].parameters.basePath"
    value = "/dynamic_provisioning"
  }
}

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.2-eks-c12679a", GitCommit:"002c6bc4e142b1f48b9405140e559194a094dcff", GitTreeState:"clean", BuildDate:"2023-05-22T20:23:13Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version: v1.5.7

Please also attach debug logs to help us better diagnose

The logs in my efs-controller pods were:

Normal   Provisioning          28s (x7 over 92s)  efs.csi.aws.com_xxx.compute.internal...  External provisioner is provisioning volume for claim "namespace/pvcname"
Warning  ProvisioningFailed    28s (x7 over 92s)  efs.csi.aws.comxxx...  failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied
Normal   ExternalProvisioning  1s (x9 over 92s)   persistentvolume-controller  waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

When the role uses the permissions from here, access is denied. As a workaround you can allow all elasticfilesystem:* permissions, but I want the default values to work. We should change the default example.json policy.

blueprismo avatar Jul 06 '23 11:07 blueprismo

Sorry for the delayed response. I usually see that error, failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied, when the trust policy of the IAM Role doesn't allow the Service Account to assume it.

However, given that this problem went away when you modified your policy to allow all elasticfilesystem:* permissions, that might not be it.

Can you post your trust policy of the IAM Role, the definition of the IAM Policy that is tied to the Role, and the Kubernetes Manifest of the CSI Controller Service Account?

Additionally, there is now a managed policy, AmazonEFSCSIDriverPolicy, that you can attach to your IAM Role, and a managed add-on is also available. Our EFS CSI Driver page has updated documentation.

RyanStan avatar Jul 28 '23 13:07 RyanStan

Thanks a lot for your answer, @RyanStan, and no worries about the delay :) Sure thing, here's the trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::yyyy:oidc-provider/oidc.eks.region.amazonaws.com/id/xxxx"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.region.amazonaws.com/id/xxxx:aud": "sts.amazonaws.com",
                    "oidc.eks.region.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
                }
            }
        }
    ]
}
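For reference, the check suggested earlier in the thread (whether the trust policy lets the service account assume the role) can be reproduced offline. The snippet below is only a minimal sketch, not part of the driver or of any AWS tooling: `trust_policy_allows` is a hypothetical helper, it only understands `StringEquals` conditions, and it uses the redacted placeholder values exactly as posted above.

```python
import json

# Trust policy copied from the comment above (placeholders kept as posted).
TRUST_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::yyyy:oidc-provider/oidc.eks.region.amazonaws.com/id/xxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region.amazonaws.com/id/xxxx:aud": "sts.amazonaws.com",
          "oidc.eks.region.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
        }
      }
    }
  ]
}
""")


def trust_policy_allows(policy: dict, namespace: str, service_account: str) -> bool:
    """Hypothetical helper: True if an Allow statement for
    sts:AssumeRoleWithWebIdentity pins the OIDC `sub` claim to the
    given namespace and service account via StringEquals."""
    expected_sub = f"system:serviceaccount:{namespace}:{service_account}"
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        equals = stmt.get("Condition", {}).get("StringEquals", {})
        for key, value in equals.items():
            values = [value] if isinstance(value, str) else value
            if key.endswith(":sub") and expected_sub in values:
                return True
    return False
```

Called as `trust_policy_allows(TRUST_POLICY, "kube-system", "efs-csi-controller-sa")`, it returns True for the policy as posted, so the `sub` condition at least names the expected namespace and service account. It says nothing about whether the redacted OIDC provider ID matches the cluster.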

And here are the default permissions (if you follow the official doc, it leads you to this README, and since I use IRSA it corresponds to this exact policy):

This one is the default policy (not working out of the box; let's call it policy1):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeMountTargets",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateAccessPoint"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:TagResource"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    }
  ]
}
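One detail of policy1 worth keeping in mind: `elasticfilesystem:CreateAccessPoint` is only allowed when the request itself carries the tag `efs.csi.aws.com/cluster` with value `true`. Whether that is related to the failure reported here is not established in this thread. The sketch below only illustrates how such a request-tag condition behaves; `request_tags_satisfy` is a hypothetical helper, and `fnmatchcase` is an assumed stand-in for IAM's `StringLike` wildcard matching, not the real evaluator.

```python
from fnmatch import fnmatchcase

TAG_KEY = "efs.csi.aws.com/cluster"

# Condition block of the CreateAccessPoint statement in policy1.
CREATE_AP_CONDITION = {"StringLike": {f"aws:RequestTag/{TAG_KEY}": "true"}}


def request_tags_satisfy(condition: dict, request_tags: dict) -> bool:
    """Simplified evaluation of aws:RequestTag/<key> conditions.

    Assumptions: only StringLike and StringEquals are handled, and
    fnmatchcase approximates IAM's case-sensitive wildcard matching.
    """
    prefix = "aws:RequestTag/"
    for operator, clauses in condition.items():
        if operator not in ("StringLike", "StringEquals"):
            raise ValueError(f"unsupported operator: {operator}")
        for cond_key, pattern in clauses.items():
            if not cond_key.startswith(prefix):
                raise ValueError(f"unsupported condition key: {cond_key}")
            actual = request_tags.get(cond_key[len(prefix):])
            if actual is None:
                # A tag that is absent from the request never matches.
                return False
            if operator == "StringEquals":
                matched = actual == pattern
            else:
                matched = fnmatchcase(actual, pattern)
            if not matched:
                return False
    return True
```

Under these assumptions, a request tagged `{"efs.csi.aws.com/cluster": "true"}` satisfies the condition, while an untagged request or one tagged `"false"` does not.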

I've extended the policy to add the following actions and now it's working (let's call this policy2):

{
    "Statement": [
        {
            "Action": [
                "elasticfilesystem:DescribeMountTargets",
                "elasticfilesystem:DescribeAccessPoints",
                "elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite",
                "elasticfilesystem:CreateTags",
                "elasticfilesystem:CreateMountTarget",
                "elasticfilesystem:DeleteMountTarget",
                "elasticfilesystem:DeleteTags",
                "elasticfilesystem:TagResource",
                "elasticfilesystem:UntagResource"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
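To make the difference between the two policies easier to see, the action lists can be compared as plain sets. This is just a sketch over the action names as posted above; it ignores the `Condition` blocks of policy1 and does not query AWS.

```python
# Actions listed in policy1 (conditions ignored).
POLICY1_ACTIONS = {
    "elasticfilesystem:DescribeAccessPoints",
    "elasticfilesystem:DescribeFileSystems",
    "elasticfilesystem:DescribeMountTargets",
    "ec2:DescribeAvailabilityZones",
    "elasticfilesystem:CreateAccessPoint",
    "elasticfilesystem:TagResource",
    "elasticfilesystem:DeleteAccessPoint",
}

# Actions listed in policy2.
POLICY2_ACTIONS = {
    "elasticfilesystem:DescribeMountTargets",
    "elasticfilesystem:DescribeAccessPoints",
    "elasticfilesystem:DescribeFileSystems",
    "elasticfilesystem:ClientMount",
    "elasticfilesystem:ClientWrite",
    "elasticfilesystem:CreateTags",
    "elasticfilesystem:CreateMountTarget",
    "elasticfilesystem:DeleteMountTarget",
    "elasticfilesystem:DeleteTags",
    "elasticfilesystem:TagResource",
    "elasticfilesystem:UntagResource",
}

only_in_policy1 = sorted(POLICY1_ACTIONS - POLICY2_ACTIONS)
only_in_policy2 = sorted(POLICY2_ACTIONS - POLICY1_ACTIONS)

print("only in policy1:", only_in_policy1)
print("only in policy2:", only_in_policy2)
```

As posted, policy2 on its own does not list `elasticfilesystem:CreateAccessPoint`, `elasticfilesystem:DeleteAccessPoint` or `ec2:DescribeAvailabilityZones`. If policy2 was attached in addition to policy1 rather than replacing it, the effective set would be the union of both.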

And the Kubernetes service account that uses the role is:

Name:                efs-csi-controller-sa
Namespace:           kube-system
Labels:              app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-efs-csi-driver
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::accountID:role/EFSDriverRole
                     meta.helm.sh/release-name: aws-efs-csi-driver
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

I've already seen the AmazonEFSCSIDriverPolicy managed policy and also tried to use it, but no luck (it has the same permissions as policy1).

I can provide any more information you need, or lend a hand with your guidance.

blueprismo avatar Jul 31 '23 09:07 blueprismo

This is definitely odd; I'm unable to reproduce it on my end. Can you follow our troubleshooting guide to:

  1. Enable debug logging for the csi driver
  2. Enable debug logging for efs-utils
  3. Recreate the failure
  4. Run the log collector script on the Controller Pod with the error attached in your initial message.

Thanks for helping us debug this.

RyanStan avatar Aug 04 '23 16:08 RyanStan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 25 '24 20:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 24 '24 21:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 25 '24 22:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 25 '24 22:03 k8s-ci-robot