aws-efs-csi-driver

Can't create a PVC using the "Dynamic Provisioning" example on EKS using IRSA in a private VPC.

Open jbehrends opened this issue 3 years ago • 2 comments

/kind bug

What happened? Followed the dynamic_provisioning example in this repo. The StorageClass creates fine, then after applying the PVC, the controller throws the following error:

I0610 22:10:36.013268       1 controller.go:60] CreateVolume: called with args {Name:pvc-12345-6789-abcd-aaaa-748932074 CapacityRange:required_bytes:5368709120  VolumeCapabilities:[mount:<> access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[csi.storage.k8s.io/pv/name:pvc-12345-6789-abcd-aaaa-748932074 csi.storage.k8s.io/pvc/name:efs-claim csi.storage.k8s.io/pvc/namespace:helloworld directoryPerms:700 fileSystemId:fs-12345abcdef gidRangeEnd:2000 gidRangeStart:1000 provisioningMode:efs-ap] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0610 22:10:36.013398       1 cloud.go:218] Calling DescribeFileSystems with input: {
  FileSystemId: "fs-12345abcdef"
}
I0610 22:10:36.164589       1 gid_allocator.go:52] Recieved getNextGid for fsId: fs-12345abcdef, min: 1000, max: 2000
I0610 22:10:36.164625       1 cloud.go:157] Calling Create AP with input: {
  ClientToken: "pvc-12345-6789-abcd-aaaa-748932074",
  FileSystemId: "fs-12345abcdef",
  PosixUser: {
    Gid: 1001,
    Uid: 1001
  },
  RootDirectory: {
    CreationInfo: {
      OwnerGid: 1001,
      OwnerUid: 1001,
      Permissions: "700"
    },
    Path: "/pvc-12345-6789-abcd-aaaa-748932074"
  },
  Tags: [{
      Key: "efs.csi.aws.com/cluster",
      Value: "true"
    }]
}
E0610 22:10:36.196344       1 driver.go:103] GRPC error: rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied

As the second line of the log snippet above shows, the controller calls "elasticfilesystem:DescribeFileSystems" against the file system ID and gets a response back. It then calls "elasticfilesystem:CreateAccessPoint" and gets an "Unauthenticated" error.

CloudTrail shows the successful elasticfilesystem:DescribeFileSystems call from the role ARN of the IAM role attached to the Kubernetes service account via IRSA. But the "elasticfilesystem:CreateAccessPoint" call does not show up in CloudTrail at all; I can't find any entries for it. So it looks to me like maybe it's not even authenticating for this call?

Something else to note: this cluster is in a private VPC and requires the use of regional endpoints, so I have updated the deployment to set the env var "AWS_STS_REGIONAL_ENDPOINTS" to "regional".

What you expected to happen? A PVC to be created, along with the required EFS access point and whatever else is needed.

Anything else we need to know?:

  • Used the IAM policy example that's included in this repo.
  • Used the Helm chart included in this repo for deployment.
  • Deployed on EKS 1.21 and used IRSA to map the IAM role to the Kubernetes service account.

Environment

  • Kubernetes version (use kubectl version): v1.21.12-eks-a64ea69
  • Deployed using the Helm chart that's included in this project.
  • Driver version: aws-efs-csi-driver:v1.3.8, node-driver-registrar:v2.5.0-eks-1-21-13, external-provisioner:v3.1.0-eks-1-21-13

jbehrends, Jun 10 '22 23:06

Hello, same here, with both aws-efs-csi-driver v1.3.5 and v1.4.0. I have IRSA and an STS regional endpoint.

Curiously it was working fine until last week.

gregopenit, Jul 18 '22 12:07

@jbehrends, I found a workaround by changing the policy.

I am using the one provided by the documentation.

For a reason I cannot explain yet (CloudTrail only shows the "elasticfilesystem:CreateAccessPoint" call), I need to widen the policy by replacing the action "elasticfilesystem:CreateAccessPoint" with "elasticfilesystem:*".

The permissions boundary, if one exists, needs to be adapted as well.
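
The note about the boundary follows from how IAM combines the two: when a permissions boundary is attached to a role, an action takes effect only if both the identity policy and the boundary allow it. A minimal Python sketch of that intersection, assuming plain Allow-only action patterns with no conditions and no explicit denies. The helper names and the example pattern lists are hypothetical and are not taken from the driver or from any AWS tooling:

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def matches(action: str, patterns: List[str]) -> bool:
    """True if any IAM-style pattern (with * or ? wildcards) matches the action."""
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action: str,
                        identity: List[str],
                        boundary: Optional[List[str]]) -> bool:
    """An action is effective only if the identity policy AND the boundary allow it."""
    if not matches(action, identity):
        return False
    # No boundary attached: the identity policy alone decides.
    return boundary is None or matches(action, boundary)


# Hypothetical example: the role policy was widened, the boundary was not.
identity = ["elasticfilesystem:*"]
boundary = ["elasticfilesystem:Describe*", "elasticfilesystem:CreateAccessPoint"]

for action in ("elasticfilesystem:CreateAccessPoint",
               "elasticfilesystem:TagResource"):
    print(action, effectively_allowed(action, identity, boundary))
```

With these made-up lists, the widened role policy still leaves "elasticfilesystem:TagResource" blocked until the boundary is widened too.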

gregopenit, Jul 19 '22 13:07

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Oct 27 '22 13:10

/lifecycle rotten

k8s-triage-robot, Nov 26 '22 13:11

/close not-planned

k8s-triage-robot, Dec 26 '22 14:12

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot, Dec 26 '22 14:12

This is unbearable... how come the default permissions are not working out of the box? AWS Documentation is so poor

blueprismo, Jul 06 '23 11:07

@gregopenit's suggestion helped 👍🏼

But what is actually going on there? I have more than 5 clusters working with the default setup and it's just fine...

AdamDomagalsky, Aug 04 '23 10:08

My DEV env works with "elasticfilesystem:CreateAccessPoint", but my PRD env needs "elasticfilesystem:*", and the two look exactly the same. So, test it 🌵 (tell me if you can explain this).

rui-armada, Aug 07 '23 16:08

For others running into this issue, please use the AmazonEFSCSIDriverPolicy policy, as this is managed by AWS and has all the permissions that are needed.

Here is a snippet of the policy, which is limited to just the permissions that are required for the CSI Driver to operate properly:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDescribe",
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:DescribeAccessPoints",
                "elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:DescribeMountTargets",
                "ec2:DescribeAvailabilityZones"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowCreateAccessPoint",
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:CreateAccessPoint"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/efs.csi.aws.com/cluster": "false"
                },
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": "efs.csi.aws.com/cluster"
                }
            }
        },
        {
            "Sid": "AllowTagNewAccessPoints",
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:TagResource"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "elasticfilesystem:CreateAction": "CreateAccessPoint"
                },
                "Null": {
                    "aws:RequestTag/efs.csi.aws.com/cluster": "false"
                },
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": "efs.csi.aws.com/cluster"
                }
            }
        },
        {
            "Sid": "AllowDeleteAccessPoint",
            "Effect": "Allow",
            "Action": "elasticfilesystem:DeleteAccessPoint",
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:ResourceTag/efs.csi.aws.com/cluster": "false"
                }
            }
        }
    ]
}

Although I don't see the documentation referred to by the original comment, my guess is that it may have been missing elasticfilesystem:TagResource. The CSI driver includes a tag when it creates access points, so tags are part of the create access point request, and missing this permission may cause that request to be denied.
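
One way to test that guess against a policy you already have is to list which of the actions involved in access point creation it grants. The Python sketch below is a rough check under stated assumptions: it looks only at `Allow` statements and their `Action` entries, and it ignores `Condition`, `Resource`, `NotAction`, and explicit `Deny`. The function name and the sample policy are hypothetical, and this does not replace evaluating the policy with AWS's own tools.

```python
import json
from fnmatch import fnmatchcase

# Actions discussed in this thread for creating a tagged access point.
NEEDED = ["elasticfilesystem:CreateAccessPoint", "elasticfilesystem:TagResource"]


def missing_actions(policy_json, needed=NEEDED):
    """Return the needed actions that no Allow statement in the policy grants."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    granted = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        # Action names are case-insensitive, so normalise to lower case.
        granted.extend(a.lower() for a in actions)

    return [n for n in needed
            if not any(fnmatchcase(n.lower(), g) for g in granted)]


# Hypothetical policy that allows creation but not tagging.
sample = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["elasticfilesystem:DescribeFileSystems",
                "elasticfilesystem:CreateAccessPoint"],
     "Resource": "*"}
  ]
}
"""
print(missing_actions(sample))
```

For the sample above the check reports "elasticfilesystem:TagResource" as missing. A policy whose action is "elasticfilesystem:*" reports nothing missing, which would be consistent with the wildcard workaround mentioned earlier in the thread.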

RyanStan, Jan 25 '24 21:01