aws-efs-csi-driver
EFS Driver default permissions are WRONG
/kind bug
What happened? Following the official AWS documentation for installing the EFS CSI driver, the default IAM permissions are wrong: dynamic provisioning fails with an access-denied error.
What you expected to happen? That the default permissions work out of the box.
How to reproduce it (as minimally and precisely as possible)? Follow the steps in the AWS documentation. Instead of disabling the creation of the serviceAccount for the Helm deployment, create it with the following attributes (in Terraform):
resource "helm_release" "aws_efs_csi_driver" {
repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"
[...]
set {
name = "controller.serviceAccount.create"
value = "true"
}
set {
name = "controller.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
value = "${module.efs_driver_role[0].iam_role_arn}"
type = "string"
}
set {
name = "controller.serviceAccount.name"
value = "efs-csi-controller-sa"
}
[...]
set {
name = "storageClasses[0].parameters.provisioningMode"
value = "efs-ap"
}
set {
name = "storageClasses[0].parameters.basePath"
value = "/dynamic_provisioning"
}
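For context, the efs_driver_role module referenced above is not shown in the issue. A minimal sketch of how such an IRSA role is commonly defined, assuming the terraform-aws-modules IRSA submodule and an EKS module output for the OIDC provider (both assumptions, not the reporter's actual code):

# Hypothetical IRSA role for the EFS CSI controller (illustrative only).
module "efs_driver_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"
  count   = 1  # matches the module.efs_driver_role[0] reference above

  role_name             = "EFSDriverRole"
  attach_efs_csi_policy = true  # attach the module's bundled EFS CSI policy

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn  # assumed EKS module output
      namespace_service_accounts = ["kube-system:efs-csi-controller-sa"]
    }
  }
}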
Anything else we need to know?:
Environment
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.2-eks-c12679a", GitCommit:"002c6bc4e142b1f48b9405140e559194a094dcff", GitTreeState:"clean", BuildDate:"2023-05-22T20:23:13Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
- Driver version: v1.5.7
Please also attach debug logs to help us better diagnose
The relevant events on my PVC were:
Normal Provisioning 28s (x7 over 92s) efs.csi.aws.com_xxx.compute.internal... External provisioner is provisioning volume for claim "namespace/pvcname"
Warning ProvisioningFailed 28s (x7 over 92s) efs.csi.aws.comxxx... failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied
Normal ExternalProvisioning 1s (x9 over 92s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator
Using the role permissions from here, access is denied.
As a workaround you can allow all elasticfilesystem:* permissions, but I want the default values to work out of the box. We should change that default example.json policy.
Sorry for the delayed response. I usually see that error (failed to provision volume with StorageClass "efs-sc": rpc error: code = Unauthenticated desc = Access Denied. Please ensure you have the right AWS permissions: Access denied) when the trust policy of the IAM Role doesn't allow the Service Account to assume it.
However, given that this problem went away when you modified your policy to allow all elasticfilesystem:* permissions, that might not be it.
Can you post your trust policy of the IAM Role, the definition of the IAM Policy that is tied to the Role, and the Kubernetes Manifest of the CSI Controller Service Account?
Additionally, we now have a new managed policy, AmazonEFSCSIDriverPolicy that you can apply to your IAM Role, and a managed add-on that is now available. Our EFS CSI Driver Page has updated documentation.
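Since the deployment in the report is managed with Terraform, attaching that managed policy could be sketched as follows (a minimal sketch; the role reference and resource name are assumptions, adjust to however the role is actually defined):

# Hypothetical attachment of the AWS-managed AmazonEFSCSIDriverPolicy to the IRSA role.
resource "aws_iam_role_policy_attachment" "efs_csi_managed" {
  role       = module.efs_driver_role[0].iam_role_name  # assumed module output
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy"
}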
Thanks a lot @RyanStan for your answer, and no worries about the delay :) Sure thing, here's the trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::yyyy:oidc-provider/oidc.eks.region.amazonaws.com/id/xxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region.amazonaws.com/id/xxxx:aud": "sts.amazonaws.com",
          "oidc.eks.region.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
        }
      }
    }
  ]
}
And here are the default permissions from the documentation (if you follow the official doc it leads you to this README, and since I use IRSA it corresponds to this exact same policy):
This is the default policy (not working out of the box; let's call it policy1):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeMountTargets",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateAccessPoint"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:TagResource"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    }
  ]
}
I've extended the policy to add the following actions and now it's working (let's call this policy2):
{
  "Statement": [
    {
      "Action": [
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:CreateTags",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DeleteTags",
        "elasticfilesystem:TagResource",
        "elasticfilesystem:UntagResource"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}
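In the same Terraform setup, the extra actions from policy2 could be attached alongside policy1 as an inline role policy; a minimal sketch (resource name and role reference are assumptions):

# Hypothetical inline policy carrying the extended action list (policy2).
resource "aws_iam_role_policy" "efs_csi_extended" {
  name = "efs-csi-extended"
  role = module.efs_driver_role[0].iam_role_name  # assumed module output

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Resource = "*"
      Action = [
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:CreateTags",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DeleteTags",
        "elasticfilesystem:TagResource",
        "elasticfilesystem:UntagResource"
      ]
    }]
  })
}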
And the Kubernetes ServiceAccount that consumes it is:
Name:                efs-csi-controller-sa
Namespace:           kube-system
Labels:              app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-efs-csi-driver
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::accountID:role/EFSDriverRole
                     meta.helm.sh/release-name: aws-efs-csi-driver
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
I've already seen the AmazonEFSCSIDriverPolicy managed policy and tried it as well, but no luck (it contains essentially the same permissions as policy1).
I can provide any more information or lend a helping hand with your guidance.
This is definitely odd; I'm unable to recreate it on my end. Can you follow our troubleshooting guide to:
- Enable debug logging for the csi driver
- Enable debug logging for efs-utils
- Recreate the failure
- Run the log collector script on the Controller Pod with the error attached in your initial message.
Thanks for helping us debug this.
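For the first item in that list, with a Helm-via-Terraform deployment like the one in the original report, the driver's log verbosity can be raised through the chart's log level values; a minimal sketch, assuming the chart exposes controller.logLevel and node.logLevel (verify against the values.yaml of your chart version):

resource "helm_release" "aws_efs_csi_driver" {
  # ... existing configuration from the report ...

  # Hypothetical: raise CSI driver verbosity while debugging (assumed chart values).
  set {
    name  = "controller.logLevel"
    value = "5"
  }

  set {
    name  = "node.logLevel"
    value = "5"
  }
}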
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.