aws-ebs-csi-driver
No permissions to create EBS volumes if the EBS controllers are installed on all the nodes in the cluster
/kind feature
What happened?
Installed the EBS CSI driver as an add-on to the AWS cluster, then tried creating EBS volumes using a StatefulSet, but the StatefulSet pods failed to run with the below error:
could not create volume in EC2: UnauthorizedOperation: You are not authorized to perform this operation
What you expected to happen? Volume creation via the StatefulSet should have succeeded.
How to reproduce it (as minimally and precisely as possible)?
- Create AWS workload cluster.
- Install EBS CSI driver on workload cluster using CRD manifests.
- Create an nginx service and a related StatefulSet with a volume claim and storage class. Here's the manifest I used:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-ebs-volumes
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "100"
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-statefulset
spec:
  serviceName: "nginx-svc"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - name: nginx-web
              containerPort: 80
          volumeMounts:
            - name: nginx-volumes
              mountPath: /usr/share/nginx/html
      volumes:
        - name: nginx-volumes
          persistentVolumeClaim:
            claimName: nginx-volumes
  volumeClaimTemplates:
    - metadata:
        name: nginx-volumes
      spec:
        storageClassName: "aws-ebs-volumes"
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 4Gi
Anything else we need to know?: We want the CSI controller deployment pinned to the control plane nodes, which have the right permissions. A potential fix would be to add tolerations and node affinity rules to the EBS controller deployment, for example:
serviceAccountName: ebs-csi-controller-sa
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
    tolerationSeconds: 300
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
        - matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists
Environment
- Kubernetes version (use kubectl version): v1.21.2
- Driver version: master branch
I can help fix this if someone confirms the issue and the suggested fix.
/assign @nirmalaagash
@Ankitasw Can you confirm the IAM policy that you have attached? Did you attach it to the service account IAM role or directly to the EC2 nodes where the CSI driver is installed? I do not see any error in my cluster, and I have my policy attached to the cluster nodes.
The cluster worker nodes should not have more permissions than they actually need to perform any operation, right? If I attach these policies to the worker nodes to create volumes, then it definitely works, but we actually want the EBS CSI driver controller to run on control plane nodes and not on worker nodes. cc @randomvariable
Hi @nirmalaagash
In https://github.com/kubernetes-sigs/cluster-api-provider-aws, we only provision control plane nodes with an IAM role that has the EBS controller permissions, since giving worker nodes additional permissions for EBS snapshotting etc. may not be ideal. However, we're aware that not all Kubernetes clusters on AWS will be set up the same way (e.g. using IRSA), so I would propose the following, not as a bug, but as a feature request:
Preferentially run the EBS driver on control plane instances using preferredDuringSchedulingIgnoredDuringExecution rules, as in the example in https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/8982b301e4b7c2aebfa97ea8b05cb8a49c046b9b/config/manager/manager.yaml#L41-L59,
but only for the EBS controller. We should do the same for the external cloud provider as well. That would pretty much give us parity with the old in-tree cloud provider in terms of behaviour.
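A rough sketch of what such a soft preference could look like in the EBS controller Deployment's pod spec (mirroring the linked CAPA manager example; the weight and tolerations below are illustrative assumptions, not taken from the driver's manifests):

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
      - weight: 10
        preference:
          matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule

Because the preference is preferred rather than required, the controller can still fall back to worker nodes on clusters whose control plane is not schedulable (e.g. managed control planes).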
/remove-kind bug
/kind feature
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@nirmalaagash any updates on this?
@vdhanan @gtxu Can you take a look into this?
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
@vdhanan @gtxu any updates on this?
/lifecycle frozen
I noticed this error as well, assuming the aws-ebs-csi-driver EKS addon would work out of the box. But unlike other add-ons it does not.
After I added an assumable role and explicitly set that to the add-on, it works.
Don't forget to cycle the controller to have it pick up its updated service account: kubectl -n kube-system rollout restart deploy/ebs-csi-controller
Bit of relevant TF:
# This role-arn goes into the add-on config
module "iam_assumable_role_ebs_csi" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "4.7.0"

  role_name                     = format("%s-ebs-csi-controller-sa", var.cluster_name)
  provider_url                  = replace(module.eks_cluster.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = ["arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}
Docs: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
(Exploring things, it looks like the ServiceAccounts of the aws-node and kube-proxy add-ons don't have any role-arn annotation, so I assume their permissions are obtained via default EKS roles on the nodes.)
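For illustration, what IRSA ultimately boils down to is an annotation on the driver's controller service account, roughly like the sketch below (the account ID and role name are placeholders; the role ARN should come from the module above):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa
  namespace: kube-system
  annotations:
    # Placeholder ARN; substitute the IRSA role created for the driver
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-cluster-ebs-csi-controller-sa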
After I added an assumable role and explicitly set that to the add-on, it works. Don't forget to cycle the controller to have it pick up its updated service account:
kubectl -n kube-system rollout restart deploy/ebs-csi-controller
My role was added fine by AWS, but it started working when I restarted ebs-csi-controller using your command above.
To make this work, just add the permission to create volumes to the EKS cluster role.
I had the necessary permissions to create volumes, but I was struggling with this problem for a long time.
As @TBeijen and @Domantas said, I restarted the EBS controller with kubectl -n kube-system rollout restart deploy/ebs-csi-controller and it worked like a charm.
I have the same issue. The cluster was created with 1.21, has now been updated to 1.25, and I am having trouble with this.
I install the driver via the Terraform module registry.terraform.io/terraform-aws-modules/eks/aws:
cluster_addons = {
  [..]
  aws-ebs-csi-driver = {
    most_recent = true
  }
}
And it does not work. Restarting as mentioned above does not fix the issue.
I am not sure where to add the piece @TBeijen mentioned. Simply adding it "as-is" to my terraform does not change anything.
I did however find this reddit post: https://www.reddit.com/r/Terraform/comments/znomk4/ebs_csi_driver_entirely_from_terraform_on_aws_eks/
And adding the following to my node_groups config does fix the issue for me.
iam_role_additional_policies = {
  AmazonEBSCSIDriverPolicy = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
But I'm not sure if this is the preferred way?
@Mattie112 The AmazonEBSCSIDriverPolicy is an AWS managed policy specifically designed to provide the required permissions for the EBS CSI driver, and it's safe to use at your convenience. You may also choose to use your own policy if you have more granular permission requirements.
Alright, so that is good. But this currently adds it to the entire node. Wouldn't it be better to give the permissions to the SA?
@Mattie112 Absolutely, IRSA is a recommended way to manage AWS permissions on Kubernetes; it ensures that your services only have the access they need and nothing more.
- EKS Security Guide: IAM roles for service accounts
- EKS Best Practices Guides - Identity and Access Management
I now have the following (and this works fine):
module "eks" {
source = "registry.terraform.io/terraform-aws-modules/eks/aws"
version = "19.15.1"
[...]
aws-ebs-csi-driver = {
most_recent = true
service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
}
[...]
data "aws_iam_policy" "ebs_csi_policy" {
arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
module "ebs_csi_irsa_role" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
role_name = "ebs-csi"
attach_ebs_csi_policy = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
}
}
}
This is a bit of a combination of https://blog.devops.dev/how-to-provision-your-eks-cluster-using-terraform-7b086f9a9848?gi=1fb9dbcba990 and https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest/submodules/iam-role-for-service-accounts-eks
(Also had a reddit discussion about this, leaving it here for traceability: https://www.reddit.com/r/kubernetes/comments/1419cfw/pvpvc_not_working_after_k8s_upgrade_to_125/)
/close
The requirement to supply permissions to the EBS CSI Driver is documented in the installation docs (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/install.md#set-up-driver-permissions).
Although instance roles for EC2 will work (when IMDS is configured to allow pod access), use of IRSA or similar is highly recommended, to avoid providing permissions to all pods.
Nowadays, controller affinity can be configured in the Helm chart if desired (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L154).
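For reference, a rough Helm-values sketch covering both points, assuming the chart exposes controller.serviceAccount.annotations, controller.affinity, and controller.tolerations as in recent versions; the role ARN, weight, and toleration below are illustrative placeholders:

controller:
  serviceAccount:
    create: true
    name: ebs-csi-controller-sa
    annotations:
      # Placeholder IRSA role ARN
      eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule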
@ConnorJC3: Closing this issue.