aws-ebs-csi-driver

No permissions to create EBS volumes if the EBS controllers are installed in all the nodes in cluster


/kind feature

What happened? Installed the EBS CSI driver as an add-on to the AWS cluster, then tried creating EBS volumes using a StatefulSet, but the StatefulSet pods failed to run with the error below: could not create volume in EC2: UnauthorizedOperation: You are not authorized to perform this operation

What you expected to happen? Volume creation via the StatefulSet should have succeeded.

How to reproduce it (as minimally and precisely as possible)?

  1. Create AWS workload cluster.
  2. Install EBS CSI driver on workload cluster using CRD manifests.
  3. Create an nginx Service and a related StatefulSet with a volume claim and storage class. Here's the manifest I used (a sketch of the referenced headless Service follows after it):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-ebs-volumes
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "100"
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-statefulset
spec:
  serviceName: "nginx-svc"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - name: nginx-web
              containerPort: 80
          volumeMounts:
            - name: nginx-volumes
              mountPath: /usr/share/nginx/html
      volumes:
        - name: nginx-volumes
          persistentVolumeClaim:
            claimName: nginx-volumes
  volumeClaimTemplates:
    - metadata:
        name: nginx-volumes
      spec:
        storageClassName: "aws-ebs-volumes"
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 4Gi
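
For completeness: the nginx-svc headless Service referenced by serviceName is not part of the manifest above; a minimal sketch of it (assuming a plain headless Service selecting the same app: nginx pods) would be:

apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  labels:
    app: nginx
spec:
  clusterIP: None   # headless Service, as required for StatefulSet pod DNS
  selector:
    app: nginx
  ports:
    - name: nginx-web
      port: 80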

Anything else we need to know?: We want the CSI controller Deployment pinned to the control plane nodes, which should have the right permissions. A potential fix would be to add node affinity rules to the EBS controller Deployment, for example:

serviceAccountName: ebs-csi-controller-sa
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
    tolerationSeconds: 300
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
        - matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists

Environment

  • Kubernetes version (use kubectl version): v1.21.2
  • Driver version: master branch

Ankitasw avatar Aug 27 '21 13:08 Ankitasw

I can help fix this if someone confirms the issue and the suggested fix.

Ankitasw avatar Aug 30 '21 13:08 Ankitasw

/assign @nirmalaagash

nirmalaagash avatar Aug 30 '21 18:08 nirmalaagash

@Ankitasw Can you confirm the IAM policy that you have attached? Did you attach it to the service account IAM role or directly to the EC2 nodes where the CSI driver is installed? I do not see any error in my cluster, and I have my policy attached to the cluster nodes.

nirmalaagash avatar Aug 30 '21 22:08 nirmalaagash

The cluster worker nodes should not have more permissions than they actually need to perform any operation, right? If I attach these policies to the worker nodes to create volumes, it definitely works, but we actually want the EBS CSI driver controller to run on the control plane nodes and not on the worker nodes. cc @randomvariable

Ankitasw avatar Aug 31 '21 05:08 Ankitasw

Hi @nirmalaagash

In https://github.com/kubernetes-sigs/cluster-api-provider-aws, we only provision control plane nodes with an IAM role with the EBS controller permissions, as giving worker nodes additional permissions for EBS snapshotting etc... may not be ideal. However, we're aware that not all Kubernetes clusters on AWS will be set up the same way (e.g. using IRSA), so I would propose the following, not as a bug, but as a feature request:

Preferentially run the EBS driver on control plane instances using preferredDuringSchedulingIgnoredDuringExecution rules as in the example in https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/8982b301e4b7c2aebfa97ea8b05cb8a49c046b9b/config/manager/manager.yaml#L41-L59

only for the EBS controller. We should do the same for the external cloud provider as well. That then pretty much gives us parity with the old in-tree cloud provider in terms of behaviour.
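
For illustration, a minimal sketch of such a preferred affinity block on the EBS controller Deployment could look like this (the weights are arbitrary and only for the example):

affinity:
  nodeAffinity:
    # prefer, but do not require, control plane nodes
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
      - weight: 10
        preference:
          matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists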

randomvariable avatar Aug 31 '21 10:08 randomvariable

/remove-kind bug /kind feature

nirmalaagash avatar Aug 31 '21 21:08 nirmalaagash

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 29 '21 22:11 k8s-triage-robot

@nirmalaagash any updates on this?

Ankitasw avatar Dec 10 '21 14:12 Ankitasw

@vdhanan @gtxu Can you take a look into this?

nirmalaagash avatar Dec 15 '21 02:12 nirmalaagash

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 14 '22 02:01 k8s-triage-robot

@vdhanan @gtxu any updates on this?

Ankitasw avatar Jan 23 '22 08:01 Ankitasw

/lifecycle frozen

Ankitasw avatar Feb 22 '22 06:02 Ankitasw

I noticed this error as well, assuming the aws-ebs-csi-driver EKS add-on would work out of the box. But unlike other add-ons, it does not.

After I added an assumable role and explicitly set it on the add-on, it works. Don't forget to cycle the controller to have it pick up its updated service account: kubectl -n kube-system rollout restart deploy/ebs-csi-controller

Bit of relevant TF:

# This role-arn goes into the add-on config
module "iam_assumable_role_ebs_csi" {
  source                        = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                       = "4.7.0"
  role_name                     = format("%s-ebs-csi-controller-sa", var.cluster_name)
  provider_url                  = replace(module.eks_cluster.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = ["arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}

Docs: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html

(Exploring things, it looks like ServiceAccounts of add-ons aws-node and kube-proxy don't have any role_arn annotation, so I assume their permissions are obtained via default EKS roles on the nodes.)
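
For reference, once the add-on is configured with that role, the controller's service account ends up annotated roughly like this (the account ID and role name here are placeholders, not values from my setup):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa
  namespace: kube-system
  annotations:
    # IRSA annotation picked up by the EKS pod identity webhook
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-cluster-ebs-csi-controller-sa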

TBeijen avatar Jul 29 '22 11:07 TBeijen

After I added an assumable role and explicitly set it on the add-on, it works. Don't forget to cycle the controller to have it pick up its updated service account: kubectl -n kube-system rollout restart deploy/ebs-csi-controller

My role was added fine by AWS, but it only started working after I restarted the ebs-csi-controller using the command above.

Domantas avatar Aug 07 '22 07:08 Domantas

To make this work, just add the permission to create volumes to the EKS cluster role.

OneideLuizSchneider avatar Feb 24 '23 20:02 OneideLuizSchneider

I had the necessary permissions to create volumes, but I was struggling with the problem for a long time. As @TBeijen and @Domantas said, I restarted the EBS controller with kubectl -n kube-system rollout restart deploy/ebs-csi-controller and it worked like a charm.

bireycloudoki avatar May 16 '23 13:05 bireycloudoki

I have the same issue. The cluster was created with 1.21, has now been updated to 1.25, and I am having trouble with this.

I install the driver via the Terraform module registry.terraform.io/terraform-aws-modules/eks/aws:

  cluster_addons = {
[..]
    aws-ebs-csi-driver = {
      most_recent = true
    }
  }

And it does not work. Restarting as mentioned above does not fix the issue.

I am not sure where to add the piece @TBeijen mentioned. Simply adding it "as-is" to my terraform does not change anything.

I did however find this reddit post: https://www.reddit.com/r/Terraform/comments/znomk4/ebs_csi_driver_entirely_from_terraform_on_aws_eks/

And adding the following to my node_groups config does fix the issue for me.

      iam_role_additional_policies = {
        AmazonEBSCSIDriverPolicy = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
      }

But I'm not sure if this is the preferred way?

Mattie112 avatar Jun 06 '23 11:06 Mattie112

@Mattie112 The AmazonEBSCSIDriverPolicy is an AWS managed policy specifically designed to provide the required permissions for the EBS CSI driver and it's safe to use at your convenience. You may also choose to use your own policy if you have more granular permission requirements.

torredil avatar Jun 06 '23 14:06 torredil

Alright, so that is good. But this currently adds the policy to the entire node. Wouldn't it be better to give the permissions to the SA?

Mattie112 avatar Jun 06 '23 14:06 Mattie112

@Mattie112 Absolutely, IRSA is a recommended way to manage AWS permissions on k8s; it ensures that your services only have the access they need and nothing more.

torredil avatar Jun 06 '23 15:06 torredil

I now have the following (and this works fine):

module "eks" {
source                    = "registry.terraform.io/terraform-aws-modules/eks/aws"
version                   = "19.15.1"
[...]
  aws-ebs-csi-driver = {
    most_recent = true
    service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
  }
[...]

data "aws_iam_policy" "ebs_csi_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}

module "ebs_csi_irsa_role" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

  role_name             = "ebs-csi"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

This is a bit of a combination of https://blog.devops.dev/how-to-provision-your-eks-cluster-using-terraform-7b086f9a9848?gi=1fb9dbcba990 and https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest/submodules/iam-role-for-service-accounts-eks

(Also had a reddit discussion about this, leaving it here for traceability: https://www.reddit.com/r/kubernetes/comments/1419cfw/pvpvc_not_working_after_k8s_upgrade_to_125/)

Mattie112 avatar Jun 08 '23 11:06 Mattie112

/close

The requirement to supply permissions to the EBS CSI Driver is documented in the installation docs (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/install.md#set-up-driver-permissions).

Although instance roles for EC2 will work (when IMDS is configured to allow pod access), use of IRSA or similar is highly recommended, to avoid providing permissions to all pods.

Nowadays, controller affinity can be configured in the Helm chart if desired (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L154).
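
For example, a minimal values.yaml sketch steering the controller toward control plane nodes (assuming the controller.tolerations and controller.affinity keys exposed in the values file linked above) could look like:

controller:
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists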

ConnorJC3 avatar Jul 03 '23 18:07 ConnorJC3

@ConnorJC3: Closing this issue.

In response to this:

/close

The requirement to supply permissions to the EBS CSI Driver is documented in the installation docs (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/install.md#set-up-driver-permissions).

Although instance roles for EC2 will work (when IMDS is configured to allow pod access), use of IRSA or similar is highly recommended, to avoid providing permissions to all pods.

Nowadays, controller affinity can be configured in the Helm chart if desired (https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L154).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 03 '23 18:07 k8s-ci-robot