
Specific role and instance profile for EKS worker nodes.

Open faiq opened this issue 3 years ago • 4 comments

/kind feature

Describe the solution you'd like

As an EKS cluster operator, I'd like a specific set of roles for my AWS worker nodes to use.

Currently, AWSIAMRoleNodes is a shared role used by worker nodes in both regular clusters and EKS clusters.

  AWSIAMRoleNodes:
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Action:
          - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
        Version: 2012-10-17
      ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
      RoleName: nodes.cluster-api-provider-aws.sigs.k8s.io
    Type: AWS::IAM::Role

My proposal is to create a new instance profile and role specifically for nodes that are part of an EKS cluster, and to remove the ManagedPolicyArns from the role pasted above.

This would change AWSIAMRoleNodes to the following:

  AWSIAMRoleNodes:
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Action:
          - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
        Version: 2012-10-17
      RoleName: nodes.cluster-api-provider-aws.sigs.k8s.io
    Type: AWS::IAM::Role

and a new role called eks-nodes.cluster-api-provider-aws.sigs.k8s.io, which would look like this:

  AWSIAMRoleEKSNodes:
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Action:
          - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
        Version: 2012-10-17
      ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
      RoleName: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
    Type: AWS::IAM::Role

along with a new instance profile that uses that role:

  AWSIAMInstanceProfileEKSNodes:
    Properties:
      InstanceProfileName: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
      Roles:
      - Ref: AWSIAMRoleEKSNodes
    Type: AWS::IAM::InstanceProfile

I think this will warrant code changes in clusterawsadm, as well as updates to the tests to use the newly created role and instance profile: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/eb5da5870f9147624430de1b67e55843991ed7d0/test/e2e/data/eks/cluster-template-eks-machine-deployment-only.yaml#L32
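For illustration, the e2e template's worker machine template could then reference the EKS-specific profile instead of the shared one. This is a hypothetical sketch, not the actual template change; the apiVersion and sibling fields may differ by CAPA release:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  template:
    spec:
      # Hypothetical: point EKS worker nodes at the new EKS-specific profile
      # instead of the shared nodes.cluster-api-provider-aws.sigs.k8s.io one.
      iamInstanceProfile: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: "${AWS_NODE_MACHINE_TYPE}"
```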

faiq avatar Jun 24 '22 18:06 faiq

@faiq: This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 24 '22 18:06 k8s-ci-robot

Looking at this, we would also need to modify this code to use the new role: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/93897be636031fac812765d95a018fe61dbd689f/pkg/cloud/services/iamauth/reconcile.go#L51

faiq avatar Jul 01 '22 16:07 faiq

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 29 '22 17:09 k8s-triage-robot

/remove-lifecycle stale

faiq avatar Oct 03 '22 15:10 faiq

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 01 '23 15:01 k8s-triage-robot

From office hours 01/01/23: Makes sense to scope down permissions of non-EKS clusters.

/triage accepted /priority important-longterm /help

dlipovetsky avatar Jan 23 '23 17:01 dlipovetsky

@dlipovetsky: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

From office hours 01/01/23: Makes sense to scope down permissions of non-EKS clusters.

/triage accepted /priority important-longterm /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 23 '23 17:01 k8s-ci-robot

The hard-coded instance profile in cluster-template-eks-machine-deployment-only.yaml introduces several issues:

https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/eb5da5870f9147624430de1b67e55843991ed7d0/test/e2e/data/eks/cluster-template-eks-machine-deployment-only.yaml#L32

  • conflicts when multiple CAPA versions or deployments (EKS and non-EKS) share the same account
  • because IAM is a global service, clusters in different regions must share the same permissions
  • users cannot separate permissions for EKS and non-EKS clusters
  • problems with rigid instance-profile naming rules in some environments

As with non-EKS clusters, the instance profile should not be hard-coded but should come from user input, so the user is in charge of which instance profiles are assigned to different clusters.
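One way to express that in the template is clusterctl's `${VAR:=default}` variable substitution, so the profile is user-overridable with a sensible fallback. A sketch with a hypothetical AWS_NODE_INSTANCE_PROFILE variable name:

```yaml
# Hypothetical AWS_NODE_INSTANCE_PROFILE variable; clusterctl's
# ${VAR:=default} syntax falls back to the default when the variable is unset.
iamInstanceProfile: "${AWS_NODE_INSTANCE_PROFILE:=nodes.cluster-api-provider-aws.sigs.k8s.io}"
```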

fatz avatar Jan 24 '23 19:01 fatz

Looking at this, we would also need to modify this code to use the new role:

https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/93897be636031fac812765d95a018fe61dbd689f/pkg/cloud/services/iamauth/reconcile.go#L51

I think there are 2 issues here:

  1. The resource identifier "prefix" (arn:aws:iam) is hard-coded, but the prefix is not global, and does not work in some partitions/regions. (There was recent work to address this in https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3926, but this work had to be reverted in https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3982)
  2. The "role name" in the resource identifier is derived from a default. Users are allowed to define their own role name, but if they do, this code does not use that name. (For comparison, see this example, showing code that respects the role name defined by the user.)
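Both points could be handled together when building the ARN. This is a minimal, self-contained Go sketch, not CAPA's actual code: partitionForRegion and nodeRoleARN are illustrative names, and the AWS SDK's endpoints package is the authoritative source for partition mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionForRegion returns the ARN partition for a given AWS region.
// Simplified sketch; the AWS SDK endpoints package has the full mapping.
func partitionForRegion(region string) string {
	switch {
	case strings.HasPrefix(region, "cn-"):
		return "aws-cn"
	case strings.HasPrefix(region, "us-gov-"):
		return "aws-us-gov"
	default:
		return "aws"
	}
}

// nodeRoleARN builds the role ARN, honoring a user-supplied role name
// and only falling back to the default when none is given.
func nodeRoleARN(partition, accountID, roleName string) string {
	if roleName == "" {
		roleName = "nodes.cluster-api-provider-aws.sigs.k8s.io"
	}
	return fmt.Sprintf("arn:%s:iam::%s:role/%s", partition, accountID, roleName)
}

func main() {
	// Example: a China-partition cluster with the default role name.
	fmt.Println(nodeRoleARN(partitionForRegion("cn-north-1"), "123456789012", ""))
	// → arn:aws-cn:iam::123456789012:role/nodes.cluster-api-provider-aws.sigs.k8s.io
}
```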

dlipovetsky avatar Jan 24 '23 19:01 dlipovetsky

Thanks @dlipovetsky, I was accidentally referencing a test. Thank you for clarifying this.

fatz avatar Jan 24 '23 20:01 fatz

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Jan 24 '24 21:01 k8s-triage-robot