cluster-api-provider-aws
Specific role and instance profile for EKS worker nodes.
/kind feature
Describe the solution you'd like:
As an EKS cluster operator, I'd like a specific set of IAM roles for my AWS worker nodes to use.
Currently, AWSIAMRoleNodes is a single role shared by workers in regular clusters and in EKS clusters:
AWSIAMRoleNodes:
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Action:
            - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
              - ec2.amazonaws.com
      Version: 2012-10-17
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    RoleName: nodes.cluster-api-provider-aws.sigs.k8s.io
  Type: AWS::IAM::Role
My proposal is to create a new instance profile and role specific to nodes that are part of an EKS cluster, and to remove the ManagedPolicyArns from the role pasted above.
This would make AWSIAMRoleNodes the following:
AWSIAMRoleNodes:
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Action:
            - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
              - ec2.amazonaws.com
      Version: 2012-10-17
    RoleName: nodes.cluster-api-provider-aws.sigs.k8s.io
  Type: AWS::IAM::Role
and a new role called eks-nodes.cluster-api-provider-aws.sigs.k8s.io, which would look like this:
AWSIAMRoleEKSNodes:
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Action:
            - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
              - ec2.amazonaws.com
      Version: 2012-10-17
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    RoleName: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
  Type: AWS::IAM::Role
Along with a new instance profile that uses that role, as follows:
AWSIAMInstanceProfileEKSNodes:
  Properties:
    InstanceProfileName: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
    Roles:
      - Ref: AWSIAMRoleEKSNodes
  Type: AWS::IAM::InstanceProfile
I think this will warrant code changes in clusterawsadm, as well as updates to the tests so they use the newly created role and instance profile: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/eb5da5870f9147624430de1b67e55843991ed7d0/test/e2e/data/eks/cluster-template-eks-machine-deployment-only.yaml#L32
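As a rough sketch of the test change (not the exact contents of that file): the AWSMachineTemplate in the e2e template could point its iamInstanceProfile at the proposed profile. The apiVersion and the ${CLUSTER_NAME}, ${AWS_NODE_MACHINE_TYPE}, and ${AWS_SSH_KEY_NAME} variables are assumptions based on typical CAPA cluster templates.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  template:
    spec:
      # switch from the shared nodes profile to the proposed EKS-specific one
      iamInstanceProfile: eks-nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: "${AWS_NODE_MACHINE_TYPE}"
      sshKeyName: "${AWS_SSH_KEY_NAME}"
```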
Anything else you would like to add:
Environment:
- Cluster-api-provider-aws version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
@faiq: This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In looking at this we would need to modify this to use the new role as well https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/93897be636031fac812765d95a018fe61dbd689f/pkg/cloud/services/iamauth/reconcile.go#L51
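For context, that reconciliation ends up mapping the node role into the cluster's aws-auth ConfigMap. A minimal sketch of the entry that would need to reference the new role (the account ID is a placeholder, and the username/groups values are the standard EKS node mapping rather than anything copied from the linked code):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # map the proposed EKS-specific node role instead of the shared nodes role
    - rolearn: arn:aws:iam::<ACCOUNT_ID>:role/eks-nodes.cluster-api-provider-aws.sigs.k8s.io
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```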
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
From office hours 01/01/23: Makes sense to scope down permissions of non-EKS clusters.
/triage accepted /priority important-longterm /help
@dlipovetsky: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
From office hours 01/01/23: Makes sense to scope down permissions of non-EKS clusters.
/triage accepted /priority important-longterm /help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The hard-coded instance profile in cluster-template-eks-machine-deployment-only.yaml introduces several issues:
https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/eb5da5870f9147624430de1b67e55843991ed7d0/test/e2e/data/eks/cluster-template-eks-machine-deployment-only.yaml#L32
- conflicts when multiple CAPI versions or deployments (EKS and non-EKS) live in the same account
- because IAM is a global service, clusters in different regions must share the same permissions
- users cannot separate the permissions of EKS clusters from those of non-EKS clusters
- conflicts with instance-profile naming rules in some environments
As with non-EKS clusters, the instance profile should not be hard-coded but should come from user input, so that the user stays in control of the instance profiles assigned to different clusters.
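One way to do that, as a sketch: expose the profile as a template variable with a default instead of a literal name. ${AWS_NODE_INSTANCE_PROFILE} is a hypothetical variable name, and the ${VAR:=default} form relies on clusterctl's variable substitution; the apiVersion is an assumption as well.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  template:
    spec:
      # hypothetical variable: the user decides which instance profile each cluster gets
      iamInstanceProfile: "${AWS_NODE_INSTANCE_PROFILE:=eks-nodes.cluster-api-provider-aws.sigs.k8s.io}"
```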
In looking at this we would need to modify this to use the new role as well
https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/93897be636031fac812765d95a018fe61dbd689f/pkg/cloud/services/iamauth/reconcile.go#L51
I think there are 2 issues here:
- The resource identifier "prefix" (arn:aws:iam) is hard-coded, but the prefix is not global and does not work in some regions (see the ARN examples below). (There was recent work to address this in https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3926, but it had to be reverted in https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3982.)
- The "role name" in the resource identifier is derived from a default. Users are allowed to define their own role name, but if they do, this code does not use that name. (For comparison, see this example, showing code that respects the role name defined by the user.)
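To make the first point concrete, the same node role ARN differs by AWS partition, and only the first form matches a hard-coded arn:aws:iam prefix (the account ID is a placeholder):

```yaml
# standard partition (matches the hard-coded prefix)
rolearn: arn:aws:iam::<ACCOUNT_ID>:role/nodes.cluster-api-provider-aws.sigs.k8s.io
# China partition
# rolearn: arn:aws-cn:iam::<ACCOUNT_ID>:role/nodes.cluster-api-provider-aws.sigs.k8s.io
# GovCloud (US) partition
# rolearn: arn:aws-us-gov:iam::<ACCOUNT_ID>:role/nodes.cluster-api-provider-aws.sigs.k8s.io
```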
Thanks @dlipovetsky, I was accidentally referencing a test. Thank you for clarifying this.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted