karpenter-provider-aws

Suggestion to split up or reduce Controller IAM Policy into (multiple) smaller Policies

Open donovanmuller opened this issue 1 year ago • 11 comments

Description

What problem are you trying to solve?

As raised and discussed in https://github.com/terraform-aws-modules/terraform-aws-eks/issues/3319, we hit the PolicySize quota on the Controller IAM Policy for EKS clusters that have long names, for example afs1-xxxxxxxx-xxx-xxxxxxxxxxxxx-xxx-xxxxx.

It would be appreciated if the IAM Policy could either be reduced in size, or split up into multiple smaller Policies.

How important is this feature to you?

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

donovanmuller avatar Mar 11 '25 14:03 donovanmuller

Hi @edibble21, can I work on this? Thanks.

gnana997 avatar Mar 22 '25 16:03 gnana997

Since this policy was last extended, we've been having trouble with even moderately long cluster names in regions with long names when using the terraform-aws-eks module (it copies this policy 1:1).

For example, in the region ap-southeast-2, a cluster name of just 28 characters (example-production-au-string) ends up with a policy of length 6139, just 5 characters short of the 6144 limit.

Some of our clusters have slightly longer names than this, so the policy length is blocking us from taking the latest version of the policy.

Here is the policy for the example above, as generated by the terraform-aws-eks module (which is 1:1 with the policy recommended here):

{"Statement":[{"Action":["ec2:RunInstances","ec2:CreateFleet"],"Effect":"Allow","Resource":["arn:aws:ec2:ap-southeast-2::snapshot/*","arn:aws:ec2:ap-southeast-2::image/*","arn:aws:ec2:ap-southeast-2:*:subnet/*","arn:aws:ec2:ap-southeast-2:*:security-group/*","arn:aws:ec2:ap-southeast-2:*:capacity-reservation/*"],"Sid":"AllowScopedEC2InstanceAccessActions"},{"Action":["ec2:RunInstances","ec2:CreateFleet"],"Condition":{"StringEquals":{"aws:ResourceTag/kubernetes.io/cluster/example-production-au-string":"owned"},"StringLike":{"aws:ResourceTag/karpenter.sh/nodepool":"*"}},"Effect":"Allow","Resource":"arn:aws:ec2:ap-southeast-2:*:launch-template/*","Sid":"AllowScopedEC2LaunchTemplateAccessActions"},{"Action":["ec2:RunInstances","ec2:CreateLaunchTemplate","ec2:CreateFleet"],"Condition":{"StringEquals":{"aws:RequestTag/eks:eks-cluster-name":"example-production-au-string","aws:RequestTag/kubernetes.io/cluster/example-production-au-string":"owned"},"StringLike":{"aws:RequestTag/karpenter.sh/nodepool":"*"}},"Effect":"Allow","Resource":["arn:aws:ec2:ap-southeast-2:*:volume/*","arn:aws:ec2:ap-southeast-2:*:spot-instances-request/*","arn:aws:ec2:ap-southeast-2:*:network-interface/*","arn:aws:ec2:ap-southeast-2:*:launch-template/*","arn:aws:ec2:ap-southeast-2:*:instance/*","arn:aws:ec2:ap-southeast-2:*:fleet/*","arn:aws:ec2:ap-southeast-2:*:capacity-reservation/*"],"Sid":"AllowScopedEC2InstanceActionsWithTags"},{"Action":"ec2:CreateTags","Condition":{"StringEquals":{"aws:RequestTag/eks:eks-cluster-name":"example-production-au-string","aws:RequestTag/kubernetes.io/cluster/example-production-au-string":"owned","ec2:CreateAction":["RunInstances","CreateFleet","CreateLaunchTemplate"]},"StringLike":{"aws:RequestTag/karpenter.sh/nodepool":"*"}},"Effect":"Allow","Resource":["arn:aws:ec2:ap-southeast-2:*:volume/*","arn:aws:ec2:ap-southeast-2:*:spot-instances-request/*","arn:aws:ec2:ap-southeast-2:*:network-interface/*","arn:aws:ec2:ap-southeast-2:*:launch-template/*","arn:aws:ec2:ap-southeast-2:*:instance/*","arn:aws:ec2:ap-southeast-2:*:fleet/*"],"Sid":"AllowScopedResourceCreationTagging"},{"Action":"ec2:CreateTags","Condition":{"ForAllValues:StringEquals":{"aws:TagKeys":["eks:eks-cluster-name","karpenter.sh/nodeclaim","Name"]},"StringEquals":{"aws:ResourceTag/kubernetes.io/cluster/example-production-au-string":"owned"},"StringEqualsIfExists":{"aws:RequestTag/eks:eks-cluster-name":"example-production-au-string"},"StringLike":{"aws:ResourceTag/karpenter.sh/nodepool":"*"}},"Effect":"Allow","Resource":"arn:aws:ec2:ap-southeast-2:*:instance/*","Sid":"AllowScopedResourceTagging"},{"Action":["ec2:TerminateInstances","ec2:DeleteLaunchTemplate"],"Condition":{"StringEquals":{"aws:ResourceTag/kubernetes.io/cluster/example-production-au-string":"owned"},"StringLike":{"aws:ResourceTag/karpenter.sh/nodepool":"*"}},"Effect":"Allow","Resource":["arn:aws:ec2:ap-southeast-2:*:launch-template/*","arn:aws:ec2:ap-southeast-2:*:instance/*"],"Sid":"AllowScopedDeletion"},{"Action":["ec2:DescribeSubnets","ec2:DescribeSpotPriceHistory","ec2:DescribeSecurityGroups","ec2:DescribeLaunchTemplates","ec2:DescribeInstances","ec2:DescribeInstanceTypes","ec2:DescribeInstanceTypeOfferings","ec2:DescribeImages","ec2:DescribeAvailabilityZones"],"Condition":{"StringEquals":{"aws:RequestedRegion":"ap-southeast-2"}},"Effect":"Allow","Resource":"*","Sid":"AllowRegionalReadActions"},{"Action":"ssm:GetParameter","Effect":"Allow","Resource":"arn:aws:ssm:ap-southeast-2::parameter/aws/service/*","Sid":"AllowSSMReadActions"},{"Action":"pricing:GetProducts","Effect":"Allow","Resource":"*","Sid":"AllowPricingReadActions"},{"Action":["sqs:ReceiveMessage","sqs:GetQueueUrl","sqs:DeleteMessage"],"Effect":"Allow","Resource":"arn:aws:sqs:ap-southeast-2:660251268984:Karpenter-example-production-au-string","Sid":"AllowInterruptionQueueActions"},{"Action":"iam:PassRole","Condition":{"StringEquals":{"iam:PassedToService":"ec2.amazonaws.com"}},"Effect":"Allow","Resource":"arn:aws:iam::660251268984:role/example-production-au-string-eks-worker","Sid":"AllowPassingInstanceRole"},{"Action":"iam:CreateInstanceProfile","Condition":{"StringEquals":{"aws:RequestTag/eks:eks-cluster-name":"example-production-au-string","aws:RequestTag/kubernetes.io/cluster/example-production-au-string":"owned","aws:RequestTag/topology.kubernetes.io/region":"ap-southeast-2"},"StringLike":{"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass":"*"}},"Effect":"Allow","Resource":"arn:aws:iam::660251268984:instance-profile/*","Sid":"AllowScopedInstanceProfileCreationActions"},{"Action":"iam:TagInstanceProfile","Condition":{"StringEquals":{"aws:RequestTag/eks:eks-cluster-name":"example-production-au-string","aws:RequestTag/kubernetes.io/cluster/example-production-au-string":"owned","aws:RequestTag/topology.kubernetes.io/region":"ap-southeast-2","aws:ResourceTag/kubernetes.io/cluster/example-production-au-string":"owned","aws:ResourceTag/topology.kubernetes.io/region":"ap-southeast-2"},"StringLike":{"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass":"*","aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass":"*"}},"Effect":"Allow","Resource":"arn:aws:iam::660251268984:instance-profile/*","Sid":"AllowScopedInstanceProfileTagActions"},{"Action":["iam:RemoveRoleFromInstanceProfile","iam:DeleteInstanceProfile","iam:AddRoleToInstanceProfile"],"Condition":{"StringEquals":{"aws:ResourceTag/kubernetes.io/cluster/example-production-au-string":"owned","aws:ResourceTag/topology.kubernetes.io/region":"ap-southeast-2"},"StringLike":{"aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass":"*"}},"Effect":"Allow","Resource":"arn:aws:iam::660251268984:instance-profile/*","Sid":"AllowScopedInstanceProfileActions"},{"Action":"iam:GetInstanceProfile","Effect":"Allow","Resource":"arn:aws:iam::660251268984:instance-profile/*","Sid":"AllowInstanceProfileReadActions"},{"Action":"iam:ListInstanceProfiles","Effect":"Allow","Resource":"*","Sid":"AllowUnscopedInstanceProfileListAction"},{"Action":"eks:DescribeCluster","Effect":"Allow","Resource":"arn:aws:eks:ap-southeast-2:660251268984:cluster/example-production-au-string","Sid":"AllowAPIServerEndpointDiscovery"}],"Version":"2012-10-17"}
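
For anyone who wants to check their own rendered policy against the quota, here is a minimal Python sketch. It assumes that IAM does not count whitespace toward the policy size (as the IAM quota documentation describes); the function name `policy_size` and the small sample document are illustrative only and are not part of Karpenter or the EKS module.

```python
import json

MANAGED_POLICY_LIMIT = 6144  # managed policy size limit quoted in this thread


def policy_size(policy_json: str) -> int:
    """Count the characters of a policy document, ignoring whitespace.

    Assumption: whitespace does not count toward the IAM size quota, so a
    pretty-printed and a minified copy of the same policy measure the same.
    """
    json.loads(policy_json)  # fail early if the document is not valid JSON
    return sum(1 for ch in policy_json if not ch.isspace())


# Illustrative one-statement policy, NOT the full Karpenter controller policy.
sample = json.dumps(
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPricingReadActions",
                "Effect": "Allow",
                "Action": "pricing:GetProducts",
                "Resource": "*",
            }
        ],
    },
    indent=2,
)

size = policy_size(sample)
print(f"{size} of {MANAGED_POLICY_LIMIT} characters used")
```

Feeding it the rendered policy from your own module output should show how close a given cluster name and region get to the limit.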

iress-ac avatar Sep 15 '25 14:09 iress-ac

Which policy does the eks module copy 1:1?

sarkis avatar Sep 15 '25 22:09 sarkis

The policy and permissions defined here: https://github.com/aws/karpenter-provider-aws/blob/main/website/content/en/v1.6/getting-started/getting-started-with-karpenter/cloudformation.yaml

bryantbiggs avatar Sep 15 '25 23:09 bryantbiggs

Hope this is getting looked into. With 1.7.1, the policy is now too large to support even relatively short cluster names: https://raw.githubusercontent.com/aws/karpenter-provider-aws/v1.7.1/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml

New section:

{
  "Sid": "AllowUnscopedInstanceProfileListAction",
  "Effect": "Allow",
  "Resource": "*",
  "Action": "iam:ListInstanceProfiles"
},

I was planning to deploy the new version (1.7.1) instead of 1.6.3, but this is a blocker unless we go back and manually update the policy (we're using CloudFormation).

pt20223 avatar Sep 20 '25 01:09 pt20223

I'm experiencing the same issue in GovCloud. It appears that EKS cluster names cannot exceed 16 characters; otherwise the policy exceeds the 6144-character limit.

19D avatar Sep 22 '25 14:09 19D

Same problem in China (aws-cn partition results in longer ARNs)

artem-nefedov avatar Oct 02 '25 14:10 artem-nefedov

I have the same issue in production too. My production cluster has a longer name than the development cluster.

Eji4h avatar Oct 07 '25 18:10 Eji4h

We're also affected by this, with cluster names of 31 or more characters. We found this while preparing to update to v1.7.
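
The name-length thresholds reported in this thread can be estimated from a rendered policy. The Python sketch below is a rough estimate under one assumption: the cluster name is the only part of the policy that changes, so each extra character in the name costs one character per occurrence. The function name `max_cluster_name_length` and the toy policy string are hypothetical.

```python
def max_cluster_name_length(policy: str, cluster_name: str, limit: int = 6144) -> int:
    """Estimate the longest cluster name that keeps `policy` within `limit`.

    Assumption: every occurrence of `cluster_name` in the rendered policy
    (including names derived from it, such as a role name) grows or shrinks
    with the cluster name, and nothing else in the policy changes.
    """
    occurrences = policy.count(cluster_name)
    if occurrences == 0:
        raise ValueError("cluster name does not appear in the policy")
    # Whitespace is ignored, on the assumption that IAM does not count it.
    size = sum(1 for ch in policy if not ch.isspace())
    headroom = limit - size
    # Floor division also covers a policy that is already over the limit:
    # a negative headroom rounds toward a shorter name.
    return len(cluster_name) + headroom // occurrences


# Toy policy with two occurrences of the name, for illustration only.
toy_policy = '{"a":"demo","b":"demo"}'
print(max_cluster_name_length(toy_policy, "demo", limit=27))
```

Regions and partitions with longer ARNs consume more of the fixed budget, which would explain why the usable name length differs between the reports above.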

nantiferov avatar Oct 14 '25 09:10 nantiferov

Thanks to @lorengordon for pointing out that inline policies have a larger size limit (10,240 characters for an inline policy vs. 6,144 for a standard policy; see https://github.com/terraform-aws-modules/terraform-aws-eks/issues/3512#issuecomment-3443969865),

and thanks to @alexissellier for the EKS module implementation!

To get around this LimitExceeded error on the policy size, you can now set the following in your Terraform EKS Karpenter module configuration to gain an additional 4,096 characters on the policy size:

  enable_inline_policy = true

For non-Terraform users: switch to an inline policy to gain the additional 4,096 characters of headroom.
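
As a rough illustration of the trade-off, the Python sketch below picks a policy type from a measured policy size using the two limits quoted above. The function name `choose_policy_type` and the `other_inline_chars` parameter are hypothetical; the parameter reflects my understanding that the inline limit applies to the total of all inline policies on the role, which is worth verifying against the IAM quota documentation.

```python
MANAGED_POLICY_LIMIT = 6144   # standard (managed) policy limit, per this thread
INLINE_POLICY_LIMIT = 10240   # inline policy limit, per this thread


def choose_policy_type(size: int, other_inline_chars: int = 0) -> str:
    """Return "managed", "inline", or "too-large" for a policy of `size` chars.

    `other_inline_chars` is the size of any inline policies already attached
    to the role (assumed to share the same inline budget).
    """
    if size <= MANAGED_POLICY_LIMIT:
        return "managed"
    if size + other_inline_chars <= INLINE_POLICY_LIMIT:
        return "inline"
    return "too-large"


print(choose_policy_type(6139))  # the ap-southeast-2 example from this thread
print(choose_policy_type(6500))  # over the managed limit, fits inline
```

A policy that lands in the "too-large" bucket would still need to be reduced or split, which is the original request in this issue.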

bryantbiggs avatar Oct 27 '25 21:10 bryantbiggs

Small correction - variable name is slightly different, so it should be enable_inline_policy = true

nantiferov avatar Oct 28 '25 07:10 nantiferov