cluster-api-provider-aws

Nodegroup role name exceeds maximum length

Open richardcase opened this issue 4 years ago • 9 comments

/kind bug
/area provider/eks
/priority important-soon
/milestone v0.7.x

What steps did you take and what happened:

Created stack to allow iam roles per cluster:

apiVersion: bootstrap.aws.infrastructure.cluster.x-k8s.io/v1alpha1
kind: AWSIAMConfiguration
spec:
  bootstrapUser:
    enable: true
  eks:
    enable: true
    iamRoleCreation: true # Set to true if you plan to use the EKSEnableIAM feature flag to enable automatic creation of IAM roles

Enabled IAM roles per cluster using the following before clusterctl init:

export EXP_EKS_IAM=true

Created a machine pool with the following specs:

apiVersion: exp.cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: "capi-managed-test-pool-0"
spec:
  clusterName: "capi-managed-test"
  template:
    spec:
      clusterName: "capi-managed-test"
      bootstrap:
        dataSecretName: ""
      infrastructureRef:
        name: "capi-managed-test-pool-0"
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSManagedMachinePool
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSManagedMachinePool
metadata:
  name: "capi-managed-test-pool-0"
spec:
  instanceType: t3.large
  amiType: AL2_x86_64

And then we get the following error when reconciliation occurs:

[manager] E0824 12:46:07.186195       8 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to reconcile machine pool for AWSManagedMachinePool default/capi-managed-test-pool-0: ValidationError: 1 validation error detected: Value 'default_capi-managed-test-control-plane-default_capi-managed-test-pool-0-nodegroup-iam-service-role' at 'roleName' failed to satisfy constraint: Member must have length less than or equal to 64\n\tstatus code: 400, request id: d1f43a56-6b8f-4ebe-984c-6c24647e43e9" "controller"="awsmanagedmachinepool" "name"="capi-managed-test-pool-0" "namespace"="default"

What did you expect to happen: I would expect the controller to detect when the auto-generated role name exceeds the maximum length and truncate/hash it so that it fits.
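The truncate-and-hash approach suggested above can be sketched as follows. This is a hypothetical illustration, not the provider's actual implementation (the function name `truncateName` and the 8-character hash suffix are assumptions): keep a prefix of the generated name and append a short hash of the full name so truncated names remain unique.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// truncateName returns name unchanged if it fits within maxLen.
// Otherwise it keeps a prefix and appends "-" plus the first 8 hex
// characters of the SHA-256 of the full name, so two different long
// names do not collapse to the same truncated value.
func truncateName(name string, maxLen int) string {
	if len(name) <= maxLen {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	suffix := hex.EncodeToString(sum[:])[:8]
	// reserve 9 characters for "-" plus the 8-char hash suffix
	return fmt.Sprintf("%s-%s", name[:maxLen-9], suffix)
}

func main() {
	long := "default_capi-managed-test-control-plane-default_capi-managed-test-pool-0-nodegroup-iam-service-role"
	fmt.Println(truncateName(long, 64))
}
```

With the role name from the error above (well over the 64-character IAM limit), the result is a 64-character name that still starts with the recognizable cluster/pool prefix.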

Anything else you would like to add: This fix will need to be applied to 0.7.x and then backported to 0.6.x.

Environment:

  • Cluster-api-provider-aws version: 0.6.8

richardcase avatar Aug 24 '21 12:08 richardcase

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 06 '22 19:02 k8s-triage-robot

/lifecycle frozen /good-first-issue

richardcase avatar Feb 07 '22 09:02 richardcase

@richardcase: This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/lifecycle frozen /good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 07 '22 09:02 k8s-ci-robot

/lifecycle frozen

richardcase avatar Feb 07 '22 10:02 richardcase

@richardcase can I take this one?

Callisto13 avatar Jun 16 '22 11:06 Callisto13

/assign Callisto13

richardcase avatar Jun 16 '22 13:06 richardcase

I think this is already in place, just perhaps not ported back to 0.6? https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/851a8141a6c94c9c59b5219cbdee758334217750/pkg/eks/eks.go#L33

see https://github.com/kubernetes-sigs/cluster-api-provider-aws/commit/50ed343

Callisto13 avatar Jun 16 '22 15:06 Callisto13

/remove-lifecycle frozen

richardcase avatar Jul 12 '22 16:07 richardcase

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 10 '22 17:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 09 '22 18:11 k8s-triage-robot

Does this issue still need to be solved?

Sajiyah-Salat avatar Jan 21 '23 02:01 Sajiyah-Salat

@richardcase the description originally said that a fix should be backported to 0.6.0, but that is a very old version now. Do we want to bother doing that or close this as fixed from 0.7.0?

Callisto13 avatar Jan 23 '23 12:01 Callisto13

As this is fixed in the currently supported versions: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/pkg/cloud/services/eks/roles.go#L165:L168 (thanks @Callisto13 for checking this), we can now close this issue; there is no need to backport to 0.6.0 as it's unsupported now.

richardcase avatar Jan 26 '23 11:01 richardcase

/close

richardcase avatar Jan 26 '23 11:01 richardcase

@richardcase: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 26 '23 11:01 k8s-ci-robot