`create cluster` fails when VPC CNI is configured to use both `iam.withOIDC` and `useDefaultPodIdentityAssociations`

Open cPu1 opened this issue 1 year ago • 4 comments

The following config results in a panic:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test-cluster-3
  region: us-east-1
  version: '1.28'

addons:
- name: eks-pod-identity-agent
  version: v1.3.0
- name: vpc-cni
  version: v1.18.2
  useDefaultPodIdentityAssociations: true

iam:
  withOIDC: true

secretsEncryption:
  keyARN: arn:aws:kms:us-east-1:123456789:alias/test-kms
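
The combination that triggers the failure can be expressed as a small pre-flight check. This is only a sketch: `Addon`, `ClusterConfig`, and `riskyAddons` are hypothetical names made up for illustration, not eksctl's real API types, and the check simply mirrors the two settings named in the issue title.

```go
package main

import "fmt"

// Addon and ClusterConfig are hypothetical, trimmed-down stand-ins for the
// fields used in the config above; they are not eksctl's real API types.
type Addon struct {
	Name                              string
	UseDefaultPodIdentityAssociations bool
}

type ClusterConfig struct {
	WithOIDC bool
	Addons   []Addon
}

// riskyAddons returns the names of addons that request default pod identity
// associations while iam.withOIDC is also enabled.
func riskyAddons(cfg ClusterConfig) []string {
	var names []string
	if !cfg.WithOIDC {
		return names
	}
	for _, a := range cfg.Addons {
		if a.UseDefaultPodIdentityAssociations {
			names = append(names, a.Name)
		}
	}
	return names
}

func main() {
	// Same settings as the YAML config above.
	cfg := ClusterConfig{
		WithOIDC: true,
		Addons: []Addon{
			{Name: "eks-pod-identity-agent"},
			{Name: "vpc-cni", UseDefaultPodIdentityAssociations: true},
		},
	}
	for _, name := range riskyAddons(cfg) {
		fmt.Printf("addon %q combines iam.withOIDC with useDefaultPodIdentityAssociations\n", name)
	}
}
```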

Stack trace:
2024-08-02 14:56:26 [ℹ]  creating addon
2024-08-02 14:56:27 [ℹ]  successfully created addon
2024-08-02 14:56:28 [ℹ]  "addonsConfig.autoApplyPodIdentityAssociations" is set to true; will lookup recommended pod identity configuration for "vpc-cni" addon
2024-08-02 14:56:30 [ℹ]  deploying stack "eksctl-test-cluster-3-addon-vpc-cni-podidentityrole-aws-node"
2024-08-02 14:56:30 [ℹ]  waiting for CloudFormation stack "eksctl-test-cluster-3-addon-vpc-cni-podidentityrole-aws-node"
2024-08-02 14:57:01 [ℹ]  waiting for CloudFormation stack "eksctl-test-cluster-3-addon-vpc-cni-podidentityrole-aws-node"
2024-08-02 14:57:02 [ℹ]  creating addon
2024-08-02 14:57:03 [ℹ]  successfully created addon
2024-08-02 14:57:04 [ℹ]  creating addon
2024-08-02 14:57:04 [ℹ]  successfully created addon
2024-08-02 14:57:05 [ℹ]  creating addon
2024-08-02 14:57:06 [ℹ]  successfully created addon
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x20 pc=0x1055303f8]

goroutine 187 [running]:
github.com/weaveworks/eksctl/pkg/actions/addon.(*Manager).Update(0x1400073f4a0, {0x107bc4e68, 0x10a3332e0}, 0x140005c2b40, {0x0, 0x0}, 0x15d3ef79800)
        github.com/weaveworks/eksctl/pkg/actions/addon/update.go:121 +0xeb8
github.com/weaveworks/eksctl/pkg/actions/addon.CreateAddonTasks.func3()
        github.com/weaveworks/eksctl/pkg/actions/addon/tasks.go:110 +0x90
github.com/weaveworks/eksctl/pkg/utils/tasks.(*GenericTask).Do(0x14000a2bd58, 0x0?)
        github.com/weaveworks/eksctl/pkg/utils/tasks/tasks.go:31 +0x34
github.com/weaveworks/eksctl/pkg/utils/tasks.doSingleTask(0x0?, {0x107b74ac0, 0x14000a2bd58})
        github.com/weaveworks/eksctl/pkg/utils/tasks/tasks.go:202 +0xc8
github.com/weaveworks/eksctl/pkg/utils/tasks.doSequentialTasks(0x1400061b4e0?, {0x1400061e980, 0x5, 0x1400022c160?})
        github.com/weaveworks/eksctl/pkg/utils/tasks/tasks.go:250 +0x6c
created by github.com/weaveworks/eksctl/pkg/utils/tasks.(*TaskTree).Do in goroutine 185
        github.com/weaveworks/eksctl/pkg/utils/tasks/tasks.go:158 +0x258
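
The panic message, together with the small fault address (`addr=0x20`), suggests a field being read through a nil pointer inside `(*Manager).Update`. The thread does not show the code at `update.go:121`, so the following is only a generic sketch of that failure class and the usual guard. `addonSummary`, `lookupAddon`, and `versionOf` are hypothetical names, not eksctl functions.

```go
package main

import (
	"errors"
	"fmt"
)

// addonSummary is a hypothetical stand-in for whatever struct is
// dereferenced in the real code; the thread does not show it.
type addonSummary struct {
	Name    string
	Version string
}

// lookupAddon returns nil when the addon is unknown, mimicking an API that
// signals "not found" through a nil pointer.
func lookupAddon(known map[string]*addonSummary, name string) *addonSummary {
	return known[name]
}

// versionOf checks for nil before reading a field. Without the check,
// summary.Version would panic with a nil pointer dereference.
func versionOf(known map[string]*addonSummary, name string) (string, error) {
	summary := lookupAddon(known, name)
	if summary == nil {
		return "", errors.New("addon " + name + " not found")
	}
	return summary.Version, nil
}

func main() {
	known := map[string]*addonSummary{
		"vpc-cni": {Name: "vpc-cni", Version: "v1.18.2"},
	}
	for _, name := range []string{"vpc-cni", "missing-addon"} {
		v, err := versionOf(known, name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "is at", v)
	}
}
```

Returning an error from the guard lets the caller report a failed task instead of crashing the whole `create cluster` run.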

cPu1 avatar Aug 02 '24 09:08 cPu1

Not sure if this is related, but I found that eksctl 0.187.0 wrongly logs the following warning during create cluster when the vpc-cni addon is specified without pod identity, but with attachPolicyARNs:

IRSA config is set for "vpc-cni" addon, but since OIDC is disabled on the cluster, eksctl cannot configure the requested permissions; the recommended way to provide IAM permissions for "vpc-cni" addon is via pod identity associations; after addon creation is completed, add all recommended policies to the config file, under addon.PodIdentityAssociations, and run eksctl update addon

The cluster config does have iam.withOIDC: true, and OIDC works without issues once the cluster is created.

artem-nefedov avatar Aug 15 '24 22:08 artem-nefedov

Can confirm @artem-nefedov's experience. I got the same error message despite OIDC being set to true. eksctl version is 0.190.0.

MartinEmrich avatar Sep 27 '24 12:09 MartinEmrich

I have the same issue with 0.191.0-dev+c736924d6.2024-09-27T00:54:42Z. I've set up vpc-cni with the following settings:

addons:
- name: vpc-cni
  podIdentityAssociations:
  - namespace: kube-system
    permissionPolicyARNs: ["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"]
    serviceAccountName: aws-node

josegonzalez avatar Oct 08 '24 00:10 josegonzalez

Seems like the same issue as #7951.

josegonzalez avatar Oct 08 '24 01:10 josegonzalez

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Dec 04 '24 02:12 github-actions[bot]

I am the person who originally brought this issue to AWS support. Just commenting to avoid ticket closure.

spencer-viray avatar Dec 04 '24 02:12 spencer-viray

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jan 05 '25 02:01 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jan 14 '25 01:01 github-actions[bot]

remove stale

Tested with 0.200.0; the issue is still there.

Update: it seems GHA won't reopen the issue, so I created a new issue: #8141

guessi avatar Jan 14 '25 07:01 guessi