
EKS stops reconcile after ACK.Terminal status condition

Open tomitesh opened this issue 1 year ago • 8 comments

Describe the bug

We created a cluster with the definition below (note that the role ARN is provided in the cluster definition).

If the role does not exist yet when the cluster is created (a race condition), the cluster status shows an ACK.Terminal condition that never resolves, even though the role is created successfully within the next 1-2 seconds.

Both the EKS and IAM controllers are configured to reconcile every 10 to 20 seconds (the configuration is included further down).

However, if I restart the EKS controller by deleting its pod, it reconciles successfully and removes the ACK.Terminal condition. This workaround is not practical, as we cannot keep restarting the pod for every change to the YAML.
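To make the reported behavior concrete, here is a simplified Python model of a reconcile loop. It is not the ACK runtime's code: it assumes, as the symptoms above suggest, that a resource marked ACK.Terminal is not requeued on its resync period. `TerminalError`, `reconcile`, `simulate` and the `requeue_terminal` switch are hypothetical names chosen for illustration; the 20-second value mirrors the `Cluster: 20` resync setting in the Helm values.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

RESYNC_SECONDS = 20  # mirrors resourceResyncPeriods.Cluster from the Helm values


class TerminalError(Exception):
    """Hypothetical stand-in for an error the controller treats as non-retryable."""


@dataclass
class Cluster:
    conditions: List[str] = field(default_factory=list)
    synced: bool = False


def reconcile(cluster: Cluster, create: Callable[[], None],
              requeue_terminal: bool) -> Optional[int]:
    """Run one reconcile; return the requeue delay in seconds, or None for no requeue."""
    try:
        create()
    except TerminalError:
        if "ACK.Terminal" not in cluster.conditions:
            cluster.conditions.append("ACK.Terminal")
        # Observed behavior (False): no requeue, so only a spec change or restart retries.
        # Expected behavior (True): retry on the normal resync period.
        return RESYNC_SECONDS if requeue_terminal else None
    cluster.conditions = [c for c in cluster.conditions if c != "ACK.Terminal"]
    cluster.synced = True
    return RESYNC_SECONDS


def simulate(requeue_terminal: bool) -> Cluster:
    role_ready = False

    def create_cluster() -> None:
        # Reads role_ready from the enclosing scope at call time.
        if not role_ready:
            raise TerminalError("role is missing the AmazonEKSClusterPolicy")

    cluster = Cluster()
    delay = reconcile(cluster, create_cluster, requeue_terminal)
    role_ready = True  # the Role resource syncs 1-2 seconds later
    if delay is not None:  # only a requeued object gets a second reconcile
        reconcile(cluster, create_cluster, requeue_terminal)
    return cluster


print(simulate(requeue_terminal=False))  # stays with ACK.Terminal, never synced
print(simulate(requeue_terminal=True))   # recovers on the next resync
```

In this model, deleting the controller pod roughly corresponds to calling `reconcile` once more for every object at startup, which would explain why a restart clears the condition while the resync period alone does not.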

Steps to reproduce

Step 1: create the cluster first.
Step 2: create the role.

cluster definition

apiVersion: eks.services.k8s.aws/v1alpha1
kind: Cluster
metadata:
  annotations:
    services.k8s.aws/deletion-policy: delete
  finalizers:
  - finalizers.eks.services.k8s.aws/Cluster
  name: moon
  namespace: control
spec:
  kubernetesNetworkConfig:
    ipFamily: ipv4
    serviceIPv4CIDR: 172.20.0.0/16
  logging:
    clusterLogging:
    - enabled: true
      types:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
  name: moon
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: true
    publicAccessCIDRs:
    - 123.45.67.89/32
    securityGroupIDs:
    - sg-123
    subnetIDs:
    - subnet-123
    - subnet-456
    - subnet-789
  roleARN: arn:aws:iam::1234567890:role/moon-eks-cluster
  version: "1.25"
status:
  ackResourceMetadata:
    ownerAccountID: "1234567890"
    region: eu-central-1
  conditions:
  - message: |-
      InvalidParameterException: The provided role doesn't have the Amazon EKS Managed Policies associated with it. Please ensure the following policies [arn:aws:iam::aws:policy/AmazonEKSClusterPolicy] are attached
      {
        RespMetadata: {
          StatusCode: 400,
          RequestID: "aacb3dc6-6bdd-4031-a67e-ae6d461f7e4b"
        },
        ClusterName: "moon",
        Message_: "The provided role doesn't have the Amazon EKS Managed Policies associated with it. Please ensure the following policies [arn:aws:iam::aws:policy/AmazonEKSClusterPolicy] are attached"
      }
    status: "True"
    type: ACK.Terminal
  - lastTransitionTime: "2023-07-11T09:39:55Z"
    message: Resource not synced
    reason: resource is in terminal condition
    status: "False"
    type: ACK.ResourceSynced
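To spot clusters stuck in this state, the status conditions can be checked programmatically. A minimal sketch, assuming the resource has already been fetched as JSON (for example via `kubectl get ... -o json`) and parsed into a dict; `get_condition` and `is_terminal` are hypothetical helper names, and the sample is an abbreviated copy of the Cluster status shown above.

```python
import json
from typing import Any, Dict, Optional

# Abbreviated copy of the Cluster status above (error message shortened).
CLUSTER_JSON = """
{
  "metadata": {"name": "moon", "namespace": "control"},
  "status": {
    "conditions": [
      {"type": "ACK.Terminal", "status": "True",
       "message": "InvalidParameterException: The provided role doesn't have the Amazon EKS Managed Policies associated with it."},
      {"type": "ACK.ResourceSynced", "status": "False",
       "reason": "resource is in terminal condition"}
    ]
  }
}
"""


def get_condition(resource: Dict[str, Any], cond_type: str) -> Optional[Dict[str, Any]]:
    """Return the first status condition of the given type, or None if absent."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def is_terminal(resource: Dict[str, Any]) -> bool:
    """True when the resource carries an ACK.Terminal condition with status "True"."""
    cond = get_condition(resource, "ACK.Terminal")
    return cond is not None and cond.get("status") == "True"


cluster = json.loads(CLUSTER_JSON)
if is_terminal(cluster):
    terminal = get_condition(cluster, "ACK.Terminal")
    print(f"{cluster['metadata']['name']}: {terminal['message']}")
```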

role definition

apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  annotations:
    services.k8s.aws/deletion-policy: delete
  finalizers:
  - finalizers.iam.services.k8s.aws/Role
  name: moon-eks-cluster
  namespace: control
spec:
  assumeRolePolicyDocument: |-
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "EKSClusterAssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": "eks.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  description: IAM role that is used by an eks cluster.
  inlinePolicies:
    cluster-elb-sl: |-
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Action": [
              "ec2:DescribeInternetGateways",
              "ec2:DescribeAddresses",
              "ec2:DescribeAccountAttributes"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": ""
          }
        ]
      }
  maxSessionDuration: 3600
  name: moon-eks-cluster
  path: /
  policies:
  - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
  - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
  - arn:aws:iam::aws:policy/AmazonEKSVPCResourceController

status:
  ackResourceMetadata:
    arn: arn:aws:iam::1234567890:role/moon-eks-cluster
    ownerAccountID: "1234567890"
    region: eu-central-1
  conditions:
  - lastTransitionTime: "2023-07-11T10:11:59Z"
    message: Late initialization successful
    reason: Late initialization successful
    status: "True"
    type: ACK.LateInitialized
  - lastTransitionTime: "2023-07-11T10:11:59Z"
    message: Resource synced successfully
    reason: ""
    status: "True"
    type: ACK.ResourceSynced
  createDate: "2023-07-11T09:39:54Z"
  roleID: XXXXXXXXXXXXXXXXX
  roleLastUsed: {}
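Until terminal resources are retried automatically, one possible workaround is to reverse the order of the reproduction steps: apply the Role first and apply the Cluster only once the Role reports `ACK.ResourceSynced` = `"True"`, as it does in the status above. A minimal polling sketch; `fetch` is a hypothetical callable returning the Role resource as a dict (for example, parsed `kubectl get -o json` output), and the timeout and interval values are arbitrary. It assumes the Role is marked synced only after its managed policies are attached, which is worth verifying, since the EKS error above complains about missing policies rather than a missing role.

```python
import time
from typing import Any, Callable, Dict


def is_synced(resource: Dict[str, Any]) -> bool:
    """True when the ACK.ResourceSynced condition is present with status "True"."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == "ACK.ResourceSynced":
            return cond.get("status") == "True"
    return False


def wait_until_synced(fetch: Callable[[], Dict[str, Any]],
                      timeout_s: float = 120.0,
                      interval_s: float = 5.0) -> bool:
    """Poll fetch() until the resource is synced; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_synced(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage with canned responses standing in for successive reads of the Role.
responses = iter([
    {"status": {}},
    {"status": {"conditions": [{"type": "ACK.ResourceSynced", "status": "True"}]}},
])
if wait_until_synced(lambda: next(responses), timeout_s=10, interval_s=0):
    print("Role synced; safe to apply the Cluster")
```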

Both the EKS and IAM controllers are configured to reconcile every 10 to 20 seconds.

EKS controller Helm chart values used when installing the controller:

    reconcile:
      resourceResyncPeriods: {
        Nodegroup: 10,
        Cluster: 20,
        Addon: 15
      }

IAM controller Helm chart values used when installing the controller:

    reconcile:
      resourceResyncPeriods: {
        Role: 10
      }

Expected outcome

As the EKS controller is configured to reconcile Cluster resources every 20 seconds, it should sync automatically in the next reconcile loop after the role becomes available.

Environment: dev

  • Kubernetes version: 1.25
  • Using EKS (yes/no), if so version? 1.25
  • AWS service targeted (S3, RDS, etc.): EKS, IAM

tomitesh, Jul 11 '23 10:07