
nodeRoleRef & subnetRefs in eks controller - the referenced resource is missing the target field

Open tomitesh opened this issue 1 year ago • 12 comments

Describe the bug

We use GitOps to deploy all assets with the aws-controllers-k8s (ACK) framework. Specifically, we use the EKS controller to create EKS clusters, node groups, and add-ons.

When creating a node group (kind: Nodegroup) with reference fields such as nodeRoleRef and subnetRefs, we get the error "the referenced resource is missing the target field".

Steps to reproduce

  1. Use nodeRoleRef instead of nodeRole, or
  2. Use subnetRefs instead of subnets.

The manifest applies successfully, but the status of the Nodegroup shows an error message in its conditions section:

message: the referenced resource is missing the target field. resource:Role, namespace:control,
  name:flash-rancher-eks-worker, targetField:Status.ACKResourceMetadata.ARN
status: Unknown
type: ACK.ReferencesResolved
  • When I utilize the "nodeRole" field with the appropriate role ARN, it functions as expected.
  • Similarly, I encounter the same issue with the "subnetRefs" field (commented out in the code).
kind: Nodegroup
metadata:
  name: flash-rancher-nodegroup
  namespace: control
spec:
  amiType: AL2_x86_64
  capacityType: SPOT
  clusterName: flash-rancher
  diskSize: 20
  instanceTypes:
    - m4.xlarge
    - m5.xlarge
  name: flash-rancher-nodegroup
  nodeRoleRef:
    from:
      name: flash-rancher-eks-worker
  releaseVersion: 1.25.9-20230513
  scalingConfig:
    desiredSize: 1
    maxSize: 1
    minSize: 1
  subnets:
    - subnet-111111111111111111111
    - subnet-2222222222222222
    - subnet-333333333333333
#  subnetRefs:
#    - from:
#        name: app1-sub
#    - from:
#        name: app2-sub
#    - from:
#        name: app3-sub
  updateConfig:
    maxUnavailable: 1
  version: "1.25"

Expected outcome

  • If I utilize "nodeRoleRef" with the role name, it should locate the role and establish the appropriate mapping with the resource mentioned above.
  • Similarly, when I employ "subnetRefs" with the subnet name, it should identify the specified subnet and establish the necessary mapping with the resource mentioned earlier.
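
To make this expectation concrete, the lookup could be sketched roughly as below. This is only an illustration in Python, not the EKS controller's code: the `resolve_node_role` helper and the in-memory `roles` dictionary are hypothetical stand-ins for the controller reading the Role object from the cluster, and the same-namespace lookup is an assumption.

```python
from typing import Optional


def resolve_node_role(nodegroup: dict, roles: dict) -> Optional[str]:
    """Return the Role ARN for a Nodegroup, or None if it cannot be resolved.

    Hypothetical helper for illustration; not part of any ACK controller.
    """
    spec = nodegroup["spec"]
    # A literal nodeRole ARN needs no lookup.
    if "nodeRole" in spec:
        return spec["nodeRole"]
    ref = spec["nodeRoleRef"]["from"]
    # Assumption: the referenced Role lives in the Nodegroup's own namespace.
    namespace = nodegroup["metadata"]["namespace"]
    role = roles.get((namespace, ref["name"]))
    if role is None:
        return None
    # The error message names this field as the target:
    # Status.ACKResourceMetadata.ARN
    return role.get("status", {}).get("ackResourceMetadata", {}).get("arn")


# Stand-in for the Role object stored in the cluster.
roles = {
    ("control", "flash-rancher-eks-worker"): {
        "status": {
            "ackResourceMetadata": {
                "arn": "arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker"
            }
        }
    }
}
nodegroup = {
    "metadata": {"name": "flash-rancher-nodegroup", "namespace": "control"},
    "spec": {"nodeRoleRef": {"from": {"name": "flash-rancher-eks-worker"}}},
}
print(resolve_node_role(nodegroup, roles))
```

In this sketch, a Role that exists but has no ARN in its status resolves to None, which corresponds to the "missing the target field" case reported above.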

Environment development

  • Kubernetes version : 1.25
  • Using EKS (yes/no), if so version? yes, 1.25
  • AWS service targeted (S3, RDS, etc.) eks, nodegroup

tomitesh avatar Jun 01 '23 07:06 tomitesh

Could you provide the description of the flash-rancher-eks-worker Role?

I have a feeling that this resource isn't properly being created - and that's why it doesn't have an ARN. Also, maybe a silly question, but do you have the iam-controller installed into the cluster?

RedbackThomson avatar Jun 01 '23 18:06 RedbackThomson

Thanks @RedbackThomson for the quick response.

  • Yes, the IAM controller is installed and it is installed in the same namespace (aws-operators) as the eks-controller. The nodegroup (flash-rancher-nodegroup), eks cluster (flash-rancher) and role (flash-rancher-eks-worker) are created in "control" namespace.


  • Using the "flash-rancher-eks-worker" role with ARN and the nodeRole attribute is functioning correctly. There doesn't appear to be any issues with role creation in this case.

Note: I have anonymized the data by replacing it with "xxxxxxxxxxxx".

If I use the ARN with nodeRole, it works:
        nodeRole: arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker
If I use nodeRoleRef, it fails with the error "the referenced resource is missing the target field":
       nodeRoleRef:
          from:
            name: flash-rancher-eks-worker
  • Please find the Role YAML (flash-rancher-eks-worker) below:

apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: control-repo-cd-control-cluster-iam
    meta.helm.sh/release-namespace: control
  finalizers:
    - finalizers.iam.services.k8s.aws/Role
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flash-rancher-eks-worker
  namespace: control
spec:
  assumeRolePolicyDocument: '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  inlinePolicies:
    session-manager-logs: |-
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs/*"
          },
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:GetEncryptionConfiguration",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-dev-session-manager-logs"
          }
        ]
      }
    fluentDCloudWatchLogging: |-
      {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Action": "logs:DescribeLogGroups",
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:*:*",
              "Sid": "FluentDCloudWatchLoggingViewLogGroups"
            },
            {
              "Action": [
                    "logs:PutRetentionPolicy",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup"
                ],
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:/k8s/*:*",
              "Sid": "FluentDCloudWatchLoggingWrite"
            }
          ]
      }
  maxSessionDuration: 3600
  name: flash-rancher-eks-worker
  path: /
  policies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

Kindly let me know if you need more info.

tomitesh avatar Jun 02 '23 09:06 tomitesh

Could you provide the output of kubectl get roles -n control flash-rancher-eks-worker? Specifically, the status of that object should have some sort of error - or we will see if it contains the ARN and there is some fault elsewhere.

However, just from a cursory glance at the Role, I'd double check that the Resource field within the fluentDCloudWatchLogging statement is correct - they don't look like valid ARNs.

RedbackThomson avatar Jun 06 '23 19:06 RedbackThomson

The output of the command "kubectl get roles -n control flash-rancher-eks-worker" was posted as an image.

The same object in YAML format is as follows:

apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: control-repo-cd-control-cluster-iam
    meta.helm.sh/release-namespace: control
    objectset.rio.cattle.io/id: default-control-repo-cd-control-cluster-iam-cattle-fleet-d3f1da
    services.k8s.aws/deletion-policy: retain
  creationTimestamp: "2023-05-29T20:20:02Z"
  finalizers:
  - finalizers.iam.services.k8s.aws/Role
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    objectset.rio.cattle.io/hash: 50746d8429094aa76c6283a7a838da5a62dbb312
  name: flash-rancher-eks-worker
  namespace: control
  resourceVersion: "12150677"
  uid: b22ed0b2-82a4-4d20-956e-9de9e167d19c
spec:
  assumeRolePolicyDocument: '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  inlinePolicies:
    fluentDCloudWatchLogging: |-
      {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Action": "logs:DescribeLogGroups",
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:*:*",
              "Sid": "FluentDCloudWatchLoggingViewLogGroups"
            },
            {
              "Action": [
                    "logs:PutRetentionPolicy",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup"
                ],
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:/k8s/*:*",
              "Sid": "FluentDCloudWatchLoggingWrite"
            }
          ]
      }
    session-manager-logs: |-
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs/*"
          },
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:GetEncryptionConfiguration",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs"
          }
        ]
      }
  maxSessionDuration: 3600
  name: flash-rancher-eks-worker
  path: /
  policies:
  - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
  - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
status:
  ackResourceMetadata:
    arn: arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker
    ownerAccountID: "xxxxxxxxxxxx"
    region: eu-central-1
  conditions:
  - lastTransitionTime: "2023-06-06T12:05:06Z"
    message: Late initialization successful
    reason: Late initialization successful
    status: "True"
    type: ACK.LateInitialized
  - lastTransitionTime: "2023-06-06T12:05:06Z"
    message: Resource synced successfully
    reason: ""
    status: "True"
    type: ACK.ResourceSynced
  createDate: "2023-05-17T10:50:29Z"
  roleID: AROA2PXUAJR2KGGG7BHYV
  roleLastUsed:
    lastUsedDate: "2023-06-06T11:47:36Z"
    region: eu-central-1

tomitesh avatar Jun 07 '23 09:06 tomitesh

Questions: if I use the EKS controller to deploy a Nodegroup and

  1. specify nodeRoleRef to look up the role by name, does it require the IAM controller, along with the Role resource, to be deployed on the cluster?
  2. specify subnetRefs to look up the subnets by name, does it require the EC2 controller, along with the Subnet resources, to be deployed on the cluster?

tomitesh avatar Jun 08 '23 09:06 tomitesh

specify nodeRoleRef to look up the role by name, does it require the IAM controller, along with the Role resource, to be deployed on the cluster?

The EKS controller doesn't explicitly require the IAM controller in its code, but it does require that an ACK IAM Role has been created using that controller and that the Role has the ACK.ResourceSynced condition set to True in its status.

specify subnetRefs to look up the subnets by name, does it require the EC2 controller, along with the Subnet resources, to be deployed on the cluster?

Same as with IAM: nothing in the code explicitly requires the EC2 controller, but the Subnet resource must have been created by that controller.
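
As a rough illustration of that requirement (a sketch only, not the controller's actual code), the check below treats a resource as referenceable when its status carries an ACK.ResourceSynced condition with status "True" and an ARN under ackResourceMetadata. The `is_referenceable` name is hypothetical, and requiring the ARN is an assumption based on the target field named in the error message.

```python
def is_referenceable(resource: dict) -> bool:
    """Hypothetical check: synced condition is True and the ARN is populated."""
    status = resource.get("status", {})
    synced = any(
        cond.get("type") == "ACK.ResourceSynced" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    arn = status.get("ackResourceMetadata", {}).get("arn")
    return synced and bool(arn)


# Status fields taken from the Role posted earlier in this thread.
role = {
    "status": {
        "ackResourceMetadata": {
            "arn": "arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker",
            "ownerAccountID": "xxxxxxxxxxxx",
            "region": "eu-central-1",
        },
        "conditions": [
            {"type": "ACK.LateInitialized", "status": "True"},
            {"type": "ACK.ResourceSynced", "status": "True"},
        ],
    }
}
print(is_referenceable(role))
```

A resource that is synced but has no ARN yet, or has an ARN but is not synced, fails this check.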

RedbackThomson avatar Jun 12 '23 19:06 RedbackThomson

I don't see anything wrong with your resources or your logic. The role looks well formed, and it has the conditions required for it to be referenced by the controller.

The only other cause I can think of for that error is that the Nodegroup was created before the IAM controller had created the Role. However, the EKS controller should retry the creation of the Nodegroup (with exponential backoff) until the Role can be referenced, and then it should proceed.
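
The retry behaviour described here can be pictured with a minimal sketch like the following. It is not the ACK runtime's requeue logic: the function names, the base delay, the factor, the cap, and the attempt limit are all assumed values chosen for illustration.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... with each delay capped."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def retry_until_resolved(
    resolve: Callable[[], Optional[str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call resolve() until it returns a value, waiting longer after each failure."""
    delays = backoff_delays()
    for attempt in range(max_attempts):
        result = resolve()
        if result is not None:
            return result
        # No need to wait after the final attempt.
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return None


# Simulate a Role whose ARN only becomes available on the third attempt.
answers = iter([None, None, "arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker"])
waited = []  # record the delays instead of actually sleeping
print(retry_until_resolved(lambda: next(answers), sleep=waited.append), waited)
```

If the Role becomes referenceable at any point within the attempt budget, the lookup succeeds; an error that persists indefinitely, as in this issue, suggests the cause is something other than ordering.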

RedbackThomson avatar Jun 12 '23 19:06 RedbackThomson

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Dec 10 '23 00:12 ack-bot

Stale issues rot after 60d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 60d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot avatar Feb 08 '24 01:02 ack-bot

/remove-lifecycle rotten

gecube avatar Mar 14 '24 07:03 gecube