
bug: EKS Failures on Cluster Provision

Open qdzlug opened this issue 2 years ago • 10 comments

Describe the bug

Currently getting these errors with any new cluster:



#=============================================================================#
#                _     __        __  ____      _____   _  __  ____            #
#               / \    \ \      / / / ___|    | ____| | |/ / / ___|           #
#              / _ \    \ \ /\ / /  \___ \    |  _|   | ' /  \___ \           #
#             / ___ \    \ V  V /    ___) |   | |___  | . \   ___) |          #
#            /_/   \_\    \_/\_/    |____/    |_____| |_|\_\ |____/           #
#                                                                             #
#=============================================================================#


Previewing update (jayqa03a)

View Live: https://app.pulumi.com/qdzlug/eks-sample/jayqa03a/previews/56a86292-9351-43fb-bf07-6a4a6ebe4d53

     Type                             Name                                    Plan       Info
 +   pulumi:pulumi:Stack              eks-sample-jayqa03a                     create     3 errors; 13 messages
 +   ├─ aws:iam:Role                  ec2-nodegroup-iam-role                  create
 +   ├─ aws:iam:Role                  eks-iam-role                            create
 +   ├─ aws:iam:RolePolicyAttachment  eks-workernode-policy-attachment        create
 +   ├─ aws:iam:RolePolicyAttachment  eks-cni-policy-attachment               create
 +   ├─ aws:iam:RolePolicyAttachment  ec2-container-ro-policy-attachment      create
 +   ├─ aws:iam:RolePolicyAttachment  eks-service-policy-attachment           create
 +   ├─ aws:iam:RolePolicyAttachment  eks-cluster-policy-attachment           create
 +   ├─ aws:iam:InstanceProfile       node-group-profile-eks-sample-jayqa03a  create
 +   └─ eks:index:Cluster             eks-sample-jayqa03a                     create

Diagnostics:
  pulumi:pulumi:Stack (eks-sample-jayqa03a):
    aws f5 profile
    vpc id: vpc-02100163ddca82996
    public subnets: ['subnet-037e8190b129c21f9', 'subnet-067424ab7d59250ee', 'subnet-06bcbb32c6fc22b78', 'subnet-094f921f58536698e']
    public subnets: ['subnet-0069cac7c2e443c06', 'subnet-011a203e6f03dc209', 'subnet-05633d363b415e413', 'subnet-0250401dfa3fa7cd2']
    error: Program failed with an unhandled exception:
    error: Traceback (most recent call last):
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 519, in do_rpc_call
        return monitor.RegisterResource(req)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
        raise _InactiveRpcError(state)
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    	status = StatusCode.UNKNOWN
    	details = "Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach."
    	debug_error_string = "{"created":"@1639108673.187709011","description":"Error received from peer ipv4:127.0.0.1:45551","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.","grpc_status":2}"
    >

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/bin/pulumi-language-python-exec", line 107, in <module>
        loop.run_until_complete(coro)
      File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
        return future.result()
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 110, in run_in_stack
        await run_pulumi_func(lambda: Stack(func))
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 45, in run_pulumi_func
        await wait_for_rpcs()
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 94, in wait_for_rpcs
        raise exception
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 158, in run
        is_known = await self._is_known
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 158, in run
        is_known = await self._is_known
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/rpc_manager.py", line 65, in rpc_wrapper
        result = await rpc
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      [Previous line repeated 56 more times]
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 524, in do_register
        resp = await asyncio.get_event_loop().run_in_executor(None, do_rpc_call)
      File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 521, in do_rpc_call
        handle_grpc_error(exn)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/settings.py", line 254, in handle_grpc_error
        raise grpc_error_to_exception(exn)
    Exception: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
    error: an unhandled error occurred: Program exited with non-zero exit code: 1

    Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.: Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
        at createCore (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:374:15)
        at new Cluster (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:1405:22)
        at Object.construct (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/cluster.ts:21:29)
        at Provider.construct (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/index.ts:124:24)
        at Server.<anonymous> (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/provider/server.ts:322:48)
        at Generator.next (<anonymous>)
        at fulfilled (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/pulumi/provider/server.js:18:58)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)

This does not seem to be occurring with existing stacks.

To Reproduce

Steps to reproduce the behavior:

  1. Clone the repo, master branch.
  2. Set up and try to stand up the stack as normal.

Expected behavior

The cluster should stand up successfully.

Your environment

  • Ubuntu 21.10
  • Master branch (or any other branch)
  • Tried in us-west-1, us-west-2, and us-east-1.

Additional context

None

qdzlug avatar Dec 10 '21 04:12 qdzlug

Key Error:

    Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.: Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.

The issue is in this block:

node_group_opts = eks.ClusterNodeGroupOptionsArgs(
    min_size=min_size,
    max_size=max_size,
    desired_capacity=desired_capacity,
    instance_type=instance_type,
)

instance_profile = aws.iam.InstanceProfile(
    resource_name=f'node-group-profile-{project_name}-{stack_name}',
    role=iam.ec2_role
)

cluster_args = eks.ClusterArgs(
    node_group_options=node_group_opts,
    vpc_id=vpc_definition.vpc_id,
    public_subnet_ids=vpc_definition.public_subnet_ids,
    private_subnet_ids=vpc_definition.private_subnet_ids,
    service_role=iam.eks_role,
    create_oidc_provider=False,
    version=k8s_version,
    provider_credential_opts=provider_credential_opts,
    tags={"Project": project_name, "Stack": stack_name}
)

Commenting out node_group_options=node_group_opts, allows the process to run as normal (albeit with the default values instead of what we set in that section).

However, going through and commenting out each of the other lines sequentially, every attempt fails; only removing the node group options works.

So I'm not sure what is conflicting and violating the mutual exclusivity.
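
For reference, here's a minimal sketch of the two approaches the error message treats as mutually exclusive (Python, using the same pulumi_eks argument names as the block above; the values are placeholders):

import pulumi_eks as eks

# Approach 1: group every node group setting under node_group_options.
grouped = eks.ClusterArgs(
    node_group_options=eks.ClusterNodeGroupOptionsArgs(
        min_size=3,
        max_size=6,
        desired_capacity=3,
        instance_type='t3.large',
    ),
    # ...and no singular node group options (min_size, instance_type, etc.) here
)

# Approach 2: the same settings as singular options directly on the cluster.
singular = eks.ClusterArgs(
    min_size=3,
    max_size=6,
    desired_capacity=3,
    instance_type='t3.large',
    # ...and no node_group_options here
)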

qdzlug avatar Dec 10 '21 04:12 qdzlug

This problem went away and then reappeared today; when it occurs, it seems to coincide with #71.

qdzlug avatar Dec 13 '21 23:12 qdzlug

This definitely results in #71.

Starting with this config:

config:
  aws:region: us-west-2
  aws:profile: f5
  grafana:adminpass: password
  kic:image_origin: registry
  kic:image_name: nginx/nginx-ingress:2.0.3

The failure leaves us with:

config:
  aws:region: us-west-2

Failure looks like this:

Diagnostics:
  pulumi:pulumi:Stack (eks-sample-jayqa05c):
    Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.: Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
        at createCore (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:374:15)
        at new Cluster (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:1405:22)
        at Object.construct (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/cluster.ts:21:29)
        at Provider.construct (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/index.ts:124:24)
        at Server.<anonymous> (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/provider/server.ts:322:48)
        at Generator.next (<anonymous>)
        at fulfilled (/home/jschmidt/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/pulumi/provider/server.js:18:58)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)

    aws f5 profile
    vpc id: vpc-094f8f8f0041d7f9b
    public subnets: ['subnet-0449c8f1a0dece5ef', 'subnet-0c87f7a20cddb5bae', 'subnet-0c76355627a13a0e0', 'subnet-0ce4b2f849e87d52c']
    public subnets: ['subnet-03b4ec558680ea8ef', 'subnet-024f422adaabb1991', 'subnet-01d22d0795894d2da', 'subnet-0a52892f6732f535a']
    error: Program failed with an unhandled exception:
    error: Traceback (most recent call last):
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 519, in do_rpc_call
        return monitor.RegisterResource(req)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
        raise _InactiveRpcError(state)
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    	status = StatusCode.UNKNOWN
    	details = "Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach."
    	debug_error_string = "{"created":"@1639439894.568731570","description":"Error received from peer ipv4:127.0.0.1:38777","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.","grpc_status":2}"
    >

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/bin/pulumi-language-python-exec", line 107, in <module>
        loop.run_until_complete(coro)
      File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
        return future.result()
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 110, in run_in_stack
        await run_pulumi_func(lambda: Stack(func))
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 45, in run_pulumi_func
        await wait_for_rpcs()
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/stack.py", line 94, in wait_for_rpcs
        raise exception
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 158, in run
        is_known = await self._is_known
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 158, in run
        is_known = await self._is_known
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/rpc_manager.py", line 65, in rpc_wrapper
        result = await rpc
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/output.py", line 94, in is_value_known
        return await is_known and not contains_unknowns(await future)
      [Previous line repeated 56 more times]
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 524, in do_register
        resp = await asyncio.get_event_loop().run_in_executor(None, do_rpc_call)
      File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/resource.py", line 521, in do_rpc_call
        handle_grpc_error(exn)
      File "/home/jschmidt/kic-reference-architectures/pulumi/aws/venv/lib/python3.9/site-packages/pulumi/runtime/settings.py", line 254, in handle_grpc_error
        raise grpc_error_to_exception(exn)
    Exception: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
    error: an unhandled error occurred: Program exited with non-zero exit code: 1

qdzlug avatar Dec 13 '21 23:12 qdzlug

The guard that raises this error is here: https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts#L361-L375
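
Roughly paraphrased in Python for readability (the actual provider code is TypeScript, and the exact field list below is illustrative rather than authoritative):

# Hypothetical paraphrase of the provider-side check: if node_group_options
# is set alongside any singular node group option, the provider throws.
SINGULAR_NODE_GROUP_OPTIONS = [
    'min_size', 'max_size', 'desired_capacity', 'instance_type',
    # ...plus the other per-node-group arguments accepted on the cluster
]

def check_node_group_args(args: dict) -> None:
    if args.get('node_group_options') and any(
            args.get(name) is not None for name in SINGULAR_NODE_GROUP_OPTIONS):
        raise ValueError(
            'Setting nodeGroupOptions, and any set of singular node group '
            'option(s) on the cluster, is mutually exclusive. '
            'Choose a single approach.')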

qdzlug avatar Dec 14 '21 00:12 qdzlug

Raised https://github.com/pulumi/pulumi-eks/issues/643 with Pulumi

qdzlug avatar Dec 14 '21 15:12 qdzlug

Hi, I am getting the same error consistently. I couldn't get past the EKS stage because of the "Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach." error. I just moved the node group option arguments from node_group_opts into eks.ClusterArgs, and it is working now:

# node_group_opts = eks.ClusterNodeGroupOptionsArgs(
#     min_size=min_size,
#     max_size=max_size,
#     desired_capacity=desired_capacity,
#     instance_type=instance_type,
# )

instance_profile = aws.iam.InstanceProfile(
    resource_name=f'node-group-profile-{project_name}-{stack_name}',
    role=iam.ec2_role
)

cluster_args = eks.ClusterArgs(
    min_size=min_size,  # from node_group_opts
    max_size=max_size,  # from node_group_opts
    desired_capacity=desired_capacity,  # from node_group_opts
    instance_type=instance_type,  # from node_group_opts
    vpc_id=vpc_definition.vpc_id,
    public_subnet_ids=vpc_definition.public_subnet_ids,
    private_subnet_ids=vpc_definition.private_subnet_ids,
    service_role=iam.eks_role,
    create_oidc_provider=False,
    version=k8s_version,
    provider_credential_opts=provider_credential_opts,
    tags={"Project": project_name, "Stack": stack_name}
)

monrax avatar Jan 07 '22 15:01 monrax

@monrax - thanks for the workaround!

I'm going to look at working this into the code in the next week or so.

Cheers,

Jay

qdzlug avatar Jan 07 '22 19:01 qdzlug

I'm running into this issue as well. I'm using Go, and currently the code block for the EKS cluster looks like:

// Create an EKS cluster
cluster, err := eks.NewCluster(ctx, "Test", &eks.ClusterArgs{
    VpcId: pulumi.String(vpcid),
    PrivateSubnetIds: pulumi.StringArray{
        pulumi.String(private[0]),
        pulumi.String(private[1]),
        pulumi.String(private[2]),
    },
    PublicSubnetIds: pulumi.StringArray{
        pulumi.String(public[0]),
        pulumi.String(public[1]),
        pulumi.String(public[2]),
    },
    ClusterSecurityGroup:  sg,
    EndpointPrivateAccess: pulumi.Bool(true),
    EndpointPublicAccess:  pulumi.Bool(false),
    NodeGroupOptions: &eks.ClusterNodeGroupOptionsArgs{
        InstanceType:                 pulumi.String("t3a.medium"),
        NodeAssociatePublicIpAddress: pulumi.Bool(false),
        ExtraNodeSecurityGroups: ec2.SecurityGroupArray{
            xtrasg,
        },
    },
})
if err != nil {
    return err
}

This results in:

Diagnostics:
  pulumi:pulumi:Stack (EKS-EKS-test):
    Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.: Error: Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
        at createCore (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:374:15)
        at new Cluster (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cluster.ts:1405:22)
        at Object.construct (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/cluster.ts:21:29)
        at Provider.construct (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/cmd/provider/index.ts:124:24)
        at Server.<anonymous> (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/provider/server.ts:322:48)
        at Generator.next (<anonymous>)
        at fulfilled (/home/ubuntu/.pulumi/plugins/resource-eks-v0.36.0/node_modules/@pulumi/pulumi/provider/server.js:18:58)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)
 
    error: program failed: waiting for RPCs: rpc error: code = Unknown desc = Setting nodeGroupOptions, and any set of singular node group option(s) on the cluster, is mutually exclusive. Choose a single approach.
    exit status 1
 
    error: an unhandled error occurred: program exited with non-zero exit code: 1

My guess is that the defaults set some node group options in the ClusterArgs, and I need to find them all and move them to the ClusterNodeGroupOptionsArgs.

As you can see from the code, setting the options all in ClusterArgs won't work for me, as I want to set ExtraNodeSecurityGroups, which doesn't appear in ClusterArgs.

I'll see what I can achieve and report back here if I succeed.
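
To make the intended end state concrete, here's a hypothetical sketch of the single-approach configuration the guard expects: everything under the node group options and no singular options on the cluster. It's written in Python (the language of this repo) rather than Go, the argument names assume the pulumi_eks Python SDK, and the placeholder values stand in for resources a real program would define elsewhere. As the guess above suggests, defaults injected into ClusterArgs may still trip the check even then:

import pulumi_aws as aws
import pulumi_eks as eks

# Placeholders standing in for values defined elsewhere in a real program.
vpc_id = 'vpc-0123456789abcdef0'
extra_sg = aws.ec2.SecurityGroup('extra-node-sg', vpc_id=vpc_id)

cluster_args = eks.ClusterArgs(
    vpc_id=vpc_id,
    endpoint_private_access=True,
    endpoint_public_access=False,
    node_group_options=eks.ClusterNodeGroupOptionsArgs(
        instance_type='t3a.medium',
        node_associate_public_ip_address=False,
        # extra node security groups exist only here, not on ClusterArgs
        extra_node_security_groups=[extra_sg],
    ),
)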

M-JobPixel avatar Jan 20 '22 18:01 M-JobPixel

Hmmm, I might have put this in the wrong thread.

I added it to https://github.com/pulumi/pulumi-eks/issues/643 too.

M-JobPixel avatar Jan 20 '22 18:01 M-JobPixel

@M-JobPixel - no worries! I'm glad you dropped it here as well so I can keep an eye on it, since it seems to be a Pulumi backend issue (it occurs across different languages).

qdzlug avatar Jan 21 '22 17:01 qdzlug