
aws_eks: kubectl function shouldn't require the vpc when endpointAccess is PUBLIC_AND_PRIVATE

Open · diranged opened this issue Sep 07 '23 · 5 comments

Describe the bug

See AWS Case ID #13749604991 for more details.

We've been working on building out new EKS clusters with CDK rather than native CloudFormation. As part of that, we're iterating with the integration test runner. We were struggling to understand why the DELETE process for the tests was taking 30+ minutes, until we discovered that the KubectlHandler function is being created inside the VPC. This surprised us, since we specifically did not ask for that (we are not setting the placeClusterHandlerInVpc flag).

It turns out that the Cluster class makes the decision to place the KubectlHandler into the VPC at https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1594:

    if (this.endpointAccess._config.privateAccess && privateSubnets.length !== 0) {

      // when private access is enabled and the vpc has private subnets, lets connect
      // the provider to the vpc so that it will work even when restricting public access.

      // validate VPC properties according to: https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html
      if (this.vpc instanceof ec2.Vpc && !(this.vpc.dnsHostnamesEnabled && this.vpc.dnsSupportEnabled)) {
        throw new Error('Private endpoint access requires the VPC to have DNS support and DNS hostnames enabled. Use `enableDnsHostnames: true` and `enableDnsSupport: true` when creating the VPC.');
      }

      this.kubectlPrivateSubnets = privateSubnets;

      // the vpc must exist in order to properly delete the cluster (since we run `kubectl delete`).
      // this ensures that.
      this._clusterResource.node.addDependency(this.vpc);
    }

Yes, we want our clusters to be PUBLIC and PRIVATE so that, within the VPC, API calls can go directly to the API without leaving the network. On the other hand, we couldn't care less whether the Lambda function is inside the VPC. According to our case rep, any time a DELETE call is made against a Lambda function with attached ENIs, there is a 20-minute delay between the DELETE starting and finishing. This delay adds a huge amount of wasted time to our integration tests and slows everything down for little value.

Expected Behavior

I would expect that if the cluster is PUBLIC_AND_PRIVATE and I set placeClusterHandlerInVpc: false, the Lambda functions would indeed NOT be placed in the VPC.

Current Behavior

They are placed in the VPC, regardless of our placeClusterHandlerInVpc setting.

Reproduction Steps

Create a cluster with endpointAccess set to PUBLIC_AND_PRIVATE, then try deleting it. Watch it take 20 minutes to delete. A minimal sketch is shown below.
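
For reference, a minimal sketch of a stack that reproduces this (names are illustrative and the kubectl layer is omitted; assumes aws-cdk-lib v2):

    import * as cdk from 'aws-cdk-lib';
    import * as eks from 'aws-cdk-lib/aws-eks';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'EksKubectlVpcRepro');

    // endpointAccess enables private access, so the KubectlHandler lands
    // inside the (auto-created) VPC even though placeClusterHandlerInVpc
    // is explicitly false.
    new eks.Cluster(stack, 'Test', {
      version: eks.KubernetesVersion.V1_27,
      endpointAccess: eks.EndpointAccess.PUBLIC_AND_PRIVATE,
      placeClusterHandlerInVpc: false,
    });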

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.94.0

Framework Version

No response

Node.js Version

18

OS

linux

Language

Typescript

Language Version

No response

Other information

No response

diranged avatar Sep 07 '23 21:09 diranged

We developed a hacky fix here using the escape hatches ...

    // Locate the KubectlProvider and clear the kubectl handler's
    // VpcConfig via the L1 escape hatch.
    const kubectlProvider = Stack.of(this).node.findChild('@aws-cdk/aws-eks.KubectlProvider');
    const handler = kubectlProvider.node.findChild('Handler');
    const resource = handler.node.findChild('Resource') as CfnFunction;
    resource.vpcConfig = undefined;

Running the integ tests again shows the expected diff:

$ npx integ-runner --no-clean --force integ/constructs/aws_eks/integ.cluster.ts

Verifying integration test snapshots...

  CHANGED    integ/constructs/aws_eks/integ.cluster 4.488s
      Resources
[~] AWS::IAM::Role TestKubectlHandlerRole7073B199 
 └─ [-] DependsOn
     └─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Policy TestKubectlHandlerRoleDefaultPolicy40956F86 
 └─ [-] DependsOn
     └─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Role TestRole17AB2208 
 └─ [-] DependsOn
     └─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::EC2::SecurityGroup TestControlPlaneSecurityGroup05BB1761 
 └─ [-] DependsOn
     └─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Role TestCreationRoleE90BFD6D 
 └─ [~] DependsOn
     └─ @@ -1,9 +1,4 @@
        [ ] [
        [-]   "TestClusterCIDRBlockA2FCD90D",
        [-]   "TestClusterCIDRBlockSubnet0CD56F47A",
        [-]   "TestClusterCIDRBlockSubnet1216B045C",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
        [ ]   "UserSuppliedVpcIGWCA64A826",
        [ ]   "UserSuppliedVpcPrivateSubnet1DefaultRoute966544D4",
        [ ]   "UserSuppliedVpcPrivateSubnet1RouteTableC32F4B57",
[~] AWS::IAM::Policy TestCreationRoleDefaultPolicy592DF1D6 
 └─ [~] DependsOn
     └─ @@ -1,9 +1,4 @@
        [ ] [
        [-]   "TestClusterCIDRBlockA2FCD90D",
        [-]   "TestClusterCIDRBlockSubnet0CD56F47A",
        [-]   "TestClusterCIDRBlockSubnet1216B045C",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
        [ ]   "UserSuppliedVpcIGWCA64A826",
        [ ]   "UserSuppliedVpcPrivateSubnet1DefaultRoute966544D4",
        [ ]   "UserSuppliedVpcPrivateSubnet1RouteTableC32F4B57",
[~] Custom::AWSCDK-EKS-Cluster Test3ACBFD1D 
 └─ [~] DependsOn
     └─ @@ -1,9 +1,4 @@
        [ ] [
        [-]   "TestClusterCIDRBlockA2FCD90D",
        [-]   "TestClusterCIDRBlockSubnet0CD56F47A",
        [-]   "TestClusterCIDRBlockSubnet1216B045C",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
        [ ]   "TestCreationRoleDefaultPolicy592DF1D6",
        [ ]   "TestCreationRoleE90BFD6D",
        [ ]   "UserSuppliedVpcIGWCA64A826",
[~] AWS::SSM::Parameter TestKubectlReadyBarrier082A53D9 
 └─ [~] DependsOn
     └─ @@ -1,9 +1,4 @@
        [ ] [
        [-]   "TestClusterCIDRBlockA2FCD90D",
        [-]   "TestClusterCIDRBlockSubnet0CD56F47A",
        [-]   "TestClusterCIDRBlockSubnet1216B045C",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
        [ ]   "TestCreationRoleDefaultPolicy592DF1D6",
        [ ]   "TestCreationRoleE90BFD6D",
        [ ]   "Test3ACBFD1D"
[~] Custom::AWSCDK-EKS-KubernetesResource TestAwsAuthmanifestE302BF21 
 └─ [~] DependsOn
     └─ @@ -1,8 +1,3 @@
        [ ] [
        [-]   "TestClusterCIDRBlockA2FCD90D",
        [-]   "TestClusterCIDRBlockSubnet0CD56F47A",
        [-]   "TestClusterCIDRBlockSubnet1216B045C",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
        [-]   "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
        [ ]   "TestKubectlReadyBarrier082A53D9"
        [ ] ]


  CHANGED    integ/constructs/aws_eks/integ.cluster 4.488s
      Resources
[~] AWS::Lambda::Function Handler886CB40B 
 └─ [-] VpcConfig
     └─ {"SecurityGroupIds":[{"Ref":"referencetoINFRACDKK8SClusterTest9112B301ClusterSecurityGroupId"}],"SubnetIds":[{"Ref":"referencetoINFRACDKK8SClusterTestUserSuppliedVpcPrivateSubnet1SubnetF8460491Ref"},{"Ref":"referencetoINFRACDKK8SClusterTestUserSuppliedVpcPrivateSubnet2SubnetDDB7C6A0Ref"}]}

diranged avatar Sep 07 '23 22:09 diranged

Looks like there's another resource in the provider framework that also needs its vpcConfig removed to fix the issue:

    // onEvent handler: also clear the VpcConfig on the provider
    // framework's onEvent function.
    const provider = kubectlProvider.node.findChild('Provider');
    const framework = provider.node.findChild('framework-onEvent');
    const onEventResource = framework.node.findChild('Resource') as CfnFunction;
    onEventResource.vpcConfig = undefined;
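
Putting both escape hatches together, a consolidated sketch (the helper name is mine; the construct paths are the ones used above and may change between CDK versions):

    import { Stack } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import { CfnFunction } from 'aws-cdk-lib/aws-lambda';

    // Clears the VpcConfig on both kubectl-related handler functions so
    // their DELETEs don't wait on ENI cleanup.
    function stripKubectlVpcConfig(scope: Construct): void {
      const kubectlProvider = Stack.of(scope).node.findChild('@aws-cdk/aws-eks.KubectlProvider');

      // kubectl handler
      const handler = kubectlProvider.node.findChild('Handler');
      (handler.node.findChild('Resource') as CfnFunction).vpcConfig = undefined;

      // provider framework onEvent handler
      const framework = kubectlProvider.node.findChild('Provider').node.findChild('framework-onEvent');
      (framework.node.findChild('Resource') as CfnFunction).vpcConfig = undefined;
    }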

diranged avatar Sep 08 '23 03:09 diranged

I can't comment on why the handler is placed inside the private side of the VPC, except that I would assume it is indeed for security purposes.

Please also understand that this team cannot see AWS cases.

While this workaround isn't great, I'm glad it works for you.

indrora avatar Sep 11 '23 17:09 indrora

@indrora I'm glad I was able to come up with a workaround, but I still believe the fundamental behavior is wrong: it should follow the placeClusterHandlerInVpc setting.

diranged avatar Sep 13 '23 01:09 diranged

Hi @diranged,

Thanks for raising the issue. There are 2 lambda functions involved here:

  • Cluster Handler - which uses EKS API to create the EKS Cluster
  • Kubectl Handler - which is to issue kubectl commands against the cluster

The placeClusterHandlerInVpc setting applies to the Cluster Handler, so it doesn't affect the Kubectl Handler. Regarding the Kubectl Handler VPC setup, here is the description: https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk-lib/aws-eks#kubectl-handler

Breaking this down, it means that if the endpoint exposes private access (via EndpointAccess.PRIVATE or EndpointAccess.PUBLIC_AND_PRIVATE), and the VPC contains private subnets, the Lambda function will be provisioned inside the VPC and use the private subnets to interact with the cluster. This is the common use-case.
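
For contrast, a sketch of a configuration that does not trigger the condition quoted from cluster.ts earlier, since private access is disabled (illustrative only, not a recommendation):

    import * as cdk from 'aws-cdk-lib';
    import * as eks from 'aws-cdk-lib/aws-eks';

    const stack = new cdk.Stack(new cdk.App(), 'PublicEndpointOnly');

    // Public-only endpoint access: the Kubectl Handler is not attached
    // to the VPC, but traffic to the cluster endpoint uses the public
    // endpoint rather than staying inside the network.
    new eks.Cluster(stack, 'Cluster', {
      version: eks.KubernetesVersion.V1_27,
      endpointAccess: eks.EndpointAccess.PUBLIC,
    });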

We're about to rewrite the current EKS module and will definitely consider your feedback when making decisions.

Thanks!

xazhao avatar Oct 24 '24 21:10 xazhao