aws_eks: kubectl function shouldn't require the vpc when endpointAccess is PUBLIC_AND_PRIVATE
Describe the bug
See AWS Case ID #13749604991 for more details.
We've been working on building out new EKS clusters with CDK rather than native CloudFormation. As part of that, we're iterating with the integration test runner. We were struggling to understand why the DELETE process for the tests was taking 30+ minutes, when we discovered that the KubectlHandler function is being created inside the VPC. This surprised us, since we are specifically not requesting that (we are not setting the placeClusterHandlerInVpc flag).
It turns out that the Cluster class makes the decision to place the KubectlHandler into the VPC at https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-eks/lib/cluster.ts#L1594:
if (this.endpointAccess._config.privateAccess && privateSubnets.length !== 0) {
  // when private access is enabled and the vpc has private subnets, lets connect
  // the provider to the vpc so that it will work even when restricting public access.

  // validate VPC properties according to: https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html
  if (this.vpc instanceof ec2.Vpc && !(this.vpc.dnsHostnamesEnabled && this.vpc.dnsSupportEnabled)) {
    throw new Error('Private endpoint access requires the VPC to have DNS support and DNS hostnames enabled. Use `enableDnsHostnames: true` and `enableDnsSupport: true` when creating the VPC.');
  }

  this.kubectlPrivateSubnets = privateSubnets;

  // the vpc must exist in order to properly delete the cluster (since we run `kubectl delete`).
  // this ensures that.
  this._clusterResource.node.addDependency(this.vpc);
}
Yes, we want our clusters to be PUBLIC and PRIVATE so that API calls from within the VPC can go directly to the API endpoint without leaving the network. On the other hand, we couldn't care less whether the Lambda function is inside the VPC. According to our case rep, any time a DELETE call is made for a Lambda function with attached ENIs, there is a 20-minute delay between the DELETE starting and finishing. That delay adds a huge amount of wasted time to our integration tests and slows everything down for little value.
Expected Behavior
I would expect that if the cluster is PUBLIC_AND_PRIVATE and I set placeClusterHandlerInVpc: false, the Lambda functions would NOT be placed in the VPC.
Current Behavior
They are placed in the VPC, regardless of our placeClusterHandlerInVpc setting.
Reproduction Steps
Create a cluster with endpointAccess set to PUBLIC_AND_PRIVATE (see the sketch below), then try deleting it. Watch it take 20 minutes to delete.
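A minimal repro sketch, assuming a default VPC with private subnets (the identifiers, Kubernetes version, and kubectl layer package are illustrative, not taken from our actual stack):

import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV27Layer } from '@aws-cdk/lambda-layer-kubectl-v27';

const app = new App();
const stack = new Stack(app, 'EksReproStack');

// The default Vpc construct creates public and private subnets; the private
// subnets are what trigger the kubectl handler's VPC placement.
const vpc = new ec2.Vpc(stack, 'UserSuppliedVpc');

new eks.Cluster(stack, 'Test', {
  vpc,
  version: eks.KubernetesVersion.V1_27,
  kubectlLayer: new KubectlV27Layer(stack, 'KubectlLayer'),
  endpointAccess: eks.EndpointAccess.PUBLIC_AND_PRIVATE,
  // Only affects the Cluster Handler; the Kubectl Handler is still placed in the VPC.
  placeClusterHandlerInVpc: false,
});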
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.94.0
Framework Version
No response
Node.js Version
18
OS
linux
Language
Typescript
Language Version
No response
Other information
No response
We developed a hacky fix here using the escape hatches ...
import { Stack } from 'aws-cdk-lib';
import { CfnFunction } from 'aws-cdk-lib/aws-lambda';

// Locate the kubectl Handler inside the KubectlProvider and drop its VpcConfig.
const kubectlProvider = Stack.of(this).node.findChild('@aws-cdk/aws-eks.KubectlProvider');
const handler = kubectlProvider.node.findChild('Handler');
const resource = handler.node.findChild('Resource') as CfnFunction;
resource.vpcConfig = undefined;
$ npx integ-runner --no-clean --force integ/constructs/aws_eks/integ.cluster.ts
Verifying integration test snapshots...
CHANGED integ/constructs/aws_eks/integ.cluster 4.488s
Resources
[~] AWS::IAM::Role TestKubectlHandlerRole7073B199
└─ [-] DependsOn
└─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Policy TestKubectlHandlerRoleDefaultPolicy40956F86
└─ [-] DependsOn
└─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Role TestRole17AB2208
└─ [-] DependsOn
└─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::EC2::SecurityGroup TestControlPlaneSecurityGroup05BB1761
└─ [-] DependsOn
└─ ["TestClusterCIDRBlockA2FCD90D","TestClusterCIDRBlockSubnet0CD56F47A","TestClusterCIDRBlockSubnet1216B045C","TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8","TestClusterCIDRBlockSubnetRouteTableAssociation15F930483"]
[~] AWS::IAM::Role TestCreationRoleE90BFD6D
└─ [~] DependsOn
└─ @@ -1,9 +1,4 @@
[ ] [
[-] "TestClusterCIDRBlockA2FCD90D",
[-] "TestClusterCIDRBlockSubnet0CD56F47A",
[-] "TestClusterCIDRBlockSubnet1216B045C",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
[ ] "UserSuppliedVpcIGWCA64A826",
[ ] "UserSuppliedVpcPrivateSubnet1DefaultRoute966544D4",
[ ] "UserSuppliedVpcPrivateSubnet1RouteTableC32F4B57",
[~] AWS::IAM::Policy TestCreationRoleDefaultPolicy592DF1D6
└─ [~] DependsOn
└─ @@ -1,9 +1,4 @@
[ ] [
[-] "TestClusterCIDRBlockA2FCD90D",
[-] "TestClusterCIDRBlockSubnet0CD56F47A",
[-] "TestClusterCIDRBlockSubnet1216B045C",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
[ ] "UserSuppliedVpcIGWCA64A826",
[ ] "UserSuppliedVpcPrivateSubnet1DefaultRoute966544D4",
[ ] "UserSuppliedVpcPrivateSubnet1RouteTableC32F4B57",
[~] Custom::AWSCDK-EKS-Cluster Test3ACBFD1D
└─ [~] DependsOn
└─ @@ -1,9 +1,4 @@
[ ] [
[-] "TestClusterCIDRBlockA2FCD90D",
[-] "TestClusterCIDRBlockSubnet0CD56F47A",
[-] "TestClusterCIDRBlockSubnet1216B045C",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
[ ] "TestCreationRoleDefaultPolicy592DF1D6",
[ ] "TestCreationRoleE90BFD6D",
[ ] "UserSuppliedVpcIGWCA64A826",
[~] AWS::SSM::Parameter TestKubectlReadyBarrier082A53D9
└─ [~] DependsOn
└─ @@ -1,9 +1,4 @@
[ ] [
[-] "TestClusterCIDRBlockA2FCD90D",
[-] "TestClusterCIDRBlockSubnet0CD56F47A",
[-] "TestClusterCIDRBlockSubnet1216B045C",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
[ ] "TestCreationRoleDefaultPolicy592DF1D6",
[ ] "TestCreationRoleE90BFD6D",
[ ] "Test3ACBFD1D"
[~] Custom::AWSCDK-EKS-KubernetesResource TestAwsAuthmanifestE302BF21
└─ [~] DependsOn
└─ @@ -1,8 +1,3 @@
[ ] [
[-] "TestClusterCIDRBlockA2FCD90D",
[-] "TestClusterCIDRBlockSubnet0CD56F47A",
[-] "TestClusterCIDRBlockSubnet1216B045C",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation0A86BFFB8",
[-] "TestClusterCIDRBlockSubnetRouteTableAssociation15F930483",
[ ] "TestKubectlReadyBarrier082A53D9"
[ ] ]
CHANGED integ/constructs/aws_eks/integ.cluster 4.488s
Resources
[~] AWS::Lambda::Function Handler886CB40B
└─ [-] VpcConfig
└─ {"SecurityGroupIds":[{"Ref":"referencetoINFRACDKK8SClusterTest9112B301ClusterSecurityGroupId"}],"SubnetIds":[{"Ref":"referencetoINFRACDKK8SClusterTestUserSuppliedVpcPrivateSubnet1SubnetF8460491Ref"},{"Ref":"referencetoINFRACDKK8SClusterTestUserSuppliedVpcPrivateSubnet2SubnetDDB7C6A0Ref"}]}
It looks like another resource in the provider framework also needs to have its vpcConfig removed to fully fix the issue:
// Drop the VpcConfig from the provider framework's onEvent handler as well.
const provider = kubectlProvider.node.findChild('Provider');
const framework = provider.node.findChild('framework-onEvent');
const onEventResource = framework.node.findChild('Resource') as CfnFunction;
onEventResource.vpcConfig = undefined;
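For completeness, a sketch of a hypothetical helper (not part of the CDK API) that combines both escape hatches by walking the KubectlProvider subtree and clearing VpcConfig on every Lambda function it finds; the construct ID '@aws-cdk/aws-eks.KubectlProvider' comes from the snippets above and may change between CDK versions:

import { Stack } from 'aws-cdk-lib';
import { CfnFunction } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

// Hypothetical helper: clear VpcConfig on every Lambda function under the
// KubectlProvider, covering both the kubectl Handler and the provider
// framework's onEvent handler.
function detachKubectlProviderFromVpc(scope: Construct): void {
  const kubectlProvider = Stack.of(scope).node.tryFindChild('@aws-cdk/aws-eks.KubectlProvider');
  if (!kubectlProvider) {
    return; // provider was not created in this stack
  }
  for (const child of kubectlProvider.node.findAll()) {
    if (child instanceof CfnFunction) {
      child.vpcConfig = undefined;
    }
  }
}

Calling this at the end of the stack constructor should have the same effect as the two snippets above.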
I can't comment on why the handler is placed inside the private side of the VPC, other than to assume it's for security purposes.
Please also understand that this team cannot see AWS cases.
While this workaround isn't great, I'm glad it works for you.
@indrora,
So I am glad that I was able to come up with a workaround, but I still believe the fundamental behavior is wrong: it should follow the placeClusterHandlerInVpc setting.
Hi @diranged,
Thanks for raising the issue. There are two Lambda functions involved here:
- Cluster Handler - which uses the EKS API to create the EKS cluster
- Kubectl Handler - which issues kubectl commands against the cluster
The placeClusterHandlerInVpc setting applies only to the Cluster Handler, so it doesn't affect the Kubectl Handler. Regarding the Kubectl Handler's VPC setup, here is the description from https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk-lib/aws-eks#kubectl-handler:
Breaking this down, it means that if the endpoint exposes private access
(via EndpointAccess.PRIVATE or EndpointAccess.PUBLIC_AND_PRIVATE),
and the VPC contains private subnets, the Lambda function will be provisioned inside the VPC
and use the private subnets to interact with the cluster. This is the common use-case.
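To illustrate the condition quoted from cluster.ts, here is a sketch (my assumption, not a recommendation, since a public-only endpoint changes how the cluster API is exposed): with EndpointAccess.PUBLIC the privateAccess half of the condition is false, so the Kubectl Handler stays outside the VPC.

import { Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as eks from 'aws-cdk-lib/aws-eks';

declare const stack: Stack;   // hypothetical scope
declare const vpc: ec2.IVpc;  // hypothetical VPC that has private subnets

new eks.Cluster(stack, 'PublicOnlyCluster', {
  vpc,
  version: eks.KubernetesVersion.V1_27,
  // Public-only endpoint access: privateAccess is false, so the condition in
  // cluster.ts is not met and the Kubectl Handler is not attached to the VPC.
  endpointAccess: eks.EndpointAccess.PUBLIC,
});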
We're about to rewrite the current EKS module and will definitely consider your feedback when making decisions.
Thanks!