(aws eks): EKS stack deletes resources in the wrong order, causing DELETE_FAILED

Open mvs5465 opened this issue 3 years ago • 8 comments

What is the problem?

Running cdk destroy on an EKS cluster stack always results in DELETE_FAILED.

It appears to be attempting to delete the security group before deleting the cluster, causing a failure to delete as the resource is in use.

This error is returned by the control plane security group deletion in CloudFormation:

resource <security group id> has a dependent object (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: <request id>; Proxy: null)

The CloudFormation stack itself then fails to delete with an error like this:

The following resource(s) failed to delete: [<control plane security group name>, <eks fargate profile name>].

Reproduction Steps

Define a new cluster:

new aws_eks.FargateCluster(this, id, {
  version: this.props.version,
  vpc: this.props.vpc,
  endpointAccess: EndpointAccess.PRIVATE,
  placeClusterHandlerInVpc: true,
  vpcSubnets: [{
    subnetType: SubnetType.PRIVATE_WITH_NAT,
  }],
});

Then run cdk deploy. After it succeeds, run cdk destroy and the error occurs.

What did you expect to happen?

Handler should delete the EKS cluster first, and then delete the security group.
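
The expected ordering can be illustrated with a toy model: CloudFormation deletes a resource only after the resources that depend on it are gone, so deletion runs in the reverse of creation order. The sketch below is only an illustration under that assumption. The resource names and the dependency map are hypothetical and are not taken from the synthesized template, and the function does not handle cycles.

```typescript
// Toy model: each resource lists the resources it depends on.
// Names are hypothetical and do not come from the real template.
type DependencyGraph = Record<string, string[]>;

// Returns an order in which every resource is deleted before
// the resources it depends on (dependents go first).
function deletionOrder(graph: DependencyGraph): string[] {
  const created: string[] = [];
  const visited = new Set<string>();

  const visit = (name: string): void => {
    if (visited.has(name)) return;
    visited.add(name);
    for (const dep of graph[name] ?? []) visit(dep);
    created.push(name); // dependencies are created first
  };

  Object.keys(graph).forEach(visit);
  return created.reverse(); // deletion is creation order reversed
}

const exampleGraph: DependencyGraph = {
  ControlPlaneSecurityGroup: [],
  Cluster: ["ControlPlaneSecurityGroup"],
  FargateProfile: ["Cluster"],
};

// prints "FargateProfile -> Cluster -> ControlPlaneSecurityGroup"
console.log(deletionOrder(exampleGraph).join(" -> "));
```

In this model the security group is deleted last, which is the behavior expected here.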

What actually happened?

Handler deletes the security group first, which fails because the resource is still in use. The stack then ends up in a rollback-failed and/or delete-failed state.

CDK CLI Version

2.8.0 (build 8a5eb49)

Framework Version

No response

Node.js Version

v17.3.1

OS

macOS Catalina 10.15.7

Language

TypeScript

Language Version

No response

Other information

No response

mvs5465 avatar Jan 25 '22 19:01 mvs5465

This is for a Fargate cluster. It seems like the Fargate profile needs to be deleted before the cluster, and maybe that is what is causing this.

mvs5465 avatar Jan 25 '22 19:01 mvs5465

Thanks for reporting this @mvs5465,

I'm pretty sure I've run into this before as well, but I don't know whether we can do anything about it or whether it is in CloudFormation's control to fix. I don't know of any way to customize the deletion of a CloudFormation stack, and I'm not sure how stack destruction works under the hood. @otaviomacedo do you know anything about this issue?

peterwoodworth avatar Jan 28 '22 22:01 peterwoodworth

@peterwoodworth Thanks. For what it's worth, this stopped happening and I haven't quite figured out how to replicate it. We've since added a bunch more config to our Fargate cluster creation; I think it may have stopped when we set the mastersRole property.

mvs5465 avatar Feb 01 '22 19:02 mvs5465

We are hitting the same issue, unfortunately. All the Manifests/HelmCharts fail to be removed after 3 x 15-minute timeouts between the provider and onEvent handler of the kubectl CustomResources, because the SG rules were already destroyed.

adriantaut avatar Jul 13 '23 10:07 adriantaut

Hi, any updates on this? Is there any workaround? I'm facing a similar issue with EKS manifests: all of them fail to delete, I think because the Lambda handler gets deleted before the manifests.

fsellecchia avatar Mar 16 '24 12:03 fsellecchia

Hey, this is happening to me constantly on deletion. Please look into this; it makes working with EKS impossible with CDK.

mirodrr2 avatar Jan 31 '25 02:01 mirodrr2

@mirodrr2 @fsellecchia @adriantaut

Is this happening only when you specify subnetType: SubnetType.PRIVATE_WITH_NAT for vpcSubnets?

new aws_eks.FargateCluster(this, id, {
  version: this.props.version,
  vpc: this.props.vpc,
  endpointAccess: EndpointAccess.PRIVATE,
  placeClusterHandlerInVpc: true,
  vpcSubnets: [{
    subnetType: SubnetType.PRIVATE_WITH_NAT,
  }],
});

This is a known limitation when working with Lambda functions configured to run in a VPC. The issue occurs specifically with the Elastic Network Interfaces (ENIs) that Lambda creates to connect to your VPC. [1]

Here's what happens:

1. When you create a Lambda function with VPC access, AWS automatically creates ENIs in your VPC.

2. When you try to delete the CloudFormation stack, it attempts to delete all resources, including security groups. [2]

3. The deletion fails because the ENIs are still attached to the security groups, and Lambda needs some time to clean up these ENIs.

Unfortunately, this is a CloudFormation limitation: the resource cannot be deleted immediately. You won't be able to delete it with the AWS CLI either. I'm not sure whether CDK can do anything in this use case.
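
Since the failure is a timing problem (the security group can only be removed once Lambda has released its ENIs), retrying the deletion after a delay may eventually succeed. Below is a minimal sketch of such a retry loop, not a confirmed fix. `deleteSecurityGroup` is a hypothetical stand-in that simulates two DependencyViolation failures. In practice it would wrap whatever performs the deletion, such as re-running cdk destroy. The attempt count and delay are arbitrary assumptions.

```typescript
// Retries an async operation with a fixed delay between attempts.
// Assumes maxAttempts >= 1; rethrows the last error if all attempts fail.
async function retryWithDelay<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait before the next attempt so the ENIs have time to be released.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Hypothetical stand-in for the real deletion call: fails twice, then succeeds.
let simulatedAttempts = 0;
async function deleteSecurityGroup(): Promise<string> {
  simulatedAttempts++;
  if (simulatedAttempts < 3) throw new Error("DependencyViolation");
  return "deleted";
}

retryWithDelay(deleteSecurityGroup, 5, 10).then((result) =>
  console.log(`${result} after ${simulatedAttempts} attempts`),
);
```

A real delay would likely need to be minutes rather than milliseconds, since the ENI cleanup is not immediate.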

pahud avatar Feb 05 '25 18:02 pahud

Don't worry about it. I will wait for the new EKS CDK version, which I think will have fewer issues.

mirodrr2 avatar Feb 05 '25 18:02 mirodrr2