cdk-eks-blueprints

eks-blueprints: When destroying the stack, network resources (VPC, subnets, route tables, NACLs, security groups) are left behind

Open jesperalmstrom opened this issue 1 year ago • 6 comments

Describe the bug

When running cdk destroy, all or most of the networking resources were left undeleted.

Expected Behavior

All resources are destroyed.

Current Behavior

Resources such as the VPC, subnets, route tables, NACLs, internet gateway, network interfaces, and security groups were left behind. I had to find a Gist to identify them and then delete them manually.

Reproduction Steps

I added a list of add-ons:

        import * as blueprints from '@aws-quickstart/eks-blueprints';

        // AddOns for the cluster.
        // kmsKey is defined elsewhere in the reporter's stack (not shown).
        const addOns: Array<blueprints.ClusterAddOn> = [
            new blueprints.addons.FluxCDAddOn(),
            new blueprints.addons.SSMAgentAddOn(),
            new blueprints.addons.ClusterAutoScalerAddOn(),
            new blueprints.addons.AwsLoadBalancerControllerAddOn(),
            //new blueprints.addons.VpcCniAddOn(),
            new blueprints.addons.CertManagerAddOn(),
            new blueprints.addons.ExternalDnsAddOn({
                hostedZoneResources: [blueprints.GlobalResources.HostedZone]
            }),
            new blueprints.addons.EfsCsiDriverAddOn({ kmsKeys: [kmsKey] }),
            new blueprints.addons.EbsCsiDriverAddOn(),
            new blueprints.addons.IngressNginxAddOn()
        ];

Then I created the cluster:

        import { RemovalPolicy } from 'aws-cdk-lib';
        import * as blueprints from '@aws-quickstart/eks-blueprints';

        // account, region, clusterProvider, envContext, r53HostedZone and
        // hostedZoneName are defined elsewhere in the reporter's stack (not shown).
        const stack = blueprints.EksBlueprint.builder()
            .version('auto')
            .account(account)
            .region(region)
            .clusterProvider(clusterProvider)
            .resourceProvider(blueprints.GlobalResources.Vpc, new blueprints.VpcProvider(undefined, { primaryCidr: envContext.vpcCidr }))
            .resourceProvider(blueprints.GlobalResources.HostedZone, new blueprints.ImportHostedZoneProvider(r53HostedZone.hostedZoneId, hostedZoneName))
            .resourceProvider(blueprints.GlobalResources.KmsKey, new blueprints.CreateKmsKeyProvider())
            .resourceProvider("s3-bucket", new blueprints.CreateS3BucketProvider({
                name: `${envContext.s3BucketName}.${account}.${region}`,
                id: envContext.s3BucketName,
                s3BucketProps: { removalPolicy: RemovalPolicy.DESTROY },
            }))
            .addOns(...addOns)
            .build(this, 'my-eks-blueprint');

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.147.3 (build 32f0fdb)

EKS Blueprints Version

1.15.1

Node.js Version

v22.2.0

Environment details (OS name and version, etc.)

sw_vers
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79

Other information

No response

jesperalmstrom, Aug 26 '24 05:08

@jesperalmstrom, what is the status of the stack in CloudFormation after you destroy it? Sometimes the destroy command does not finish and leaves resources behind; this can happen, for example, if any of the resources were modified outside of the stack. In that case CloudFormation detects the drift and stops.

shapirov103, Aug 27 '24 14:08

I did a faulty deploy (wrong region), so I destroyed the stack almost immediately. There should not have been any drift.

jesperalmstrom, Aug 27 '24 15:08

The CloudFormation stack showed several of these messages:

This resource failed to delete. It was skipped and retained using the Force Delete Stack mode.

When I tried a force delete, it did not succeed because of dependencies. It took me hours to find and understand all of them. I eventually deleted them with a slightly modified version of this script: https://gist.github.com/alberto-morales/b6d7719763f483185db27289d51f8ec5
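A sketch (not from the thread) of how to list what a force delete left behind: given the stack events, for example as returned by DescribeStackEventsCommand in @aws-sdk/client-cloudformation, keep the latest event per logical resource and report those that ended in DELETE_FAILED or DELETE_SKIPPED. StackEventLike and leftoverResources are hypothetical names, the interface is a simplified subset of the SDK's StackEvent fields, and fetching the events is left out so the example runs offline with made-up data.

```typescript
// Simplified subset of the fields on a CloudFormation StackEvent
// (assumption: only these four are needed for this check).
interface StackEventLike {
  LogicalResourceId: string;
  ResourceType: string;
  ResourceStatus: string;
  Timestamp: Date;
}

// Statuses meaning the resource still exists after the delete attempt.
const LEFTOVER_STATUSES = new Set<string>(["DELETE_FAILED", "DELETE_SKIPPED"]);

// Hypothetical helper: returns resources whose most recent event shows a
// failed or skipped delete, sorted by logical ID.
function leftoverResources(events: StackEventLike[]): StackEventLike[] {
  const latest = new Map<string, StackEventLike>();
  for (const ev of events) {
    const prev = latest.get(ev.LogicalResourceId);
    if (!prev || ev.Timestamp.getTime() > prev.Timestamp.getTime()) {
      latest.set(ev.LogicalResourceId, ev);
    }
  }
  return Array.from(latest.values())
    .filter((ev) => LEFTOVER_STATUSES.has(ev.ResourceStatus))
    .sort((a, b) => a.LogicalResourceId.localeCompare(b.LogicalResourceId));
}

// Made-up events for illustration only.
const sample: StackEventLike[] = [
  { LogicalResourceId: "Vpc", ResourceType: "AWS::EC2::VPC", ResourceStatus: "DELETE_IN_PROGRESS", Timestamp: new Date("2024-08-26T05:00:00Z") },
  { LogicalResourceId: "Vpc", ResourceType: "AWS::EC2::VPC", ResourceStatus: "DELETE_SKIPPED", Timestamp: new Date("2024-08-26T05:10:00Z") },
  { LogicalResourceId: "Bucket", ResourceType: "AWS::S3::Bucket", ResourceStatus: "DELETE_COMPLETE", Timestamp: new Date("2024-08-26T05:05:00Z") },
  { LogicalResourceId: "PublicSubnet1", ResourceType: "AWS::EC2::Subnet", ResourceStatus: "DELETE_FAILED", Timestamp: new Date("2024-08-26T05:08:00Z") },
];

for (const r of leftoverResources(sample)) {
  console.log(`${r.ResourceType} ${r.LogicalResourceId}: ${r.ResourceStatus}`);
}
```

Keeping only the latest event per resource matters because a resource typically logs DELETE_IN_PROGRESS before its final status.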

jesperalmstrom, Aug 27 '24 15:08

@shapirov103, do you have any ideas or tricks?

jesperalmstrom, Sep 01 '24 21:09

This is not the expected behavior. However, before classifying it as a defect, could you share which dependencies you discovered? For example, I see that you used Flux. Flux can in turn provision apps that fail to be removed if the Flux controller is destroyed first; this is especially true for resources based on Flux CRDs, which then cannot be cleaned up because no controller is available.

CloudFormation is expected to remove all the resources it provisioned, provided there was no change to those resources.

shapirov103, Sep 05 '24 15:09

Thanks for the response. I will try removing Flux and see whether the delete goes more smoothly.

jesperalmstrom, Sep 11 '24 20:09

It is mainly VPC networking resources that are left behind when I try to remove the stack.
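Manual cleanup of these leftovers is slow because EC2 refuses to delete a VPC, subnet, or security group while something still depends on it. As a sketch, assuming the order below reflects common EC2 dependency rules (it is not taken from the Gist mentioned earlier and is not an official procedure), leftovers can be sorted so that blocking resources are deleted first. LeftoverKind, DELETE_ORDER, and deletionPlan are hypothetical names; the actual EC2 calls (for example DeleteNetworkInterface, or DetachInternetGateway followed by DeleteInternetGateway) are not shown.

```typescript
// Kinds of VPC-related leftovers from this issue, plus NAT gateways,
// which can also block subnet and internet gateway cleanup.
type LeftoverKind =
  | "nat-gateway"
  | "network-interface"
  | "security-group"
  | "subnet"
  | "route-table"
  | "network-acl"
  | "internet-gateway"
  | "vpc";

interface Leftover {
  kind: LeftoverKind;
  id: string;
  // The default security group, main route table and default NACL go away
  // with the VPC and cannot be deleted on their own.
  isDefault?: boolean;
}

// Assumed order: a resource can be blocked by kinds listed before it,
// so earlier kinds are deleted first.
const DELETE_ORDER: LeftoverKind[] = [
  "nat-gateway",
  "network-interface",
  "security-group",
  "subnet",
  "route-table",
  "network-acl",
  "internet-gateway", // detach from the VPC, then delete
  "vpc",
];

// Hypothetical helper: skips defaults and returns the rest in delete order.
function deletionPlan(leftovers: Leftover[]): Leftover[] {
  return leftovers
    .filter((r) => !r.isDefault)
    .sort((a, b) => DELETE_ORDER.indexOf(a.kind) - DELETE_ORDER.indexOf(b.kind));
}

// Made-up IDs for illustration only.
const plan = deletionPlan([
  { kind: "vpc", id: "vpc-0abc" },
  { kind: "subnet", id: "subnet-0a1" },
  { kind: "security-group", id: "sg-0def", isDefault: true },
  { kind: "security-group", id: "sg-0123" },
  { kind: "internet-gateway", id: "igw-0456" },
  { kind: "network-interface", id: "eni-0789" },
]);
console.log(plan.map((r) => r.id).join(" -> "));
```

A security group can also be blocked by rules in other groups that reference it; those rules would need to be revoked first, which this ordering does not capture.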

jesperalmstrom, Mar 07 '25 09:03

I have a similar situation. I created a cluster with EKS Blueprints 1.17.2. Nothing was added to the cluster after the CDK stack was deployed.

After the stack was created and the cluster was stable, I decided to delete the CloudFormation stack. After an hour, it failed with this error: CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [f397410c-726d-4886-9a67-00c57eda7759]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

We've seen this multiple times. I wonder if there is a recommended approach for destroying EKS Blueprint CDK stacks.

vpopiolrccl, Sep 24 '25 18:09

@vpopiolrccl, are there any resources that are created automatically outside of the CDK stack? For example, to deal with PVRE, some customers automatically add permissions to the cluster roles based on a trigger (e.g. when an EKS cluster is created). In this case CDK and CloudFormation fail when they detect a resource modification outside of the stack. Another example is an ingress or load balancer provisioned in EKS (e.g. using GitOps) that is then not cleaned up. It can hold on to VPC resources because it has network interfaces in the subnets. We routinely test that our stacks are created and destroyed successfully, so we are trying to identify what is different here.
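To check the load balancer case above, one option is to list the network interfaces still in the VPC (for example with EC2 DescribeNetworkInterfaces and a vpc-id filter) and group them by who created them. The sketch below classifies interfaces by their Description field. The prefixes are assumptions based on commonly observed descriptions ("ELB ..." for load balancers, "aws-K8S-..." for VPC CNI interfaces, "Amazon EKS ..." for control-plane interfaces, "Interface for NAT Gateway ..." for NAT gateways), not a documented contract; EniLike, EniOwner, and classifyEni are hypothetical names.

```typescript
type EniOwner = "load-balancer" | "vpc-cni" | "eks-control-plane" | "nat-gateway" | "unknown";

// Simplified subset of an EC2 NetworkInterface (assumption).
interface EniLike {
  NetworkInterfaceId: string;
  Description: string;
}

// Classifies an interface by assumed description prefixes.
function classifyEni(eni: EniLike): EniOwner {
  const d = eni.Description;
  if (d.startsWith("ELB ")) return "load-balancer";
  if (d.startsWith("aws-K8S-")) return "vpc-cni";
  if (d.startsWith("Amazon EKS ")) return "eks-control-plane";
  if (d.startsWith("Interface for NAT Gateway")) return "nat-gateway";
  return "unknown";
}

// Made-up interfaces for illustration only.
const enis: EniLike[] = [
  { NetworkInterfaceId: "eni-01", Description: "ELB app/k8s-default-myingres-0123456789/abcdef0123456789" },
  { NetworkInterfaceId: "eni-02", Description: "aws-K8S-i-0123456789abcdef0" },
  { NetworkInterfaceId: "eni-03", Description: "Amazon EKS my-eks-blueprint" },
];

for (const eni of enis) {
  console.log(eni.NetworkInterfaceId, classifyEni(eni));
}
```

Load balancer interfaces are managed by the load balancer service and generally cannot be deleted directly, so the usual route is to delete the Kubernetes Ingress or LoadBalancer Service (or the load balancer itself) before destroying the stack.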

@zjaco13 ^^

shapirov103, Sep 24 '25 19:09