aws-cdk

aws_route53: delete_existing is creating and deleting 2x

Open BwL1289 opened this issue 8 months ago • 18 comments

Describe the bug

It's possible there's some user error in here.

When delete_existing=True and something goes awry in another resource related to the A record (see the reproduction steps below), I see the custom resource run twice: the A record gets created, then deleted, then created, and finally deleted again. By the time the stack finishes updating, there's no A record at all.

Here are the logs:


INIT_START Runtime Version: nodejs:20.v57	Runtime Version ARN: arn:aws:lambda:<redacted>>::runtime:9d084cce5cc7578c503eb8fe4bf7891c94c8f5f0ccb036f3f8c3a01cf5212db6
START RequestId: 2c14be6e-b073-4b8c-ac2b-aae1e0e2cd41 Version: $LATEST
2025-04-23T13:17:12.072Z	2c14be6e-b073-4b8c-ac2b-aae1e0e2cd41	INFO	
{
    "RequestType": "Create",
    "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb",
    "RequestId": "49412ef1-b119-4959-a442-db1326a97e04",
    "LogicalResourceId": "WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425",
    "ResourceType": "Custom::DeleteExistingRecordSet",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
        "RecordName": "api.dev.<redacted>.com.",
        "RecordType": "A",
        "HostedZoneId": "<redacted>"
    }
}

2025-04-23T13:17:16.254Z	2c14be6e-b073-4b8c-ac2b-aae1e0e2cd41	INFO	submit response to cloudformation https://cloudformation-custom-resource-response-<redacted>.s3.<redacted>>.amazonaws.com//arn%3Aaws%3Acloudformation%3A<redacted>>%3A<redacted>%3Astack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb%7CWebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425%7C49412ef1-b119-4959-a442-db1326a97e04?*** {
  Status: 'SUCCESS',
  Reason: 'SUCCESS',
  StackId: 'arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb',
  RequestId: '49412ef1-b119-4959-a442-db1326a97e04',
  PhysicalResourceId: '49412ef1-b119-4959-a442-db1326a97e04',
  LogicalResourceId: 'WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425',
  NoEcho: undefined,
  Data: undefined
}
END RequestId: 2c14be6e-b073-4b8c-ac2b-aae1e0e2cd41
REPORT RequestId: 2c14be6e-b073-4b8c-ac2b-aae1e0e2cd41	Duration: 4390.45 ms	Billed Duration: 4391 ms	Memory Size: 128 MB	Max Memory Used: 88 MB	Init Duration: 160.14 ms	
START RequestId: 00d5ccf0-8492-45c6-aef7-8841946360dd Version: $LATEST
2025-04-23T13:17:23.497Z	00d5ccf0-8492-45c6-aef7-8841946360dd	INFO	
{
    "RequestType": "Delete",
    "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb",
    "RequestId": "c8e33677-92bd-4fe3-bff6-379b2d73532e",
    "LogicalResourceId": "WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170",
    "PhysicalResourceId": "7d099ac6-c8b9-475b-bc4a-84680622fac6",
    "ResourceType": "Custom::DeleteExistingRecordSet",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
        "RecordName": "api.dev.<redacted>.com.",
        "RecordType": "A",
        "HostedZoneId": "<redacted>"
    }
}

2025-04-23T13:17:23.498Z	00d5ccf0-8492-45c6-aef7-8841946360dd	INFO	submit response to cloudformation https://cloudformation-custom-resource-response-<redacted>.s3.<redacted>>.amazonaws.com//arn%3Aaws%3Acloudformation%3A<redacted>>%3A<redacted>%3Astack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb%7CWebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170%7Cc8e33677-92bd-4fe3-bff6-379b2d73532e?*** {
  Status: 'SUCCESS',
  Reason: 'SUCCESS',
  StackId: 'arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb',
  RequestId: 'c8e33677-92bd-4fe3-bff6-379b2d73532e',
  PhysicalResourceId: '7d099ac6-c8b9-475b-bc4a-84680622fac6',
  LogicalResourceId: 'WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170',
  NoEcho: undefined,
  Data: undefined
}
END RequestId: 00d5ccf0-8492-45c6-aef7-8841946360dd
REPORT RequestId: 00d5ccf0-8492-45c6-aef7-8841946360dd	Duration: 120.10 ms	Billed Duration: 121 ms	Memory Size: 128 MB	Max Memory Used: 88 MB	
START RequestId: 882d23c0-226a-4461-881f-5306de3602f8 Version: $LATEST
2025-04-23T13:22:42.452Z	882d23c0-226a-4461-881f-5306de3602f8	INFO	
{
    "RequestType": "Create",
    "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb",
    "RequestId": "5ee90893-a5c8-40b0-a989-90c0bb925043",
    "LogicalResourceId": "WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170",
    "ResourceType": "Custom::DeleteExistingRecordSet",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
        "RecordName": "api.dev.<redacted>.com.",
        "RecordType": "A",
        "HostedZoneId": "<redacted>"
    }
}

2025-04-23T13:22:42.772Z	882d23c0-226a-4461-881f-5306de3602f8	INFO	submit response to cloudformation https://cloudformation-custom-resource-response-<redacted>.s3.<redacted>>.amazonaws.com//arn%3Aaws%3Acloudformation%3A<redacted>>%3A<redacted>%3Astack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb%7CWebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170%7C5ee90893-a5c8-40b0-a989-90c0bb925043?*** {
  Status: 'SUCCESS',
  Reason: 'SUCCESS',
  StackId: 'arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb',
  RequestId: '5ee90893-a5c8-40b0-a989-90c0bb925043',
  PhysicalResourceId: '5ee90893-a5c8-40b0-a989-90c0bb925043',
  LogicalResourceId: 'WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordDeleteExistingRecordSetCustomResourceF51AF170',
  NoEcho: undefined,
  Data: undefined
}
END RequestId: 882d23c0-226a-4461-881f-5306de3602f8
REPORT RequestId: 882d23c0-226a-4461-881f-5306de3602f8	Duration: 620.24 ms	Billed Duration: 621 ms	Memory Size: 128 MB	Max Memory Used: 88 MB	
START RequestId: 43437399-5250-4621-adf4-6845830879a4 Version: $LATEST
2025-04-23T13:23:53.667Z	43437399-5250-4621-adf4-6845830879a4	INFO	
{
    "RequestType": "Delete",
    "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb",
    "RequestId": "1a791d51-f495-445e-a66b-4b1a934e05dd",
    "LogicalResourceId": "WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425",
    "PhysicalResourceId": "49412ef1-b119-4959-a442-db1326a97e04",
    "ResourceType": "Custom::DeleteExistingRecordSet",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:<redacted>>:<redacted>:function:CoreStackDev-CustomDeleteExistingRecordSetCustomRe-8ZwMf05clmhl",
        "RecordName": "api.dev.<redacted>.com.",
        "RecordType": "A",
        "HostedZoneId": "<redacted>"
    }
}

2025-04-23T13:23:53.731Z	43437399-5250-4621-adf4-6845830879a4	INFO	submit response to cloudformation https://cloudformation-custom-resource-response-<redacted>.s3.<redacted>>.amazonaws.com//arn%3Aaws%3Acloudformation%3A<redacted>>%3A<redacted>%3Astack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb%7CWebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425%7C1a791d51-f495-445e-a66b-4b1a934e05dd?*** {
  Status: 'SUCCESS',
  Reason: 'SUCCESS',
  StackId: 'arn:aws:cloudformation:<redacted>>:<redacted>:stack/CoreStackDev/6fdcbfc0-fde9-11ef-a37b-06a0be8e5ebb',
  RequestId: '1a791d51-f495-445e-a66b-4b1a934e05dd',
  PhysicalResourceId: '49412ef1-b119-4959-a442-db1326a97e04',
  LogicalResourceId: 'WebAppSvcFlaskAppServiceDataPlaneRoute53SvcAlbEndpointRecordTestDeleteExistingRecordSetCustomResourceA9FD1425',
  NoEcho: undefined,
  Data: undefined
}
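
Condensed, the four invocations above interleave two logical IDs. The sketch below only restates the timestamps, request types, and logical ID suffixes copied from the log; the tuple layout and the `summarize` helper are illustrative names and not part of CDK.

```python
from collections import defaultdict

# (timestamp, RequestType, last 8 characters of LogicalResourceId), copied from the log above.
EVENTS = [
    ("2025-04-23T13:17:12.072Z", "Create", "A9FD1425"),
    ("2025-04-23T13:17:23.497Z", "Delete", "F51AF170"),
    ("2025-04-23T13:22:42.452Z", "Create", "F51AF170"),
    ("2025-04-23T13:23:53.667Z", "Delete", "A9FD1425"),
]


def summarize(events):
    """Group request types by logical ID, preserving the order seen in the log."""
    by_id = defaultdict(list)
    for _timestamp, request_type, logical_id in events:
        by_id[logical_id].append(request_type)
    return dict(by_id)


if __name__ == "__main__":
    for logical_id, requests in summarize(EVENTS).items():
        print(logical_id, "->", ", ".join(requests))
```

Read this way, each logical ID receives one Create and one Delete, which is consistent with two logical ID swaps (to the ...RecordTest... ID and back again) about five minutes apart.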

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The existing A record gets deleted and then recreated with the new value.

Current Behavior

By the time the stack finishes updating, there's no A record at all.

Reproduction Steps

  1. Deploy with:
     • An A record with delete_existing=True
     • An ALB w/ protocol=lb.ApplicationProtocol.HTTP
  2. Attempt to move the ALB to protocol=lb.ApplicationProtocol.HTTPS, which may require domain_name and domain_zone to be set. In my case there was a mistake in here somewhere, and the domain_name got set to some other value (i.e. the original value was api.dev.example.com and the new domain_name got set to just dev.example.com), so the A record was deleted and recreated with the new record name.
  3. Try to force Cfn to recreate the record set by changing the logical ID of the A record
  4. Experience error

Possible Solution

Detect if the A record has already been deleted and (re)created by the custom resource and, if so, don't run the CR again.

Additional Information/Context

No response

CDK CLI Version

2.1000.3 (build 321a46a)

Framework Version

No response

Node.js Version

v22.12.0

OS

Mac

Language

Python

Language Version

No response

Other information

No response

BwL1289 avatar Apr 23 '25 17:04 BwL1289

appears to be related to https://github.com/aws/aws-cdk/issues/26754

ykethan avatar Apr 23 '25 22:04 ykethan

Hey @BwL1289, thank you for reporting this issue and providing us the logs. While digging into it, I observed the following when reproducing the issue.

CDK versions

"aws-cdk-lib": "2.191.0",
"aws-cdk": "^2.1012.0"

Reproduction Steps:

  1. Created a CDK stack with a Route53 PublicHostedZone and an ARecord:
export class Route53IssueStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Step 1: Create a VPC
    const vpc = new ec2.Vpc(this, 'VPC', {
      maxAzs: 2,
      natGateways: 0,
    });

    // Step 1: Create an ALB
    const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
    });

    // Create a hosted zone
    const hostedZone = new route53.PublicHostedZone(this, 'HostedZone', {
      zoneName: 'mydemo-domain.net',
    });

    // Create an A record
    new route53.ARecord(this, 'ARecord', {
      zone: hostedZone,
      recordName: 'api.dev.mydemo-domain.net',
      target: route53.RecordTarget.fromAlias(
        new targets.LoadBalancerTarget(alb)
      ),
      deleteExisting: true,
    });

    // Add HTTP listener
    alb.addListener('HttpListener', {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      defaultAction: elbv2.ListenerAction.fixedResponse(200, {
        contentType: 'text/plain',
        messageBody: 'OK',
      }),
    });
  }
}
  2. Ran cdk synth && cdk deploy; the deploy was successful.
  3. Updated zoneName and recordName:
    // Create a hosted zone
    const hostedZone = new route53.PublicHostedZone(this, 'HostedZone', {
      zoneName: 'mydemo-domain-abc.net',
    });

    // Create an A record
    new route53.ARecord(this, 'ARecord', {
      zone: hostedZone,
      recordName: 'api.dev.mydemo-domain-abc.net',
      target: route53.RecordTarget.fromAlias(
        new targets.LoadBalancerTarget(alb)
      ),
      deleteExisting: true,
    });
  4. Ran cdk synth; the template still showed the old zone name:
HostedZoneDB99F866:
  Type: AWS::Route53::HostedZone
  Properties:
    Name: mydemo-domain.net.  # Should be mydemo-domain-abc.net.
  5. Tried clearing the CDK context with cdk context --clear and re-synthesizing; the issue persists.
  6. On deploy, observed duplicate record events in the logs.

Marking this as P1

ykethan avatar Apr 24 '25 02:04 ykethan

@ykethan thank you

BwL1289 avatar Apr 24 '25 03:04 BwL1289

Hey @BwL1289, wanted to follow up on this. I tried reproducing the issue with the code from the previous comment and with the following, but I no longer observe the duplicate operations. Could you let us know if you are still experiencing this and, if so, provide a minimal reproduction?

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as route53 from 'aws-cdk-lib/aws-route53';
import * as targets from 'aws-cdk-lib/aws-route53-targets';
import * as customResources from 'aws-cdk-lib/custom-resources';

export class Route53IssueStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Step 1: Create a VPC
    const vpc = new ec2.Vpc(this, 'VPC', {
      maxAzs: 2,
      natGateways: 0,
    });

    // Step 1: Create an ALB
    const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
    });

    // Create a hosted zone
    const hostedZone = new route53.PublicHostedZone(this, 'HostedZone', {
      zoneName: 'mydemo-domain-dev.net',
    });

    // Create an A record
    new route53.ARecord(this, 'ARecord1', {
      zone: hostedZone,
      recordName: 'api.dev.mydemo-domain.net',
      target: route53.RecordTarget.fromAlias(
        new targets.LoadBalancerTarget(alb),
      ),
      deleteExisting: true,
    });

    // Add HTTP listener
    alb.addListener('HttpListener', {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTPS,
      certificates: [
        elbv2.ListenerCertificate.fromArn(
          <Arn>,
        ),
      ],
      defaultAction: elbv2.ListenerAction.fixedResponse(200, {
        contentType: 'text/plain',
        messageBody: 'OK',
      }),
    });
  }
}

ykethan avatar May 07 '25 20:05 ykethan

@ykethan I can't try reproducing it again as I'm blocked by #34290, which I opened (and which is in turn related upstream to #34235).

BwL1289 avatar May 07 '25 21:05 BwL1289

@ykethan this is still a bug and I just experienced it. This should be prioritized because 1) it just took down production for us, 2) I am now worried about deploying to production again, and 3) I had to actually patch this in the console.

BwL1289 avatar May 09 '25 22:05 BwL1289

Hey @BwL1289, sorry to hear about this. Would you be open to connecting on a call to dive into this? I have not been able to reproduce the issue. I am available on Discord; my handle is ykethan.

ykethan avatar May 09 '25 22:05 ykethan

@ykethan sure. I just sent you a friend request.

BwL1289 avatar May 09 '25 22:05 BwL1289

@BwL1289 thank you for hopping on the call and walking through the issue. I was able to reproduce the issue and am marking it as P1.

Reproduction steps:

  1. Deploy the following:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as route53 from 'aws-cdk-lib/aws-route53';
import * as targets from 'aws-cdk-lib/aws-route53-targets';
import * as customResources from 'aws-cdk-lib/custom-resources';

export class Route53IssueStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Step 1: Create a VPC
    const vpc = new ec2.Vpc(this, 'VPC', {
      maxAzs: 2,
      natGateways: 0,
    });

    // Step 1: Create an ALB
    const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
    });

    // Create a hosted zone
    const hostedZone = new route53.PublicHostedZone(this, 'HostedZone', {
      zoneName: 'mydemo-domain-dev.net',
    });

    // Create an A record
    new route53.ARecord(this, 'ARecord1', {
      zone: hostedZone,
      recordName: 'api.dev.mydemo-domain.net',
      target: route53.RecordTarget.fromAlias(
        new targets.LoadBalancerTarget(alb),
      ),
      deleteExisting: true,
    });

    // Add HTTP listener
    alb.addListener('HttpListener', {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTPS,
      certificates: [
        elbv2.ListenerCertificate.fromArn(
          <Arn>,
        ),
      ],
      defaultAction: elbv2.ListenerAction.fixedResponse(200, {
        contentType: 'text/plain',
        messageBody: 'OK',
      }),
    });
  }
}

  2. Modify the ARecord ID and deploy:
    new route53.ARecord(this, 'newRecord', {
      zone: hostedZone,
      recordName: 'api.dev.mydemo-domain.net',
      target: route53.RecordTarget.fromAlias(
        new targets.LoadBalancerTarget(alb),
      ),
      deleteExisting: true,
    });
  3. In the Route 53 console, observe that the A record has been removed. This appears to be due to the custom resource not re-adding the record after removing the previous record on a logical ID change.

ykethan avatar May 09 '25 23:05 ykethan

No problem. Thanks.

BwL1289 avatar May 10 '25 13:05 BwL1289

The issue comes from using the deleteExisting property together with a change in the logical ID. The deleteExisting option is meant only for migration cases—when a record already exists but isn’t managed by CloudFormation. It was added to help delete existing records during the first deployment to avoid manual cleanup and minimize the downtime.

I understand how this can be confusing, and I agree it could be clearer. I will discuss this with my team to see how we can improve it for customers.

However, this property is not designed to support changing the logical ID within the same stack. If you remove deleteExisting and try to change the logical ID, the deployment will fail as expected. This is because Route 53 requires that record sets be unique by both name and type within a hosted zone. You would see a similar failure if you changed the logical ID of an S3 bucket with a fixed bucketName, for example.
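
A toy model may help make the two outcomes concrete. It is a sketch under stated assumptions and not how CloudFormation or the CDK custom resource is implemented: `ToyZone` and `change_logical_id` are hypothetical names, the zone is a plain dictionary keyed by (name, type), and the model assumes that cleanup of the old logical ID removes whatever record currently has that name and type. The thread does not confirm that last step; it is one plausible reading of why the record ends up missing.

```python
class DuplicateRecordError(Exception):
    """Raised when a (name, type) pair already exists in the toy zone."""


class ToyZone:
    """Hypothetical stand-in for a hosted zone: at most one record per (name, type)."""

    def __init__(self):
        self.records = {}

    def create(self, name, rtype, value):
        if (name, rtype) in self.records:
            raise DuplicateRecordError(f"{name} {rtype} already exists")
        self.records[(name, rtype)] = value

    def delete(self, name, rtype):
        # Deleting a missing record is a no-op in this model.
        self.records.pop((name, rtype), None)


def change_logical_id(zone, name, rtype, value, delete_existing):
    """Replay a logical ID change: create the new resources, then clean up the old one."""
    if delete_existing:
        # Assumption: the custom resource for the new logical ID removes the record first.
        zone.delete(name, rtype)
    # The record set under the new logical ID is created; this raises on a duplicate.
    zone.create(name, rtype, value)
    # Assumption: cleanup of the old logical ID deletes by name and type.
    zone.delete(name, rtype)


if __name__ == "__main__":
    zone = ToyZone()
    zone.create("api.dev.example.com.", "A", "alb-alias")
    change_logical_id(zone, "api.dev.example.com.", "A", "alb-alias", delete_existing=True)
    print("records left with delete_existing=True:", zone.records)
```

In this model, delete_existing=False stops at the create step with a duplicate error and leaves the original record in place, while delete_existing=True runs to completion and leaves the zone empty.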

The commit message that introduced the deleteExisting property contains a full explanation.

If my explanation isn’t clear or doesn’t fully address your use case, please let me know.

gasolima avatar May 22 '25 07:05 gasolima

@gasolima thanks. It explains the issue but I'm not clear on what the solution is?

I think the custom resource for delete_existing should be more robust.

  1. It should only be triggered if the record name or value has changed.
  2. It should accommodate use cases where the logicalId changes

BwL1289 avatar May 30 '25 14:05 BwL1289

It should accommodate use cases where the logicalId changes

This issue is related to how CloudFormation works. It is expected that changing the logicalId should not work, similar to an S3 resource with a fixed bucketName. Changing the logicalId with the same recordName means creating an A record with the same name and type, which Route53 does not allow. The preferred approach would be to fail the deployment instead of the current behaviour (removing the A record).

Can you provide more details about your use case and the purpose of changing the logical Id? That would help me better understand the context and provide more relevant suggestions.

gasolima avatar Jun 11 '25 11:06 gasolima

This issue is related to how CloudFormation works. It is expected that changing the logicalId should not work, similar to an S3 resource with a fixed bucketName.

I disagree. S3 buckets are stateful, which is why changing the logicalId for S3 buckets is dangerous. In contrast, Route53 records are not. Instead, Route53 records are named resources in CloudFormation, which is different, and is why delete_existing was introduced originally. Because it's a custom resource, we should have full control over the lifecycle of the resource, including checking by the end of deployment whether or not the record exists.
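
As an illustration of what such an end-of-deployment check could look like, here is a minimal sketch. It assumes a boto3 Route 53 client is passed in and uses its list_resource_record_sets call; the helper name `record_exists` is hypothetical, and nothing here is something the CDK custom resource currently does.

```python
def record_exists(route53_client, hosted_zone_id, record_name, record_type="A"):
    """Return True if the zone holds a record set with this exact name and type."""
    # Route 53 returns fully qualified names, so normalize to a trailing dot.
    wanted = record_name if record_name.endswith(".") else record_name + "."
    response = route53_client.list_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        StartRecordName=wanted,
        StartRecordType=record_type,
        MaxItems="1",  # the API expects this value as a string
    )
    # The listing starts at the requested name, so the first item is either
    # the record we want or the next record in the zone.
    return any(
        rrset["Name"].lower() == wanted.lower() and rrset["Type"] == record_type
        for rrset in response.get("ResourceRecordSets", [])
    )
```

A call such as record_exists(boto3.client("route53"), zone_id, "api.dev.example.com") could run as a post-deploy step and fail the pipeline when it returns False; the zone ID and record name here are placeholders.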

Can you provide more details about your use case and the purpose of changing the logical Id? That would help me better understand the context and provide more relevant suggestions.

LogicalIds can change any time for any reason, including normal refactoring, so use cases are sort of irrelevant.

What's more concerning is that I actually locked the logical ID for this A record, so I don't even know how this happened in the first place:

def _create_r53_a_record(self) -> None:
    self._r53_a_record = r53.ARecord(
        self,
        # WARNING: DO NOT CHANGE
        "AlbEndpointRecord",  # WARNING: DO NOT CHANGE
        # WARNING: DO NOT CHANGE
        zone=self._hosted_zone,
        target=r53.RecordTarget.from_alias(
            alias_target=targets.LoadBalancerTarget(
                self.alb_svc.alb,
            )
        ),
        delete_existing=True,
        record_name=self._hosted_zone_record_name_fqdn,
        comment="Application Load Balancer Endpoint A Record",
    )

def _override_logical_ids(self) -> None:
    # WARNING: DO NOT CHANGE
    override_logical_id(self._r53_a_record, "AlbEndpointRecord")  # WARNING: DO NOT CHANGE
    # WARNING: DO NOT CHANGE

BwL1289 avatar Jun 11 '25 22:06 BwL1289

In addition to the above...delete_existing causes downtime for production applications. For example, removing the A record for an ALB will cause the service to go down temporarily while the CR recreates the record.

Maybe it would be better to use UPSERT instead of DELETE in the CR.

BwL1289 avatar Jun 11 '25 22:06 BwL1289

I disagree. S3 buckets are stateful which is why changing the logicalId for S3 buckets is dangerous. In contrast, route53 records are not.

However, this issue is not solely about whether a resource is stateful. The main reasons are:

  1. Some services (e.g., S3, Route 53, etc.) do not allow multiple resources with the same name to exist in the same account or globally.
  2. CloudFormation creates resources for the new logical IDs before removing the old ones.

This inevitably causes failures during the creation phase. For example, although Lambda functions are not stateful, they exhibit the same problem if the functionName is hardcoded.

Because it's a custom resource, we should have full control over the lifecycle of the resource, including checking by the end of deployment whether or not the record exists.

We’re currently moving away from custom resources due to various issues we've encountered. We believe this approach is not ideal going forward. That said, I’m happy to support customers in creating their custom resource that lives within their own stack and is tailored to their specific use case.

LogicalIds can change any time for any reason, including https://github.com/aws/aws-cdk-rfcs/issues/162#issuecomment-2067754875

In this case—whether you’re dealing with Route 53, S3, Lambda, etc.—you’ll need to wait for the upcoming refactoring feature to address such challenges.

In addition to the above...delete_existing causes downtime for production applications. For example, removing the A record for an ALB will cause the service to go down temporarily while the CR recreates the record.

Exactly—as I mentioned in my previous comment, and as noted in the documentation, this setting minimizes downtime but does not eliminate it.

Maybe it would be better to use UPSERT instead of DELETE in the CR.

This property is designed for cases where an A record already exists but is not currently owned by the stack—and you are willing to accept the associated downtime. Using DELETE allows the stack to take ownership of the record. If we used UPSERT instead, it would prevent that ownership transition from happening.
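
To make the UPSERT versus DELETE distinction concrete, here is a minimal sketch of the change batches involved, in the shape accepted by the Route 53 ChangeResourceRecordSets API (for example through boto3's change_resource_record_sets). The helper name `alias_change` and every record value are hypothetical; this is not the code of the CDK custom resource.

```python
VALID_ACTIONS = ("CREATE", "DELETE", "UPSERT")


def alias_change(action, record_name, alb_dns_name, alb_zone_id):
    """Build a ChangeBatch containing a single alias A record change."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {
        "Changes": [
            {
                "Action": action,
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": alb_zone_id,   # zone ID of the load balancer, not of the domain
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    args = ("api.dev.example.com.", "my-alb.example-elb-domain.", "ZHYPOTHETICALALB")
    print(alias_change("DELETE", *args)["Changes"][0]["Action"])
    print(alias_change("UPSERT", *args)["Changes"][0]["Action"])
```

An UPSERT creates the record if it is missing and overwrites it otherwise, so there is no window without a record. A DELETE leaves that window open until the stack's own record set resource creates the record, which is the ownership handover described above.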

gasolima avatar Jun 12 '25 17:06 gasolima

I disagree. S3 buckets are stateful which is why changing the logicalId for S3 buckets is dangerous. In contrast, route53 records are not.

However, this issue is not solely about whether a resource is stateful. The main reasons are:

  1. Some services (e.g., S3, Route 53, etc.) do not allow multiple resources with the same name to exist in the same account or globally.
  2. CloudFormation creates resources for the new logical IDs before removing the old ones.

This inevitably causes failures during the creation phase. For example, although Lambda functions are not stateful, they exhibit the same problem if the functionName is hardcoded.

Why cherry pick my responses? I'm well aware of what the limitations are: "Instead, Route53 records are named resources in cloudformation, which is different, and why introducing delete_existing was done originally."

We’re currently moving away from custom resources due to various issues we've encountered. We believe this approach is not ideal going forward. That said, I’m happy to support customers in creating their custom resource that lives within their own stack and is tailored to their specific use case.

So this functionality will be deprecated? If so, it sounds like this conversation is moot and I should just not use this at all. I appreciate the offer to help write CRs, but we're capable of doing that, and I don't think we're unique or that our use case is just an esoteric application of CDK/Cfn and Route53.

In addition to the above...delete_existing causes downtime for production applications. For example, removing the A record for an ALB will cause the service to go down temporarily while the CR recreates the record.

Exactly—as I mentioned in my previous comment, and as noted in the documentation, this setting minimizes downtime but does not eliminate it.

To me, the official documentation you're pointing to does not indicate that: "This allows to deploy a new record set while minimizing the downtime because the new record set will be created immediately after the existing one is deleted. It also avoids "manual" actions to delete existing record sets."

Sure, it doesn't say 100% of downtime will be avoided, but the way it's written indicates we'll see minimal downtime, which is not what we experience (20+ seconds typically).

WRT the comment you pointed to, is the expectation that customers should dig through 3-year-old commit messages? FWIW, Jogold mentions that migrations are a typical use case, but he does not say it is the only one.

I'm not trying to be combative, but we've gotten pretty off-topic as to why I opened this ticket. If the resolution is to just not use delete_existing, that's fine.

BwL1289 avatar Jun 12 '25 19:06 BwL1289

I may not have been very clear earlier. We've now observed that many of these custom resources are problematic — they're quite fragile, and in some cases, it's nearly impossible to make them robust. As you mentioned in the comments, there are several examples that highlight these limitations. This isn't necessarily the fault of the person who originally introduced them; rather, it's part of our learning process, and mistakes are a natural part of that.

In this particular case, unfortunately, we're not planning to add more features.

but the way it's written indicates we'll see minimal downtime, which is not what we experience (20+ seconds typically).

I completely understand your concern. The documentation currently uses very broad language. I’ll work on updating it to better reflect the actual behavior, and I’ll also bring it up with the team to consider marking this field as deprecated.

gasolima avatar Jun 16 '25 10:06 gasolima

@aemada-aws thanks

BwL1289 avatar Jun 30 '25 16:06 BwL1289

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

github-actions[bot] avatar Jul 30 '25 17:07 github-actions[bot]