v.0.14.0 [AWS] External-DNS cannot remove records from 2 Route 53 hosted zones (InvalidChangeBatch: [The request contains an invalid set of changes])

Open leonardocaylent opened this issue 1 year ago • 16 comments

What happened: External-DNS pod can create records but cannot delete records from 2 different hosted zones since 0.14.0. This doesn't happen on 0.13.6.

What you expected to happen: External-DNS detects A & TXT records on 2 Hosted zones and can remove them without making the pod crash.

On version 0.14.0: level=error msg="Failure in zone internal.dev.mydomain.com. [Id: /hostedzone/<HOSTEDZONE1>] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'A ....

On version 0.13.6 and earlier:

How to reproduce it (as minimally and precisely as possible):

  1. Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
  2. Install External-DNS 0.14.0 on EKS
  3. Create an ingress whose host is testapplication.internal.dev.yourdomain.com
  4. Wait for external-dns to detect the changes
  5. External-DNS will create the records correctly in the 2 hosted zones
  6. Remove the ingress created
  7. Wait for external-dns to detect the changes
  8. The error will show up in the external-dns pod logs:

Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº1>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com A [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº2>]
Desired change: DELETE testapplication.internal.dev.yourdomain.com TXT [Id: /hostedzone/<HostedZoneNº2>]
level=error msg="Failure in zone internal.dev.yourdomain.com. [Id: /hostedzone/<HostedZoneNº1>] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set

How to reproduce the expected/previous behaviour?:

  1. Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
  2. Install External-DNS 0.13.6 on EKS
  3. Create an ingress whose host is testapplication.internal.dev.yourdomain.com
  4. Wait for external-dns to detect the changes
  5. External-DNS will create the records correctly in the 2 hosted zones
  6. Remove the ingress created
  7. Wait for external-dns to detect the changes
  8. Success will show up in the external-dns pod logs:

msg="Applying provider record filter for domains: [internal.sandbox.yourdomain.com. sandbox.yourdomain.com.]"
msg="Desired change: DELETE cname-testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
msg="Desired change: DELETE cname-testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
msg="Desired change: DELETE testapplication.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

Anything else we need to know?: This works fine for ingresses that use only 1 hosted zone (it can be easily tested with the same ingress example using the host testapplication.dev.yourdomain.com).

Environment:

  • External-DNS version (use external-dns --version): 0.14.0
  • DNS provider:
  • Others: EKS 1.26

leonardocaylent avatar Feb 07 '24 20:02 leonardocaylent

I can confirm this was introduced with https://github.com/kubernetes-sigs/external-dns/pull/3747

leonardocaylent avatar Feb 09 '24 15:02 leonardocaylent

Can confirm, reverting to 0.13.6 addresses this issue, and UPSERTS for ALIAS on TXT work as expected.

cilindrox avatar Feb 15 '24 12:02 cilindrox

@cilindrox Thank you for sharing that here. Can you confirm the use case that you are using is the same? 2 Hosted zones with similar names as (internal.dev.yourdomain.com & dev.yourdomain.com)?

leonardocaylent avatar Feb 15 '24 15:02 leonardocaylent

correct, several instances of the above ^

We deploy this with another provider and 0.14.0 seems a-ok there. It's only Route53 that seems broken so far.

cilindrox avatar Feb 15 '24 15:02 cilindrox

I tried to contact the creator of https://github.com/kubernetes-sigs/external-dns/pull/3747 but I still haven't had any response. I think https://github.com/kubernetes-sigs/external-dns/pull/3747 needs to be rolled back or we need a hotfix for this use case. We also tried using the prefix, but that doesn't resolve the issue.

leonardocaylent avatar Feb 15 '24 15:02 leonardocaylent

I can confirm that the issue is also present for us since updating to the latest build. Reverting this to pre 0.14.0 fixed the issue.

MitchIonascu avatar Feb 16 '24 15:02 MitchIonascu

Thank you for reporting this

leonardocaylent avatar Feb 16 '24 17:02 leonardocaylent

@leonardocaylent I can try to add a test case. I just need to know what records are in play, and possibly more logs to know what's happening.

  • What are the current records?
  • What are the desired records?

(If you are generating custom builds for testing, you could log the plan to see what the generated plan records are before and after the Calculate.) https://github.com/kubernetes-sigs/external-dns/blob/52460ba89cc8fbc17ceb8ed50ef4bfbf7cf3e1dc/controller/controller.go#L248

Do you see a log message like:

Domain %s contains conflicting record type candidates; discarding CNAME record

cronik avatar Feb 18 '24 22:02 cronik

Hi @cronik, here are the debugging logs for the 2 versions 0.13.6 and 0.14.0:

At creation (0.13.6)(Success):

level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

At removal (0.13.6)(Success):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"

At creation (0.14.0)(Success):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"

At removal (0.14.0)(Failure):

level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=error msg="Failure in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: 4******c-8ca4-4c49-bb9c-3**********4"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=error msg="Failure in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: 0******5-4b9f-48b3-ac49-2**********c"
level=fatal msg="failed to submit all changes for the following zones: [/hostedzone/HostedZoneNº1 /hostedzone/HostedZoneNº2]"

leonardocaylent avatar Feb 19 '24 16:02 leonardocaylent

More information about the DELETE requests.

Success on DELETE records (0.13.6), 2 ChangeResourceRecordSets calls to AWS:

    "requestParameters": {
        "hostedZoneId": "Z*******************S",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1FL5HABSF5",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },
    "requestParameters": {
        "hostedZoneId": "Z*******************J",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },

Failure on DELETE records (0.14.0), 1 ChangeResourceRecordSets call to AWS:

 "errorMessage": "[The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']",
    "requestParameters": {
        "hostedZoneId": "Z*******************S",
        "changeBatch": {
            "changes": [
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "cname-testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "A",
                        "aliasTarget": {
                            "hostedZoneId": "Z1H1*********",
                            "dNSName": "k8s-sandboxtoolsingre-e********8-6*******2.us-west-2.elb.amazonaws.com",
                            "evaluateTargetHealth": true
                        }
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                },
                {
                    "action": "DELETE",
                    "resourceRecordSet": {
                        "name": "testdeploy.internal.sandbox.yourdomain.com",
                        "type": "TXT",
                        "tTL": 300,
                        "resourceRecords": [
                            {
                                "value": "\"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048\""
                            }
                        ]
                    }
                }
            ]
        }
    },

It seems that version 0.14.0 puts 6 DELETEs into each of the 2 batches (every record appears twice), where there should be 3 DELETEs per batch (3 records per hosted zone).
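
To make the duplication easier to see, here is a minimal Go sketch that collapses repeated entries in a change batch. The `change` type and `dedupeChanges` helper are hypothetical names made up for this illustration; they are not the AWS SDK types and not external-dns code.

```go
package main

import "fmt"

// change is a hypothetical, simplified stand-in for one entry of a
// Route 53 change batch. It is not the AWS SDK type.
type change struct {
	Action string
	Name   string
	Type   string
}

// dedupeChanges returns the changes with exact repeats removed,
// preserving the order of first appearance.
func dedupeChanges(changes []change) []change {
	seen := make(map[change]bool, len(changes))
	out := make([]change, 0, len(changes))
	for _, c := range changes {
		if seen[c] {
			continue // same action, name and type already queued
		}
		seen[c] = true
		out = append(out, c)
	}
	return out
}

func main() {
	// Mirrors the failing 0.14.0 batch: each DELETE appears twice.
	batch := []change{
		{"DELETE", "cname-testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "cname-testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "A"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "A"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "TXT"},
		{"DELETE", "testdeploy.internal.sandbox.yourdomain.com", "TXT"},
	}
	unique := dedupeChanges(batch)
	fmt.Printf("%d changes before, %d after\n", len(batch), len(unique))
}
```

The deduplicated batch has the same shape as the 3-change batches that 0.13.6 sent per hosted zone.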

leonardocaylent avatar Feb 19 '24 17:02 leonardocaylent

Ack, I've seen this and will try to reproduce it and see if we can ship a fix. I was planning a release of the next version, I will consider this a showstopper if I manage to reproduce it. Will keep you posted, probably next week.

Raffo avatar Feb 21 '24 19:02 Raffo

Thank you! Here is the yaml file for quick-testing the issue:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 5
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: public.ecr.aws/l6m2t8p7/docker-2048
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: default
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:12345678:certificate/blablablablabla
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: dev-tools-ingress
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'

spec:
  ingressClassName: alb
  rules:
    - host: testdeploy.internal.dev.yourdomain.com
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: service-2048
              port:
                number: 80

leonardocaylent avatar Feb 21 '24 19:02 leonardocaylent

@leonardocaylent I am not sure I understand what the desired behavior should be. I haven't worked with overlapping zones, so I may be confused about what you actually want to happen. Can you give an example of what the behavior with overlapping zones was, what it is today, and what you expect it to be? I would personally assume that we don't double write records to zones that overlap.

Raffo avatar Feb 24 '24 09:02 Raffo

@Raffo this is the behavior of overlapping zones on all versions of external-dns:

The hostname for the ingress is: https://testdeploy.internal.sandbox.yourdomain.com/

The Route 53 Hosted Zones:

internal.sandbox.yourdomain.com (Type Private hosted zone)
sandbox.yourdomain.com (Type Public hosted zone)

The 3 Records on Hosted Zone internal.sandbox.yourdomain.com:

Type A: testdeploy.internal.sandbox.yourdomain.com (to k8s-sandboxtoolsingre-e8f5f***.elb)
Type TXT: testdeploy.internal.sandbox.yourdomain.com
Type TXT: cname-testdeploy.internal.sandbox.yourdomain.com
"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048"

The 3 Records on Hosted Zone sandbox.yourdomain.com:

Type A: testdeploy.internal.sandbox.yourdomain.com (to k8s-sandboxtoolsingre-e8f5f***.elb)
Type TXT: testdeploy.internal.sandbox.yourdomain.com
Type TXT: cname-testdeploy.internal.sandbox.yourdomain.com
"heritage=external-dns,external-dns/owner=us-west-2:sandbox.yourdomain.com,external-dns/resource=ingress/default/ingress-2048"

The records are identical in the 2 hosted zones. Creating the Route53 records in every hosted zone whose name ends in sandbox.yourdomain.com is expected; the issue that started in 0.14.0 is that external-dns is not able to delete the records. If we had another private or public zone called yourdomain.com, we would probably have another 3 records in that hosted zone as well.

It would be great if external-dns could know that we only want to create the records in the internal.sandbox.yourdomain.com private hosted zone, but I believe that for backward compatibility and other users' use cases the behavior may need to stay as it is right now, fixing only the grouping of the DELETEs in the ChangeResourceRecordSets API call.
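
As an illustration of the zone matching described above, the following Go sketch selects every zone whose name is the hostname itself or a parent domain of it. `matchingZones` is a hypothetical helper written for this comment; it is an assumption about the behavior as observed in the logs, not the actual external-dns implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchingZones returns every zone whose name equals the hostname or is a
// parent domain of it. Illustrative helper only, not external-dns code.
func matchingZones(hostname string, zones []string) []string {
	host := strings.TrimSuffix(strings.ToLower(hostname), ".")
	var matched []string
	for _, z := range zones {
		zone := strings.TrimSuffix(strings.ToLower(z), ".")
		// The leading dot keeps "notsandbox.example" from matching "sandbox.example".
		if host == zone || strings.HasSuffix(host, "."+zone) {
			matched = append(matched, z)
		}
	}
	return matched
}

func main() {
	zones := []string{
		"internal.sandbox.yourdomain.com",
		"sandbox.yourdomain.com",
		"example.org",
	}
	// Falls under both overlapping zones.
	fmt.Println(matchingZones("testdeploy.internal.sandbox.yourdomain.com", zones))
	// Falls under only the outer zone.
	fmt.Println(matchingZones("testapplication.sandbox.yourdomain.com", zones))
}
```

With overlapping zones the first hostname matches two zones, which is why three records end up in each of them; the second hostname matches one zone, which corresponds to the single-zone case that keeps working.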

leonardocaylent avatar Feb 26 '24 12:02 leonardocaylent

@Raffo @cronik I have 2 important updates about this issue:

  1. Found the culprit of this issue: the call changes.Delete = endpoint.FilterEndpointsByOwnerID(p.OwnerID, changes.Delete) in plan.go leaves each record duplicated in changes.Delete.

We could fix that by doing something like this:

func FilterEndpointsByOwnerID(ownerID string, eps []*Endpoint) []*Endpoint {
	filtered := []*Endpoint{}
	visited := make(map[EndpointKey]bool) // keys of endpoints already processed

	for _, ep := range eps {
		key := EndpointKey{DNSName: ep.DNSName, RecordType: ep.RecordType, SetIdentifier: ep.SetIdentifier}
		if visited[key] { // skip duplicated endpoints
			log.Debugf(`Already loaded endpoint %v`, ep)
			continue
		}
		if endpointOwner, ok := ep.Labels[OwnerLabelKey]; !ok || endpointOwner != ownerID {
			log.Debugf(`Skipping endpoint %v because owner id does not match, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		} else {
			filtered = append(filtered, ep)
			log.Debugf(`Added endpoint %v because owner id matches, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		}
		visited[key] = true
	}

	return filtered
}

We will also add more granular Debug logs as they were super useful to fix this issue.

  2. With the addition of https://github.com/kubernetes-sigs/external-dns/pull/4229 the pod doesn't crash anymore, which is great:

level=error msg="Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/HostedZoneNº1 /hostedzone/HostedZoneNº2]"

Waiting for thoughts/comments
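
For anyone who wants to try the deduplication idea outside the external-dns tree, here is a self-contained Go sketch. The `endpointRec` and `endpointKey` types, the `ownerLabelKey` constant and the `filterByOwnerDedup` function are simplified, hypothetical stand-ins invented for this example; they only mimic the fields used in the snippet above and are not the real external-dns definitions.

```go
package main

import "fmt"

// endpointKey and endpointRec are reduced stand-ins for the external-dns
// endpoint types. Their shape is an assumption of this sketch.
type endpointKey struct {
	DNSName, RecordType, SetIdentifier string
}

type endpointRec struct {
	DNSName, RecordType, SetIdentifier string
	Labels                             map[string]string
}

// ownerLabelKey is a placeholder label name chosen for this example.
const ownerLabelKey = "owner"

// filterByOwnerDedup keeps endpoints owned by ownerID and drops any endpoint
// whose (name, type, set identifier) key was already seen.
func filterByOwnerDedup(ownerID string, eps []*endpointRec) []*endpointRec {
	filtered := []*endpointRec{}
	visited := make(map[endpointKey]bool)
	for _, ep := range eps {
		key := endpointKey{ep.DNSName, ep.RecordType, ep.SetIdentifier}
		if visited[key] {
			continue // duplicate of an endpoint handled earlier
		}
		visited[key] = true
		if owner, ok := ep.Labels[ownerLabelKey]; ok && owner == ownerID {
			filtered = append(filtered, ep)
		}
	}
	return filtered
}

func main() {
	owner := "us-west-2:sandbox.yourdomain.com"
	labels := map[string]string{ownerLabelKey: owner}
	mk := func(name, typ string) *endpointRec {
		return &endpointRec{DNSName: name, RecordType: typ, Labels: labels}
	}
	// Each record is listed twice, as in the failing delete plan.
	eps := []*endpointRec{
		mk("testdeploy.internal.sandbox.yourdomain.com", "A"),
		mk("testdeploy.internal.sandbox.yourdomain.com", "A"),
		mk("testdeploy.internal.sandbox.yourdomain.com", "TXT"),
		mk("testdeploy.internal.sandbox.yourdomain.com", "TXT"),
	}
	fmt.Println(len(filterByOwnerDedup(owner, eps)), "endpoints kept")
}
```

One design note: marking a key as visited before the owner check means a later endpoint with the same key is ignored even if it has a different owner label, which matches the snippet above but may or may not be what maintainers want.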

leonardocaylent avatar Feb 27 '24 14:02 leonardocaylent

Behavior with the fix:

level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: sandbox.yourdomain.com.)"

level=info msg="Applying provider record filter for domains: [sandbox.yourdomain.com. .sandbox.yourdomain.com. us-west-2.sandbox.yourdomain.com. .us-west-2.sandbox.yourdomain.com. 

level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id does not match, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e****8-69***32.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id does not match, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HostedZoneNº2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HostedZoneNº1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HostedZoneNº1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HostedZoneNº1] were successfully updated"

leonardocaylent avatar Feb 27 '24 15:02 leonardocaylent

Adding more details on each file call:

On Create at version 0.14.0 with the fix:

level=debug msg="Considering zone: /hostedzone/HZ1 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"

On Delete at version 0.14.0 with the fix:

level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Start filter on plan.go"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*****8-6**2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*****8-6***2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="2- All changes.UpdateOld"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="Filter on txt.go"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="2- All changes.UpdateOld"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e***8-6**2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"

On Create at version 0.14.0 without the fix: Same behavior (no changes)

On Delete at version 0.14.0 without the fix:

level=debug msg="Start filter on plan.go"
level=debug msg="1 - All changes.Delete"
level=debug msg="Warning: Without the continue: Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
d does not match, found: \"\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="2- All changes.UpdateOld"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="Filter on txt.go"
level=debug msg="3- All changes.UpdateNew"
level=debug msg="2- All changes.UpdateOld"
level=debug msg="1 - All changes.Delete"
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Warning: Without the continue: Already loaded endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] "
level=debug msg="Added endpoint testdeploy.internal.sandbox.yourdomain.com 300 IN A  k8s-sandboxtoolsingre-e*8-6*2.us-west-2.elb.amazonaws.com [{aws/evaluate-target-health true} {alias true}] because owner id matches, found: \"us-west-2:sandbox.yourdomain.com\", required: \"us-west-2:sandbox.yourdomain.com\""
level=debug msg="Refreshing zones list cache"
level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=error msg="Failure in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=error msg="Failure in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=error msg="Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/HZ2 /hostedzone/HZ1]"

It's also creating the Route53 record twice, so maybe that should be grouped as well, or fixed in another ticket.

leonardocaylent avatar Mar 01 '24 00:03 leonardocaylent

@leonardocaylent please open a PR with the proposed fix. I would love to understand the impact of this change, and it's hard to reason about it without a proposed code change.

Raffo avatar Mar 02 '24 10:03 Raffo

@Raffo I'll open the PR. The changes would add the "group by endpoint" handling that was needed in https://github.com/kubernetes-sigs/external-dns/pull/3747:
1) Validate creating duplicated endpoints per hosted zone
2) Validate removing duplicated endpoints per hosted zone
3) Add tests if possible

leonardocaylent avatar Mar 02 '24 18:03 leonardocaylent

@Raffo https://github.com/kubernetes-sigs/external-dns/pull/4296 is ready for review. I needed to add the RecordType to the key because without it some necessary records were being skipped.
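A small sketch of why the record type matters in the deduplication key. The record type and the dedupe helper are simplified stand-ins made up for this example, not the code in the PR.

```go
package main

import "fmt"

// record is a simplified, hypothetical stand-in for an endpoint.
type record struct {
	DNSName    string
	RecordType string
}

// dedupe keeps the first record seen for each key. When useType is false the
// key is the DNS name only; when true it is the DNS name plus the record type.
func dedupe(records []record, useType bool) []record {
	seen := make(map[string]bool)
	var out []record
	for _, r := range records {
		key := r.DNSName
		if useType {
			key += "/" + r.RecordType
		}
		if seen[key] {
			continue // duplicate under this key, skip it
		}
		seen[key] = true
		out = append(out, r)
	}
	return out
}

func main() {
	records := []record{
		{DNSName: "testdeploy.internal.sandbox.yourdomain.com", RecordType: "A"},
		{DNSName: "testdeploy.internal.sandbox.yourdomain.com", RecordType: "A"},
		{DNSName: "testdeploy.internal.sandbox.yourdomain.com", RecordType: "TXT"},
	}
	// Name-only key: the TXT record is dropped together with the duplicate A.
	fmt.Println(len(dedupe(records, false)))
	// Name plus type: only the true duplicate is dropped.
	fmt.Println(len(dedupe(records, true)))
}
```

The A record and its TXT registry record share the same DNS name here, so a name-only key would treat the TXT record as a duplicate and skip it.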

leonardocaylent avatar Mar 03 '24 22:03 leonardocaylent

The behavior of creating the Route53 records in every hosted zone whose name ends in sandbox.yourdomain.com is expected

I'm not sure I agree with that 🤔. DNS is a reference system; it's designed to be a source of truth. I don't see how it can be a source of truth with duplicated records. It doesn't work like that at the TLD level, and it should not work like that at this domain level either.

=> IMHO, the expected behavior should be to create and delete records only in the internal.sandbox.yourdomain.com sub zone.

mloiseleur avatar Mar 21 '24 12:03 mloiseleur

@leonardocaylent Am I wrong to think there is an easy workaround?

I mean: if you run two different instances of external-dns, one per overlapping zone, then it may behave as (you) expect.

mloiseleur avatar Mar 21 '24 12:03 mloiseleur

The behavior of creating the Route53 records in every hosted zone whose name ends in sandbox.yourdomain.com is expected

I'm not sure I agree with that 🤔. DNS is a reference system; it's designed to be a source of truth. I don't see how it can be a source of truth with duplicated records. It doesn't work like that at the TLD level, and it should not work like that at this domain level either.

=> IMHO, the expected behavior should be to create and delete records only in the internal.sandbox.yourdomain.com sub zone.

@mloiseleur Maybe there is some confusion between what is "expected" and how external-dns behaved in all the previous versions. The bug was reported because external-dns lost the ability to delete same-named Route53 records from multiple hosted zones, which wouldn't be needed if the record were only created in the correct/best-matching hosted zone (a feature that, I guess, external-dns doesn't have yet). For example: application.internal.sandbox.yourdomain.com is expected to exist only in the private Route53 hosted zone, and having the record in sandbox.yourdomain.com as well is a consequence of that hosted zone's name also matching the end of the record name. A possible solution would be to insert/manage the records only in the best-matching candidate, but that would need full regression testing before being applied to current external-dns versions, because once the new feature is applied some of the old records would be ignored.
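To illustrate what "best-matching candidate" could mean, here is a minimal sketch that picks the longest matching zone suffix. This is an assumption about how such a feature might work, not existing external-dns behavior, and bestZone is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// bestZone returns the most specific zone (longest matching suffix) for a
// record name, and false when no zone matches. Hypothetical helper.
func bestZone(record string, zones []string) (string, bool) {
	name := strings.TrimSuffix(record, ".")
	best, found := "", false
	for _, z := range zones {
		zone := strings.TrimSuffix(z, ".")
		if name != zone && !strings.HasSuffix(name, "."+zone) {
			continue // zone is not a parent of this record name
		}
		if !found || len(zone) > len(best) {
			best, found = zone, true
		}
	}
	return best, found
}

func main() {
	zones := []string{"sandbox.yourdomain.com.", "internal.sandbox.yourdomain.com."}
	zone, ok := bestZone("application.internal.sandbox.yourdomain.com", zones)
	fmt.Println(zone, ok)
}
```

Note that longest-suffix matching alone cannot tell apart a private and a public zone that share exactly the same name; that case would need an extra rule.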

leonardocaylent avatar Mar 26 '24 15:03 leonardocaylent

@leonardocaylent Am I wrong to think there is an easy workaround?

I mean: if you run two different instances of external-dns, one per overlapping zone, then it may behave as (you) expect.

@mloiseleur I considered doing something like that, but it would have a huge impact on people who have more than 1 overlapping hosted zone or more than 5 EKS clusters. It would dramatically increase the number of pods or the amount of IaC code to maintain, and each deployment would need different filters. A possible solution is to put the FilterEndpointsByOwnerID function behind a feature flag, so users can choose between the previous behavior and the new one. What do you think about that?

leonardocaylent avatar Mar 26 '24 15:03 leonardocaylent

@mloiseleur Small update: there is a new commit on https://github.com/kubernetes-sigs/external-dns/pull/4296 that is a good candidate to solve the issue without using feature flags

leonardocaylent avatar Mar 29 '24 07:03 leonardocaylent