external-dns icon indicating copy to clipboard operation
external-dns copied to clipboard

fix: duplicated endpoint per hosted zone

Open leonardocaylent opened this issue 11 months ago • 39 comments

Description

Looking into Issue #4241 we found that Delete was trying to generate 2 calls with 6 records per hosted zone instead of 2 calls with 3 records each. Taking a deeper look, changes.Delete at plan.go was handling duplicates at 0.13.6 but on the latest version it's not filtering them anymore. Removing the duplicate endpoints in plan.go solves this issue for all providers

Short explain of new behavior Added a new function RemoveDuplicates to detect duplicated endpoints when needed Added Unit Tests for the function Removed duplicated records on plan.go for changes.Delete

Details Create behavior: Without the fix:

level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"

With the fix:

level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"
level=info msg="Desired change: CREATE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: CREATE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"

Delete behavior: Without the fix:

level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2]"
level=debug msg="Adding cname-testdeploy.internal.sandbox.yourdomain.com. to zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=error msg="Failure in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=error msg="Failure in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'A testdeploy.internal.sandbox.yourdomain.com.', The request contains an invalid set of changes for a resource record set 'TXT testdeploy.internal.sandbox.yourdomain.com.']\n\tstatus code: 400, request id: X"
level=error msg="Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/HZ2 /hostedzone/HZ1]"

With the fix:

level=debug msg="Considering zone: /hostedzone/HZ2 (domain: internal.sandbox.yourdomain.com.)"
level=debug msg="Considering zone: /hostedzone/HZ1 (domain: sandbox.yourdomain.com.)"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ1]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ1]"
level=info msg="3 record(s) in zone sandbox.yourdomain.com. [Id: /hostedzone/HZ1] were successfully updated"
level=info msg="Desired change: DELETE cname-testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com A [Id: /hostedzone/HZ2]"
level=info msg="Desired change: DELETE testdeploy.internal.sandbox.yourdomain.com TXT [Id: /hostedzone/HZ2]"
level=info msg="3 record(s) in zone internal.sandbox.yourdomain.com. [Id: /hostedzone/HZ2] were successfully updated"

Fixes #4241

Checklist

  • [x] Unit tests updated
  • [ ] End user documentation updated

leonardocaylent avatar Mar 03 '24 08:03 leonardocaylent

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: leonardocaylent (cea0496a958f6ffb83112acdc87d84f27a5fff75, 7c7a1c30ef8c2191092e18a23ebddb10cab8e8f0, e55200ae742bf6906641fb6664b2009a5e8d3a66, b8053fed819d53d84388b6d34646fec9eef22ccb, 846f4b40fab641ae4720f418cdc2756b1acc3b2f, deba1ea445bbb1008d12f650782419a1a2b9aa04, 2b3da1b6080ecd32cdde0cb007cc0839208383e3, 82046ccb15325b9ea8d1de9b2376e9c17d2edb37, 3395eba20ffc85e9fec1b0a061e9db25955a499a, 9307374110b97661f1b473d54766e463bdfacf23, 17ce6b4d05f6c66be7959c9bd1c8c90d1f53efe2, 6066b700b871399bdf66670ab6d9725c6037b884, 0308cc23e2f5ce84fa077931b3346be63eaa55ad, 30c9e77765df4300be4145858cc81707d1db1db3, a3826d541bc7071d3920715a9f5a89e92abe1cec, ba56b7abb385f1fe589784c4c14d5ced4d12a2c0, d9b743922ad7fbec94cb765bd384043742ff1fa3, 294b4c673ecc6aaff2bb48709d574cd678561c94, 519077750380a1a414fa0a46c583d20b46999d94, 64d083330364df2578bf8d8c107a56226534cad0, f4f39d8af66a31427edb48c09e930a8dce3083ac, 7fe2d3f0696e54f9a0d7e03c5465f2e1e7b8613f, 0ca279694a231dc0ac234c8d72e0f32c0a9286fb, 56024fd86881243e9ba227b07f6e8a68927a7695, d3c2f47fb10e5a30c55952ed3c688da154fd84a9, 05ca35e228040b841c38547c111f42ce1acc6d27, 4fb2f2e283a721cc41112604acea1a3998327398, 3ca4d0250f1c031c6dcc84133081fd4036fbdb65)

Welcome @leonardocaylent!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Mar 03 '24 08:03 k8s-ci-robot

Hi @leonardocaylent. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 03 '24 08:03 k8s-ci-robot

/ok-to-test

mloiseleur avatar Mar 04 '24 07:03 mloiseleur

/retitle fix: duplicated endpoint per hosted zone

mloiseleur avatar Mar 04 '24 07:03 mloiseleur

@leonardocaylent Do you think you can add a test on this ?

In order to keep issues fixed for good, we accept PR only when there is a test with it.

mloiseleur avatar Mar 04 '24 07:03 mloiseleur

@leonardocaylent Do you think you can add a test on this ?

In order to keep issues fixed for good, we accept PR only when there is a test with it.

@mloiseleur Yes, I can. I also need to check for TestAWSRecords and TestAWSApplyChanges that are failing.

leonardocaylent avatar Mar 04 '24 12:03 leonardocaylent

@yurrriq: changing LGTM is restricted to collaborators

In response to this:

We can optimize this a bit by using the empty struct.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 04 '24 17:03 k8s-ci-robot

Go modules update are done in dedicated PRs.

@leonardocaylent Would you please remove update of go.mod & go.sum in this PR ?

mloiseleur avatar Mar 05 '24 08:03 mloiseleur

@mloiseleur I need to test this image and review again the visitedChanges validation for aws.go and aws_test.go. But I'd love to have some review/thoughts on this. @yurrriq thanks for your comments, I didn't apply your suggestions yet because I'm focusing on the fix approach rather than optimizations. I'll optimize it when we confirm the path we want to follow to fix this. This seemed like an S but it's an L / XL since the impact that could have. Also the coverage of endpoint.go may need to be increased

leonardocaylent avatar Mar 05 '24 16:03 leonardocaylent

A change to endpoints.go is impacting every provider, but AFAIK we only have reports of this being a problem for a specific case for the aws provider. Is it impacting generally every provider?

Raffo avatar Mar 08 '24 19:03 Raffo

A change to endpoints.go is impacting every provider, but AFAIK we only have reports of this being a problem for a specific case for the aws provider. Is it impacting generally every provider?

We don't know the behavior of different providers. They should be having the same issue when applying FilterEndpointsByOwnerID

leonardocaylent avatar Mar 09 '24 16:03 leonardocaylent

@Raffo Apart from the comments I made, I found that if we fix this on endpoint.go using :

for _, ep := range eps {
		key := ep.Key()    <----------- I tested the function that provides unique endpoints based on RecordType Name and SetIdentifier
		// Skip duplicated endpoints
		if visited[key] {
			log.Debugf(`Skipping duplicated endpoint %v `, ep)
			continue
		}
		if endpointOwner, ok := ep.Labels[OwnerLabelKey]; !ok || endpointOwner != ownerID {
			log.Debugf(`Skipping endpoint %v because owner id does not match, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		} else {
			filtered = append(filtered, ep)
			log.Debugf(`Added endpoint %v because owner id matches, found: "%s", required: "%s"`, ep, endpointOwner, ownerID)
		}
		if(key != EndpointKey{}){
			visited[key] = true
		}
	}

One test in service_test.go fails: annotated services return an endpoint with hostname then resolve hostname The result of the test is expected 2 returned 0 If we fix it on aws.go we need to add more conditions to handle weighted and geolocation records (this was captured by make test while I was testing the fix) I think the next step is to think where is it worth to fix this.

leonardocaylent avatar Mar 09 '24 16:03 leonardocaylent

@Raffo I modified the validation: I just made changes on endpoint.go and added more visibility for debug logs on aws.go

About this lines:

		// Adding validation (skipping empty EndpointKeys)
		// Not sure why we have cases when there are empty keys, but we do
		if key != (EndpointKey{}) {
			visited[key] = true
		}

Without the if statement this error throws up about the new 2 unit tests on endpoint_test.go:

--- FAIL: TestDuplicatedFilterEndpointsByOwnerID (0.00s)
    --- FAIL: TestDuplicatedFilterEndpointsByOwnerID/filter_values (0.00s)
        endpoint_test.go:287: FilterEndpointsByOwnerID() = [foo.com 0 IN A   [] foo.com 0 IN CNAME   [] foo.com 0 IN A   [] foo.com 0 IN CNAME   []], want [foo.com 0 IN A   [] foo.com 0 IN CNAME   []]
--- FAIL: TestZonesDuplicatedFilterEndpointsByOwnerID (0.00s)
    --- FAIL: TestZonesDuplicatedFilterEndpointsByOwnerID/filter_values (0.00s)
        endpoint_test.go:402: FilterEndpointsByOwnerID() = [internal.foo.com 0 IN A   [] internal.foo.com 0 IN CNAME   [] internal.foo.com 0 IN A   [] internal.foo.com 0 IN CNAME   [] foo.com 0 IN A   [] foo.com 0 IN CNAME   [] foo.com 0 IN A   [] foo.com 0 IN CNAME   []], want [internal.foo.com 0 IN A   [] internal.foo.com 0 IN CNAME   [] foo.com 0 IN A   [] foo.com 0 IN CNAME   []]
time="2024-03-16T18:22:57Z" level=error msg="Invalid target net filter: "
time="2024-03-16T18:22:57Z" level=error msg="Invalid target net filter: "
time="2024-03-16T18:22:57Z" level=error msg="Invalid target net filter: 0"
time="2024-03-16T18:22:57Z" level=error msg="Invalid target net filter: "
time="2024-03-16T18:22:57Z" level=error msg="Invalid target net filter: "
FAIL
coverage: 61.0% of statements
FAIL    sigs.k8s.io/external-dns/endpoint       0.109s

Even the unit-tests are passing and the behavior of overlapping zones is corrected with this code, this probably needs some extra eyes to verify this is the way we want to go or improve the code (maybe using empty structs instead of boolean for memory as @yurrriq mentioned), I'm tagging @cronik @johngmyers @szuecs @mloiseleur.

In case we don't find a solution I'd suggest to rollback the changes made on https://github.com/kubernetes-sigs/external-dns/pull/3747 and fix this later to unblock all the users of 0.14.0 that uses overlapping zones on AWS.

leonardocaylent avatar Mar 16 '24 18:03 leonardocaylent

I suggest to remove all the changes in aws.go. They don't seem to be necessary. Otherwise, it looks very good to me. Thanks a lot !

mloiseleur avatar Mar 20 '24 14:03 mloiseleur

I suggest to remove all the changes in aws.go. They don't seem to be necessary. Otherwise, it looks very good to me. Thanks a lot !

That was just an addition to add more visibility on debug mode to see the record types that are being created. I'll remove it. Thanks for your feedback

leonardocaylent avatar Mar 21 '24 00:03 leonardocaylent

Based on this comment, this lines on master are a wrong unit test, we can't have CNAME and A records with the same name.

I analyzed if the Pull Request that introduced this issue can be easily reverted and is not so easy, some pull requests were added above the initial PR

So the 3 options that we have are: 1)Fix on endpoint.go & fix unit tests 2)Fix on aws.go and contemplate emptyrecords, weighted and geolocation & fix unit tests 3)Revert changes on 3747, remove the wrong unit tests and reopen the multi-cluster dual stack record types issue

leonardocaylent avatar Mar 26 '24 03:03 leonardocaylent

I would prefer to fix forward and not revert the PR. The order I see is:

  • Fix the wrong unit tests
  • fix the code based on the tests
  • implement whatever fix we agree on.

the first two points are the same PR, the third a different one (possibly this one).

Raffo avatar Mar 26 '24 12:03 Raffo

Based on this https://github.com/kubernetes-sigs/external-dns/pull/4296#discussion_r1532670169, this lines on master are a wrong unit test, we can't have CNAME and A records with the same name.

@leonardocaylent to be clear the referenced lines may not be technically correct according to DNS rules, but for the specific test TestFilterEndpointsByOwnerID it doesn't matter because the test is for the "OwnerID" filter which is not responsible detecting invalid/duplicate names. The logic to remove duplicates should probably not be added to that function.

cronik avatar Mar 26 '24 15:03 cronik

Based on this #4296 (comment), this lines on master are a wrong unit test, we can't have CNAME and A records with the same name.

@leonardocaylent to be clear the referenced lines may not be technically correct according to DNS rules, but for the specific test TestFilterEndpointsByOwnerID it doesn't matter because the test is for the "OwnerID" filter which is not responsible detecting invalid/duplicate names. The logic to remove duplicates should probably not be added to that function.

@cronik So I guess in that case changing the unit test to use only one type of records should be fine for this (same thing could be applied for avoid confusion on the L120-L174 in a different ticket) What do you think of the solution on endpoint.go? My concern is that the change impacts all providers and I'm not aware of use cases that needs duplicated endpoints

leonardocaylent avatar Mar 26 '24 16:03 leonardocaylent

Based on the feedback provided I was able to simplify the fix and set the unit tests according to our needs. Ready for re-review

leonardocaylent avatar Mar 29 '24 07:03 leonardocaylent

@leonardocaylent I don't think FilterEndpointsByOwnerID should be responsible for removing duplicates. It's only purpose is to filter by OwnerID as the function name indicates.

I suggest creating a new function that performs the singular task of removing duplicates. If that is what the aws provider requires then apply that filter there. That's my 2 cents, I'll defer to the project maintainers though.

cronik avatar Mar 29 '24 14:03 cronik

Can we instead fix it dedupSource.Endpoints()? https://github.com/kubernetes-sigs/external-dns/blob/v0.14.1/source/dedupsource.go#L38

yurrriq avatar Mar 29 '24 18:03 yurrriq

@cronik @yurrriq Thanks for your comments. Taking another deep look I found the duplicates were on controller.go inside Current & Desired. The new proposal is to add the function RemoveDuplicates on endpoint.go and use it on controller.go. I changed the unit tests to reflect the use cases on the new function

leonardocaylent avatar Mar 30 '24 02:03 leonardocaylent

@yurrriq: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 04 '24 18:04 k8s-ci-robot

@yurrriq Thanks for your comments, after analyzing the plan.go you were correct on putting the light on that function changes. Here is the update:

On version 0.13.6 and 0.14.0 always are duplicate endpoints on the variables 'records' at controller.go Line 206 The main difference is that changes.Delete on 0.13.6 was receiving duplicates, but the output was taking them out, apparently on this function This change seems like is not able to handle duplicates, so that's why a primary key on Endpoint is needed to correctly apply the filtering. In my opinion I think we need to add the string of targets as part of the primary key used on EndpointKey, but maybe it's not necessary.

leonardocaylent avatar Apr 05 '24 04:04 leonardocaylent

@leonardocaylent Creating a new map of Endpoint may work and comes also with some impact on users with big clusters. I've mixed feelings about this, because it's a fix for an edge case, not the common case.

A user should not have duplicated endpoint to start with, I tried to detail why here.

=> Wdyt about a fix that removes the root cause : the duplicated endpoint ?

mloiseleur avatar Apr 05 '24 07:04 mloiseleur

@mloiseleur I agree on that solution. About the aws.go provider: there is a part on func suitableZones() that has this comment // Only select the best matching public zone Another ticket could be created to create and develop the feature for selecting the best matching zone (it doesn't care if it's public or private) so we only have the records created in the place we really need them. When users migate to the new version, external-dns should detect that this records are not needed and delete them on the plan.

leonardocaylent avatar Apr 05 '24 16:04 leonardocaylent

@yurrriq The root cause of the issue is that this function lost the ability to remove the duplicate endpoints on the current records at 0.14.0 I tried adding the RemoveDuplicates function in here but does break some unit tests:

--- FAIL: TestPlan (0.11s)
    --- FAIL: TestPlan/TestSyncSecondRound (0.00s)
        plan_test.go:951: expected ["foo 0 IN CNAME  v2 []"] to match []
    --- FAIL: TestPlan/TestSyncSecondRoundMigration (0.00s)
        plan_test.go:951: expected [] to match ["foo 0 IN CNAME  v1 []"]

leonardocaylent avatar Apr 06 '24 06:04 leonardocaylent

Please follow up with https://github.com/kubernetes-sigs/external-dns/pull/4296#issuecomment-2020257118. Raffo said what he wants to see and if you want to get this done, then please follow his guideline. We are happy to review the mentioned PR that you can link back to this one to make sure there is more priority from our side.

szuecs avatar Apr 09 '24 11:04 szuecs