external-dns
external-dns copied to clipboard
WIP Always create AAAA alias records in route53
Description
Always creates both A and AAAA alias records in Route53.
When a Route53 AAAA alias record points to an IPv4-only resource, AAAA DNS queries return zero records. This is the same behavior as when there is no AAAA alias record but is an A alias record. For this reason, there is no rational reason to not also create the AAAA alias record.
Adds logic for reconciling discrepancies between the A and AAAA alias records, including when only one of the two is present. In some situations, this reconciliation will take two iterations of the reconciliation algorithm to converge, as the Route53 provider does not store all information about the AAAA record with the discrepancy in the Endpoint. Encoding this information in the provider-specific annotations would result in a lot of complicated code for a rare edge case. An alternative would be to add an opaque provider-specific interface{} to Endpoint so that the Route53 provider could store the ResourceRecordSet directly.
This PR is an alternative to #2050.
Fixes #3707
Checklist
- [x] Unit tests updated
- [x] End user documentation updated
@johngmyers this will 1) change default to use dualstack and therefore 2) roughly double requests to AWS route53 with its "nice" API rate limits (see also https://github.com/kubernetes-sigs/external-dns/pull/3402). I would rather like to opt-in this feature at least via a new flag.
One should not have to opt-in to expected, correct behavior. #3402 should address rate limiting, especially if it defaults to enabled.
At worst, this should be opt-out, though that will further increase complexity and the number of needed test cases.
Also, why does a mere factor of 1.5 deserve such complexity?
@johngmyers in general I agree with you!
I just wanted to raise awareness. Maybe you can also review the cache PR, I see that you did, but if you think it's ready please add your /lgtm, too.
/ok-to-test
I'm going to experiment with implementing this using AdjustEndpoints. I'm concerned with the complexity of reconciling discrepancies between A and AAAA in the provider.
The AdjustEndpoints approach is much simpler, though with the disadvantage that the TXT registry creates ownership records for the additional AAAA alias records.
I don't think registries should be creating ownership records for endpoints that were added by AdjustEndpoints, though that's arguably an issue for a separate PR. I haven't yet figured out how to avoid creating such records without giving the txt registry knowledge of route53.
I'm also thinking of writing a dynamodb registry.
To avoid creating ownership records for the AAAA alias records, we could:
- Define a label and have
Provider.AdjustEndpoints()add said label to each additionalEndpointit creates. - The presence of this label causes
TXTRegistry.ApplyChanges()to not create any ownership records for suchEndpoints. - Define a new interface method on
Provideror a new interface implemented byAWSProvider.Registry.Records()implementations call this method after adding all theirLabelsto the endpoints. TheAWSProviderimplementation of this method copies theLabelsfrom the CNAME aliasEndpoints to the corresponding AAAA aliasEndpoints.
This wouldn't handle the case where the A record is deleted but the AAAA and TXT were not. To handle that case, the new interface method would have to take and endpointName, type, and possibly labels and return a set of additional endpointNames and types that the labels would apply to.
/ok-to-test
@szuecs what is the next step to get this merged? I'm working on a DynamoDB registry in #3648, but I doubt that should be a prerequisite.
Would scheduling a meeting be helpful?
Poke @szuecs, the AWS community really needs this to be merged. I'm manually hand-bombing ipv6 records because this is missing. Please help. https://github.com/kubernetes-sigs/external-dns/pull/3605#issuecomment-1575077589
@johngmyers I agree with you that default behavior should work out of the box and I also agree with @szuecs that this behavior may cause unexpected issues.
=> I suggest to add an opt-out flag, something like --aws-ipv4-only or --aws-only-A-records
@cdobbyn I wonder why I don't see this issue in my large fleet of clusters (running kube-ingress-aws-controller + skipper). What setup creates the problem?
@johngmyers opt-out flag seems to be fine and then let' merge it.
@mloiseleur @szuecs don't we already have an opt-out flag, specifically --managed-record-types=A,CNAME?
@johngmyers then I would like that you check if the documentation is good enough and if so merge /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johngmyers, szuecs
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [szuecs]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@johngmyers I've deployed 4dcce15b7f to a cluster on AWS and I'm seeing the following behavior:
When I add a new hostname to a service's external-dns.alpha.kubernetes.io/hostname annotation, it creates the following records:
_extdns.aaaa-hostname TXT
_extdns.cname-hostname TXT
_extdns.hostname TXT
hostname A
hostname AAAA
When I subsequently remove the same hostname from the service's annotation, it only deletes:
_extdns.cname-hostname TXT
_extdns.hostname TXT
hostname A
When I added the hostname back to the service, it created those same three records and left the AAAA-related records alone.
As I was testing the behavior of moving the hostname back and forth between two different services, I got the following panic during the reconciliation loop:
time="2023-07-06T22:28:27Z" level=info msg="Applying provider record filter for domains: [...]"
panic: runtime error: index out of range [3] with length 3
goroutine 1 [running]:
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).createUpdateChanges(0xc000df89c0?, {0xc000e964c0, 0x5, 0xc00007a400?}, {0xc00075edb0, 0x3, 0xc001574ca0?})
/src/3p/external-dns/provider/aws/aws.go:469 +0x5c5
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).ApplyChanges(0xc000e93200?, {0x3c54410, 0xc000d06390}, 0xc000e8fd40)
/src/3p/external-dns/provider/aws/aws.go:508 +0x86
sigs.k8s.io/external-dns/registry.(*TXTRegistry).ApplyChanges(0xc000609c30, {0x3c54410, 0xc000d06390}, 0xc000e8fce0)
/src/3p/external-dns/registry/txt.go:284 +0xae5
sigs.k8s.io/external-dns/controller.(*Controller).RunOnce(0xc000de58c0, {0x3c54368, 0xc00013c230})
/src/3p/external-dns/controller/controller.go:231 +0x61d
sigs.k8s.io/external-dns/controller.(*Controller).Run(0xc000de58c0?, {0x3c54368, 0xc00013c230})
/src/3p/external-dns/controller/controller.go:313 +0xc5
main.main()
/src/3p/external-dns/main.go:452 +0x4765
As far as I could tell, this may have been persisting until I deleted the 5 records associated with that external name in the zone. Not every move of the name seemed to cause this state. I feel like I saw other moves where the external name simply wasn't updated, and other moves where only the A record was updated.
The deployment itself has the following settings:
- name: EXTERNAL_DNS_PROVIDER
value: aws
- name: EXTERNAL_DNS_DOMAIN_FILTER
value: |-
domain.com
anotherdomain.com
thirddomain.com
- name: EXTERNAL_DNS_AWS_ZONE_TYPE
value: public
- name: EXTERNAL_DNS_TXT_OWNER_ID
value: eks-cluster-1-external-dns
- name: EXTERNAL_DNS_TXT_PREFIX
value: _extdns.
- name: EXTERNAL_DNS_REGISTRY
value: txt
- name: EXTERNAL_DNS_SOURCE
value: |-
service
ingress
as well as a number of --zone-id-filters supplied as CLI arguments.
On the other hand, the initial deployment of this, when there were no existing matching records in the zone it was targeting and it created new records for a newly created Service went really well and seemed to work great.
I suspect this is the issue that #3724 is attempting to fix.
/hold for #3724
/hold cancel
/assign @mloiseleur
/test pull-external-dns-lint /test pull-external-dns-unit-test
/retest
I think this is ready to go in.
I have tested it with a build of this PR that I published here. I can confirm that:
- It should be documented in
doc/tutorials/aws.md, since it works following this tutorial onServicewith annotations - When using
--managed-record-types="A,CNAME", it did NOT update records at all (Same behavior between this PR and v0.13.5). So it does not seem to be a working opt-out. - When changing the hostname, for instance from
test2.example.comtotest3.example.com, it did not delete theAAAArecord:
time="2023-08-14T15:01:37Z" level=info msg="Applying provider record filter for domains: [example.com. .example.com.]"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.cname-test2.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.test2.example.com TXT "
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE test2.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.aaaa-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.cname-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com AAAA"
Blocked by #3910
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Any news on this PR?
Any status updates?
I'm sorry to bump this one more time. It's a super important feature to us. Here in Norway, all public services must, by law, be announced with IPv6 support.
Anything we can contribute with to get this feature merged faster?