external-dns icon indicating copy to clipboard operation
external-dns copied to clipboard

WIP Always create AAAA alias records in route53

Open johngmyers opened this issue 2 years ago • 37 comments

Description

Always creates both A and AAAA alias records in Route53.

When a Route53 AAAA alias record points to an IPv4-only resource, AAAA DNS queries return zero records. This is the same behavior as when there is no AAAA alias record but is an A alias record. For this reason, there is no rational reason to not also create the AAAA alias record.

Adds logic for reconciling discrepancies between the A and AAAA alias records, including when only one of the two is present. In some situations, this reconciliation will take two iterations of the reconciliation algorithm to converge, as the Route53 provider does not store all information about the AAAA record with the discrepancy in the Endpoint. Encoding this information in the provider-specific annotations would result in a lot of complicated code for a rare edge case. An alternative would be to add an opaque provider-specific interface{} to Endpoint so that the Route53 provider could store the ResourceRecordSet directly.

This PR is an alternative to #2050.

Fixes #3707

Checklist

  • [x] Unit tests updated
  • [x] End user documentation updated

johngmyers avatar May 14 '23 02:05 johngmyers

@johngmyers this will 1) change default to use dualstack and therefore 2) roughly double requests to AWS route53 with its "nice" API rate limits (see also https://github.com/kubernetes-sigs/external-dns/pull/3402). I would rather like to opt-in this feature at least via a new flag.

szuecs avatar May 15 '23 13:05 szuecs

One should not have to opt-in to expected, correct behavior. #3402 should address rate limiting, especially if it defaults to enabled.

At worst, this should be opt-out, though that will further increase complexity and the number of needed test cases.

johngmyers avatar May 16 '23 00:05 johngmyers

Also, why does a mere factor of 1.5 deserve such complexity?

johngmyers avatar May 16 '23 01:05 johngmyers

@johngmyers in general I agree with you! I just wanted to raise awareness. Maybe you can also review the cache PR, I see that you did, but if you think it's ready please add your /lgtm, too.

szuecs avatar May 16 '23 12:05 szuecs

/ok-to-test

szuecs avatar May 16 '23 12:05 szuecs

I'm going to experiment with implementing this using AdjustEndpoints. I'm concerned with the complexity of reconciling discrepancies between A and AAAA in the provider.

johngmyers avatar May 16 '23 14:05 johngmyers

The AdjustEndpoints approach is much simpler, though with the disadvantage that the TXT registry creates ownership records for the additional AAAA alias records.

I don't think registries should be creating ownership records for endpoints that were added by AdjustEndpoints, though that's arguably an issue for a separate PR. I haven't yet figured out how to avoid creating such records without giving the txt registry knowledge of route53.

I'm also thinking of writing a dynamodb registry.

johngmyers avatar May 16 '23 18:05 johngmyers

To avoid creating ownership records for the AAAA alias records, we could:

  1. Define a label and have Provider.AdjustEndpoints() add said label to each additional Endpoint it creates.
  2. The presence of this label causes TXTRegistry.ApplyChanges() to not create any ownership records for such Endpoints.
  3. Define a new interface method on Provider or a new interface implemented by AWSProvider. Registry.Records() implementations call this method after adding all their Labels to the endpoints. The AWSProvider implementation of this method copies the Labels from the CNAME alias Endpoints to the corresponding AAAA alias Endpoints.

This wouldn't handle the case where the A record is deleted but the AAAA and TXT were not. To handle that case, the new interface method would have to take and endpointName, type, and possibly labels and return a set of additional endpointNames and types that the labels would apply to.

johngmyers avatar May 16 '23 23:05 johngmyers

/ok-to-test

szuecs avatar May 17 '23 13:05 szuecs

@szuecs what is the next step to get this merged? I'm working on a DynamoDB registry in #3648, but I doubt that should be a prerequisite.

Would scheduling a meeting be helpful?

johngmyers avatar Jun 03 '23 17:06 johngmyers

Poke @szuecs, the AWS community really needs this to be merged. I'm manually hand-bombing ipv6 records because this is missing. Please help. https://github.com/kubernetes-sigs/external-dns/pull/3605#issuecomment-1575077589

cdobbyn avatar Jun 15 '23 18:06 cdobbyn

@johngmyers I agree with you that default behavior should work out of the box and I also agree with @szuecs that this behavior may cause unexpected issues. => I suggest to add an opt-out flag, something like --aws-ipv4-only or --aws-only-A-records

mloiseleur avatar Jun 28 '23 07:06 mloiseleur

@cdobbyn I wonder why I don't see this issue in my large fleet of clusters (running kube-ingress-aws-controller + skipper). What setup creates the problem?

@johngmyers opt-out flag seems to be fine and then let' merge it.

szuecs avatar Jun 28 '23 10:06 szuecs

@mloiseleur @szuecs don't we already have an opt-out flag, specifically --managed-record-types=A,CNAME?

johngmyers avatar Jun 28 '23 14:06 johngmyers

@johngmyers then I would like that you check if the documentation is good enough and if so merge /approve

szuecs avatar Jun 28 '23 16:06 szuecs

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, szuecs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jun 28 '23 16:06 k8s-ci-robot

@johngmyers I've deployed 4dcce15b7f to a cluster on AWS and I'm seeing the following behavior:

When I add a new hostname to a service's external-dns.alpha.kubernetes.io/hostname annotation, it creates the following records:

_extdns.aaaa-hostname TXT _extdns.cname-hostname TXT _extdns.hostname TXT hostname A hostname AAAA

When I subsequently remove the same hostname from the service's annotation, it only deletes:

_extdns.cname-hostname TXT _extdns.hostname TXT hostname A

When I added the hostname back to the service, it created those same three records and left the AAAA-related records alone.

As I was testing the behavior of moving the hostname back and forth between two different services, I got the following panic during the reconciliation loop:

time="2023-07-06T22:28:27Z" level=info msg="Applying provider record filter for domains: [...]"
panic: runtime error: index out of range [3] with length 3

goroutine 1 [running]:
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).createUpdateChanges(0xc000df89c0?, {0xc000e964c0, 0x5, 0xc00007a400?}, {0xc00075edb0, 0x3, 0xc001574ca0?})
	/src/3p/external-dns/provider/aws/aws.go:469 +0x5c5
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).ApplyChanges(0xc000e93200?, {0x3c54410, 0xc000d06390}, 0xc000e8fd40)
	/src/3p/external-dns/provider/aws/aws.go:508 +0x86
sigs.k8s.io/external-dns/registry.(*TXTRegistry).ApplyChanges(0xc000609c30, {0x3c54410, 0xc000d06390}, 0xc000e8fce0)
	/src/3p/external-dns/registry/txt.go:284 +0xae5
sigs.k8s.io/external-dns/controller.(*Controller).RunOnce(0xc000de58c0, {0x3c54368, 0xc00013c230})
	/src/3p/external-dns/controller/controller.go:231 +0x61d
sigs.k8s.io/external-dns/controller.(*Controller).Run(0xc000de58c0?, {0x3c54368, 0xc00013c230})
	/src/3p/external-dns/controller/controller.go:313 +0xc5
main.main()
	/src/3p/external-dns/main.go:452 +0x4765

As far as I could tell, this may have been persisting until I deleted the 5 records associated with that external name in the zone. Not every move of the name seemed to cause this state. I feel like I saw other moves where the external name simply wasn't updated, and other moves where only the A record was updated.

The deployment itself has the following settings:

        - name: EXTERNAL_DNS_PROVIDER
          value: aws
        - name: EXTERNAL_DNS_DOMAIN_FILTER
          value: |-
            domain.com
            anotherdomain.com
            thirddomain.com
        - name: EXTERNAL_DNS_AWS_ZONE_TYPE
          value: public
        - name: EXTERNAL_DNS_TXT_OWNER_ID
          value: eks-cluster-1-external-dns
        - name: EXTERNAL_DNS_TXT_PREFIX
          value: _extdns.
        - name: EXTERNAL_DNS_REGISTRY
          value: txt
        - name: EXTERNAL_DNS_SOURCE
          value: |-
            service
            ingress

as well as a number of --zone-id-filters supplied as CLI arguments.

On the other hand, the initial deployment of this, when there were no existing matching records in the zone it was targeting and it created new records for a newly created Service went really well and seemed to work great.

reltuk avatar Jul 06 '23 22:07 reltuk

I suspect this is the issue that #3724 is attempting to fix.

johngmyers avatar Jul 07 '23 02:07 johngmyers

/hold for #3724

johngmyers avatar Jul 11 '23 18:07 johngmyers

/hold cancel

johngmyers avatar Aug 10 '23 22:08 johngmyers

/assign @mloiseleur

johngmyers avatar Aug 10 '23 22:08 johngmyers

/test pull-external-dns-lint /test pull-external-dns-unit-test

johngmyers avatar Aug 10 '23 22:08 johngmyers

/retest

johngmyers avatar Aug 10 '23 23:08 johngmyers

I think this is ready to go in.

johngmyers avatar Aug 11 '23 21:08 johngmyers

I have tested it with a build of this PR that I published here. I can confirm that:

  1. It should be documented in doc/tutorials/aws.md, since it works following this tutorial on Service with annotations
  2. When using --managed-record-types="A,CNAME", it did NOT update records at all (Same behavior between this PR and v0.13.5). So it does not seem to be a working opt-out.
  3. When changing the hostname, for instance from test2.example.com to test3.example.com, it did not delete the AAAA record:
time="2023-08-14T15:01:37Z" level=info msg="Applying provider record filter for domains: [example.com. .example.com.]"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.cname-test2.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.test2.example.com TXT "
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE test2.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.aaaa-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.cname-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com AAAA"

mloiseleur avatar Aug 14 '23 15:08 mloiseleur

Blocked by #3910

johngmyers avatar Sep 15 '23 05:09 johngmyers

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 15 '23 18:09 k8s-ci-robot

Any news on this PR?

irony avatar Oct 29 '23 12:10 irony

Any status updates?

paalkr avatar Nov 13 '23 16:11 paalkr

I'm sorry to bump this one more time. It's a super important feature to us. Here in Norway, all public services must, by law, be announced with IPv6 support.

Anything we can contribute with to get this feature merged faster?

paalkr avatar Dec 07 '23 08:12 paalkr