External DNS keeps upserting - v0.13.6

Open jgournet opened this issue 2 years ago • 28 comments

What happened: When upgrading from v0.13.5 to v0.13.6, we're getting every run:

kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Applying provider record filter for domains: [XXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Desired change: CREATE XXX A [Id: /hostedzone/XXXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Desired change: CREATE k8s.XXXX TXT [Id: /hostedzone/XXXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:40Z" level=info msg="2 record(s) in zone XXXXX [Id: /hostedzone/XXXX] were successfully updated"

What you expected to happen: same behavior as v0.13.5:

kube-system external-dns-9957dffb8-9mkcl external-dns time="2023-10-08T23:59:28Z" level=info msg="All records are already up to date"

How to reproduce it (as minimally and precisely as possible): We're using istio GW with annotation:

kind: Gateway
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: XXXXX,www.XXXXXX
    external-dns.alpha.kubernetes.io/target: dr-eks-ingress-gw-XXXXXX.amazonaws.com.

Anything else we need to know?: Works fine in v0.13.5

Environment:

  • External-DNS version (use external-dns --version): v0.13.6
  • DNS provider: AWS
  • Others:

Similar issues: https://github.com/kubernetes-sigs/external-dns/issues/1421, https://github.com/kubernetes-sigs/external-dns/issues/1959

jgournet avatar Oct 09 '23 00:10 jgournet

This happens to me too when going from 0.12.2 to 0.13.6, but only on 2 records related to 1 app on my k8s cluster. I don't know if it is because this record is the root record: for example, the zone is example.com and the record is example.com, and also external-dns.example.com. It constantly keeps updating it to the same value.

fbarrerafalabella avatar Oct 25 '23 23:10 fbarrerafalabella

Weird addition: we now have a few clusters that run 0.13.6 without any issues at all ...

jgournet avatar Oct 25 '23 23:10 jgournet

Still reproducible for me with v0.14.0 – the same two Route53 records in each zone are updated repeatedly:

time="2023-11-08T20:47:30Z" level=info msg="Created Kubernetes client https://100.64.0.1:443"
time="2023-11-08T20:47:31Z" level=info msg="Applying provider record filter for domains: [yyy. .yyy. zzz. .zzz.]"
time="2023-11-08T20:47:31Z" level=info msg="Desired change: UPSERT yyy A [Id: /hostedzone/YYY]"
time="2023-11-08T20:47:31Z" level=info msg="Desired change: UPSERT external-dns.yyy TXT [Id: /hostedzone/YYY]"
time="2023-11-08T20:47:32Z" level=info msg="2 record(s) in zone yyy. [Id: /hostedzone/YYY] were successfully updated"
time="2023-11-08T20:47:33Z" level=info msg="Desired change: UPSERT zzz A [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:47:33Z" level=info msg="Desired change: UPSERT external-dns.zzz TXT [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:47:33Z" level=info msg="2 record(s) in zone zzz. [Id: /hostedzone/ZZZ] were successfully updated"

time="2023-11-08T20:48:31Z" level=info msg="Applying provider record filter for domains: [yyy. .yyy. zzz. .zzz.]"
time="2023-11-08T20:48:31Z" level=info msg="Desired change: UPSERT zzz A [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:48:31Z" level=info msg="Desired change: UPSERT external-dns.zzz TXT [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:48:31Z" level=info msg="2 record(s) in zone zzz. [Id: /hostedzone/ZZZ] were successfully updated"
time="2023-11-08T20:48:32Z" level=info msg="Desired change: UPSERT yyy A [Id: /hostedzone/YYY]"
time="2023-11-08T20:48:32Z" level=info msg="Desired change: UPSERT external-dns.yyy TXT [Id: /hostedzone/YYY]"
time="2023-11-08T20:48:33Z" level=info msg="2 record(s) in zone yyy. [Id: /hostedzone/YYY] were successfully updated"

gustav-b avatar Nov 08 '23 21:11 gustav-b

I get exactly the same thing, but only for the root record which concurs with @fbarrerafalabella. Other apps using sub-domains don't constantly UPSERT.

This happens with both 0.13.6 and 0.14.0

EDIT: Sanitised logs:

time="2023-11-13T19:29:21Z" level=debug msg="Refreshing zones list cache"
time="2023-11-13T19:29:22Z" level=debug msg="Considering zone: /hostedzone/ID (domain: foo.com.)"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service cert-manager-dns/cert-manager"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service external-dns/external-dns"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service ingress-nginx/ingress-nginx-controller"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service ingress-nginx/ingress-nginx-controller-admission"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service cert-manager-dns/cert-manager-webhook"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/apple-service"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/banana-service"
time="2023-11-13T19:29:22Z" level=debug msg="Endpoints generated from ingress: default/apple-ingress: [foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [] foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []]"
time="2023-11-13T19:29:22Z" level=debug msg="Endpoints generated from ingress: default/banana-ingress: [banana.foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [] foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []]"
time="2023-11-13T19:29:22Z" level=debug msg="Removing duplicate endpoint foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []"
time="2023-11-13T19:29:22Z" level=debug msg="Removing duplicate endpoint foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []"
time="2023-11-13T19:29:22Z" level=debug msg="Modifying endpoint: foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [], setting alias=true"
time="2023-11-13T19:29:22Z" level=debug msg="Modifying endpoint: banana.foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [], setting alias=true"
time="2023-11-13T19:29:22Z" level=debug msg="Refreshing zones list cache"
time="2023-11-13T19:29:23Z" level=debug msg="Considering zone: /hostedzone/ID (domain: foo.com.)"
time="2023-11-13T19:29:23Z" level=debug msg="Adding foo.com. to zone foo.com. [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=debug msg="Adding foo.com. to zone foo.com. [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=debug msg="Skipping record {\n  Action: \"UPSERT\",\n  ResourceRecordSet: {\n    Name: \"cname-foo.com\",\n    ResourceRecords: [{\n        Value: \"\\\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/apple-ingress\\\"\"\n      }],\n    TTL: 300,\n    Type: \"TXT\"\n  }\n} because no hosted zone matching record DNS Name was detected"
time="2023-11-13T19:29:23Z" level=info msg="Desired change: UPSERT foo.com A [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=info msg="Desired change: UPSERT foo.com TXT [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=info msg="2 record(s) in zone foo.com. [Id: /hostedzone/ID] were successfully updated"

EDIT2: I have a suspicion it's because it can't set the "root" cname record it tries to set as it doesn't control that domain:

time="2023-11-13T19:29:23Z" level=debug msg="Skipping record {\n  Action: \"UPSERT\",\n  ResourceRecordSet: {\n    Name: \"cname-foo.com\",\n    ResourceRecords: [{\n        Value: \"\\\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/apple-ingress\\\"\"\n      }],\n    TTL: 300,\n    Type: \"TXT\"\n  }\n} because no hosted zone matching record DNS Name was detected"
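A rough sketch of why that name would be skipped, assuming zone matching boils down to a suffix check on DNS labels. The helper `in_zone` is hypothetical and is not external-dns code; `cname-banana.foo.com` is an assumed analogue for the subdomain, since it does not appear in the logs above.

```python
def in_zone(record_name: str, zone: str) -> bool:
    """Hypothetical helper: True if record_name is the zone apex or a name under it."""
    record_name = record_name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return record_name == zone or record_name.endswith("." + zone)


# "cname-foo.com" is the TXT name from the "Skipping record" log line.
for name in ("foo.com", "banana.foo.com", "cname-foo.com", "cname-banana.foo.com"):
    print(f"{name}: in zone foo.com. -> {in_zone(name, 'foo.com.')}")
```

Under that assumption, `cname-foo.com` is a sibling of `foo.com` rather than a name inside it, while the same naming applied to a subdomain stays inside the zone. That would fit the reports that only root records are affected.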

ElvenSpellmaker avatar Nov 13 '23 19:11 ElvenSpellmaker

Same here with 0.14.0 and DigitalOcean: level=warning msg="Updating existing target" on every single run, once per minute. I had some settings where it said "records already up to date", so I would need to debug further.

My current settings:

  - --source=service
  - --domain-filter=my.domain.com
  - --log-level=info
  - --provider=digitalocean
  - --policy=sync
  - --registry=txt
  - --txt-owner-id=k8s-owner

Jayd603 avatar Nov 24 '23 17:11 Jayd603

I had the same issue after upgrading to v0.13.6 with a single ingress that has a host in a root domain, i.e.

spec:
  rules:
    - host: myrootdomain.tld
    ...

    - host: '*.myrootdomain.tld'
    ...

Fixed by removing a rule for the root domain from ingress

yuriipolishchuk avatar Dec 15 '23 09:12 yuriipolishchuk

I had the same issue after upgrading to v0.13.6 with a single ingress that has a host in a root domain, i.e.

spec:
  rules:
    - host: myrootdomain.tld
    ...

    - host: '*.myrootdomain.tld'
    ...

Fixed by removing a rule for the root domain from ingress

In my case it still does it even with a single basic service entry. Updates every single time even when not necessary. I have a sub domain like cluster1.do.domain.com defined in a single place and that's it.

Jayd603 avatar Dec 20 '23 16:12 Jayd603

We see the same issue with 0.14.0 on Route53 with an A record of type Alias with the same name as the domain root.

time="2024-03-05T17:53:18Z" level=info msg="Applying provider record filter for domains: [the.domain.tld.]"
time="2024-03-05T17:53:19Z" level=info msg="Desired change: UPSERT _externaldns.the.domain.tld TXT [Id: /hostedzone/ZXXX]"
time="2024-03-05T17:53:19Z" level=info msg="Desired change: UPSERT the.domain.tld A [Id: /hostedzone/ZXXX]"
time="2024-03-05T17:53:19Z" level=info msg="2 record(s) in zone the.domain.tld. [Id: /hostedzone/ZXXX] were successfully updated"

I manually removed the TXT record and there's no attempt to update (since the ownership is removed). Not a solution, but a workaround to stop the upserts.

time="2024-03-05T17:54:20Z" level=info msg="Applying provider record filter for domains: [the.domain.tld.]"
time="2024-03-05T17:54:20Z" level=info msg="All records are already up to date"

However, the other thing we notice is that the "new" TXT record is not being created, only the old one: we have the record _externaldns.the.domain.tld but do NOT have _externaldns.cname-the.domain.tld, which we would expect due to https://github.com/kubernetes-sigs/external-dns/blob/d2890b0a71c5c991c8c9e56f4108c17b8914cf64/registry/txt.go#L229-L232

hobti01 avatar Mar 05 '24 21:03 hobti01

is there a solution or an explanation for this? it keeps happening on newer releases

fbarrerafalabella avatar Mar 06 '24 23:03 fbarrerafalabella

Same error on my side too, only for a root record, subdomains records are working fine.

clesquere avatar Mar 07 '24 21:03 clesquere

@linki or @stevehipwell: would someone be able to check what is happening with this issue, please? We had to force version 0.13.5, as anything above it keeps on upserting records (which generates alerts, as a matter of fact). Thanks for your help.

jgournet avatar Apr 03 '24 21:04 jgournet

@jgournet sorry but I'm not in a position to help with this, I'm the Helm chart maintainer but I'm only superficially familiar with the actual code here.

stevehipwell avatar Apr 04 '24 08:04 stevehipwell

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.
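To illustrate why the template might help for a root record, here is a simplified model of how the ownership TXT name could be built. It is a sketch, not the external-dns implementation: `ownership_txt_name` is a hypothetical function, the default naming is inferred from the `cname-foo.com` name seen in the logs earlier in this thread, and the trailing dot in the flag value is assumed to be part of the prefix.

```python
PLACEHOLDER = "%{record_type}"


def ownership_txt_name(dns_name: str, record_type: str, prefix: str = "") -> str:
    """Hypothetical, simplified model of the ownership TXT record name."""
    rtype = record_type.lower()
    if PLACEHOLDER in prefix:
        # The record type goes wherever the template puts it.
        return prefix.replace(PLACEHOLDER, rtype) + dns_name
    # Otherwise assume it is glued onto the first label of the name.
    return f"{prefix}{rtype}-{dns_name}"


apex = "foo.com"
default_name = ownership_txt_name(apex, "CNAME")
templated_name = ownership_txt_name(apex, "CNAME", "%{record_type}_external-dns.")

# Second value printed: whether the TXT name sits under the apex.
print(default_name, default_name.endswith("." + apex))
print(templated_name, templated_name.endswith("." + apex))
```

In this model the default name for the apex lands outside the zone, whereas the templated prefix produces a name under `foo.com`.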

sydorovdmytro avatar May 06 '24 08:05 sydorovdmytro

Same issue with version 0.14.1 in AWS with subdomains. Installed with helm chart external-dns-7.3.0, in AWS, with currently internet-facing NLBs; but I've seen it in a private hosted zone with internal NLBs too.

Strange thing was that it didn't do it for a while, and then just suddenly started continual UPSERTs for already existing records.

life5ign avatar May 24 '24 23:05 life5ign

@life5ign I don't think external-dns-7.3.0 is the official Helm chart.

stevehipwell avatar May 30 '24 10:05 stevehipwell

@stevehipwell you're right; I'm using https://artifacthub.io/packages/helm/bitnami/external-dns

I'll try the official https://artifacthub.io/packages/helm/external-dns/external-dns

life5ign avatar May 30 '24 15:05 life5ign

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

This workaround solved the issue for me too. Now on v0.14.2 without any continuous upserts.

gustav-b avatar May 30 '24 18:05 gustav-b

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

@sydorovdmytro thanks, where did you get this idea? EDIT nevermind:

docker run -it --rm bitnami/external-dns:latest --help | grep txt-prefix

only place I could find the documentation on the available CLI flags

life5ign avatar May 31 '24 00:05 life5ign

These workarounds are just that. Workarounds. It's still an issue tbh.

ElvenSpellmaker avatar May 31 '24 00:05 ElvenSpellmaker

We had the same issue with v0.13.6 and AWS Route 53. Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

This workaround solved the issue for me too. Now on v0.14.2 without any continuous upserts.

This worked for me once I sorted out some ingress provider ingressClassName issues in my cluster.

At the same time, I also decided to set --aws-prefer-cname to switch the record type away from the proprietary A Alias type.

life5ign avatar May 31 '24 16:05 life5ign

I am also struggling with this issue. I have tried the workarounds of setting --txt-prefix=%{record_type}-record-, and I have also tried setting the --txt-cache-interval=1h but then it just upserts after the interval. This is on v0.14.2

matthijswolters-rl avatar Aug 30 '24 09:08 matthijswolters-rl