
Multiple headless services pointing to pods on the same node break external-dns

Open · ItielOlenick opened this issue 1 year ago · 12 comments

What happened: Having multiple headless services that target pods scheduled on the same node breaks external-dns (used with Route53).

What you expected to happen: external-dns keeps working and the records are updated in Route53.

How to reproduce it (as minimally and precisely as possible): Deploy two copies of https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#kafka-stateful-set with different names but the same external-dns.alpha.kubernetes.io/hostname value (a minimal sketch follows below).
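For illustration, a minimal sketch of the two headless Services involved; the names kafka-a and kafka-b, the selector labels, and the port are placeholders (the matching StatefulSets from the linked tutorial are omitted). The important part is that both Services are headless (clusterIP: None) and carry the same external-dns.alpha.kubernetes.io/hostname value:

  # First headless Service (placeholder name kafka-a)
  apiVersion: v1
  kind: Service
  metadata:
    name: kafka-a
    annotations:
      external-dns.alpha.kubernetes.io/hostname: example.org
  spec:
    clusterIP: None          # headless: external-dns creates per-pod records
    selector:
      app: kafka-a
    ports:
      - name: broker
        port: 9092
  ---
  # Second headless Service (placeholder name kafka-b) with the same hostname annotation
  apiVersion: v1
  kind: Service
  metadata:
    name: kafka-b
    annotations:
      external-dns.alpha.kubernetes.io/hostname: example.org
  spec:
    clusterIP: None
    selector:
      app: kafka-b
    ports:
      - name: broker
        port: 9092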

Anything else we need to know?: external-dns has a bug (feature?) that causes headless services to create a root-domain record in addition to the per-pod records that need to be created for the service.

Going by the example in https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#headless-service (case 2), these records are expected to be created:

kafka-0.example.org
kafka-1.example.org
kafka-2.example.org

But example.org is created as well.

When another service with the same root domain is created, the example.org record is updated again with the node IPs. When the pods land on the same node, this causes the call to Route53 to look like:

{"action":"UPSERT","resourceRecordSet":{"name":"example.com","type":"A","tTL":300,"resourceRecords":[{"value":"10.5.25.120"},{"value":"10.5.25.253"},{"value":"10.5.31.11"},{"value":"10.5.31.11"}]}}

This call fails because of the duplicate {"value":"10.5.31.11"} entry, resulting in

"errorCode":"InvalidChangeBatch","errorMessage":"[Duplicate Resource Record: '10.5.31.11']"

causing external-dns to go into an endless crash loop.

Environment:

  • External-DNS version: 0.13.5
  • DNS provider: AWS Route53

ItielOlenick · Feb 25 '24 07:02

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · May 25 '24 08:05

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Jun 24 '24 08:06

/remove-lifecycle rotten

ItielOlenick · Jun 24 '24 08:06