
Multiple headless services pointing to pods on the same node break external-dns

Open · ItielOlenick opened this issue 1 year ago · 12 comments

What happened: Having multiple headless services that target pods scheduled on the same node breaks external-dns (used with Route53).

What you expected to happen: external-dns keeps working and the records are updated in Route53.

How to reproduce it (as minimally and precisely as possible): Deploy two copies of https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#kafka-stateful-set with different names but the same external-dns.alpha.kubernetes.io/hostname value (a minimal sketch follows below).
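For illustration, a minimal sketch of the two headless Services involved; the names kafka-a and kafka-b, the selector labels, and the port are placeholders (the matching StatefulSets from the linked tutorial are omitted). The important part is that both Services are headless (clusterIP: None) and carry the same external-dns.alpha.kubernetes.io/hostname value:

  # First headless Service (placeholder name kafka-a)
  apiVersion: v1
  kind: Service
  metadata:
    name: kafka-a
    annotations:
      external-dns.alpha.kubernetes.io/hostname: example.org
  spec:
    clusterIP: None          # headless: external-dns creates per-pod records
    selector:
      app: kafka-a
    ports:
      - name: broker
        port: 9092
  ---
  # Second headless Service (placeholder name kafka-b) with the same hostname annotation
  apiVersion: v1
  kind: Service
  metadata:
    name: kafka-b
    annotations:
      external-dns.alpha.kubernetes.io/hostname: example.org
  spec:
    clusterIP: None
    selector:
      app: kafka-b
    ports:
      - name: broker
        port: 9092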

Anything else we need to know?: external-dns has a bug (feature?) that causes headless services to create a root-domain record in addition to the per-pod records that need to be created for the service.

Going by the example in https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#headless-service (case 2), these records are expected to be created:

kafka-0.example.org
kafka-1.example.org
kafka-2.example.org

But example.org is created as well.

When another service with the same root domain is created, the example.org record is updated again with the node IPs. When the pods land on the same node, this causes the call to Route53 to look like:

{"action":"UPSERT","resourceRecordSet":{"name":"example.com","type":"A","tTL":300,"resourceRecords":[{"value":"10.5.25.120"},{"value":"10.5.25.253"},{"value":"10.5.31.11"},{"value":"10.5.31.11"}]}}

This call fails because of the duplicate {"value":"10.5.31.11"} entry, resulting in

"errorCode":"InvalidChangeBatch","errorMessage":"[Duplicate Resource Record: '10.5.31.11']"

causing external-dns to go into an endless crash loop.

Environment:

  • External-DNS version: 0.13.5
  • DNS provider: AWS Route53

ItielOlenick · Feb 25 '24 07:02

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · May 25 '24 08:05

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Jun 24 '24 08:06

/remove-lifecycle rotten

ItielOlenick · Jun 24 '24 08:06