external-dns
Multiple headless services pointing to pods on the same node breaks external-dns
What happened: Having multiple headless services that target pods scheduled on the same node breaks external-dns (used with Route53).
What you expected to happen: external-dns keeps working and the records are updated in Route53.
How to reproduce it (as minimally and precisely as possible):
Deploy two copies of https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#kafka-stateful-set with different names but with the same external-dns.alpha.kubernetes.io/hostname value.
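For illustration, a minimal sketch of the two headless Services, assuming hypothetical names kafka-a and kafka-b and an illustrative port; the StatefulSets behind them are deployed as in the linked tutorial:

apiVersion: v1
kind: Service
metadata:
  name: kafka-a            # hypothetical name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: example.org
spec:
  clusterIP: None          # headless service
  selector:
    app: kafka-a
  ports:
    - port: 9092           # illustrative port
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-b            # hypothetical name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: example.org   # same hostname value as kafka-a
spec:
  clusterIP: None          # headless service
  selector:
    app: kafka-b
  ports:
    - port: 9092           # illustrative port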
Anything else we need to know?: external-dns has a bug (feature?) that causes headless services to create a record for the root domain in addition to the per-pod records that need to be created for the service.
Following the example in https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/hostport.md#headless-service (case 2), these records are expected to be created:
kafka-0.example.org
kafka-1.example.org
kafka-2.example.org
But example.org is created as well.
When another service with the same root domain is created, the example.org record is updated again, this time including that service's node IPs.
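To illustrate with the node IPs from the log below (which pod runs on which node is an assumption), the first service alone already produces something like:

kafka-0.example.org   A   10.5.25.120
kafka-1.example.org   A   10.5.25.253
kafka-2.example.org   A   10.5.31.11
example.org           A   10.5.25.120, 10.5.25.253, 10.5.31.11   (the unexpected root record)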
When pods from both services are on the same node, this causes the call to Route53 to look like:
{"action":"UPSERT","resourceRecordSet":{"name":"example.org","type":"A","TTL":300,"resourceRecords":[{"value":"10.5.25.120"},{"value":"10.5.25.253"},{"value":"10.5.31.11"},{"value":"10.5.31.11"}]}}
This fails because of the duplicate {"value":"10.5.31.11"} entry, resulting in
"errorCode":"InvalidChangeBatch","errorMessage":"[Duplicate Resource Record: '10.5.31.11']"
and external-dns goes into an endless crash loop.
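For comparison, an upsert with the values deduplicated, so that 10.5.31.11 appears only once, would not trigger the InvalidChangeBatch error (illustrative only, mirroring the log line above):

{"action":"UPSERT","resourceRecordSet":{"name":"example.org","type":"A","TTL":300,"resourceRecords":[{"value":"10.5.25.120"},{"value":"10.5.25.253"},{"value":"10.5.31.11"}]}}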
Environment:
- External-DNS version: 0.13.5
- DNS provider: AWS Route53
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten