AWS Route53 and shared service IP address: InvalidChangeBatch: Duplicate Resource Record

Open stephan2012 opened this issue 4 years ago • 39 comments

What happened:

We are using MetalLB’s capability to share the same external IP address across two or more Kubernetes services (through the metallb.universe.tf/allow-shared-ip annotation) and assign the same hostname through the external-dns.alpha.kubernetes.io/hostname annotation. Unfortunately, this results in errors from the AWS Route53 API:

time="2021-01-22T07:35:34Z" level=info msg="Desired change: CREATE loginputs.app.aws.company.com TXT [Id: /hostedzone/Z07039751Q2TM9MKHPKU5]"
time="2021-01-22T07:35:34Z" level=info msg="Desired change: CREATE mq.app.aws.company.com A [Id: /hostedzone/Z07039751Q2TM9MKHPKU5]"
time="2021-01-22T07:35:34Z" level=info msg="Desired change: CREATE mq.app.aws.company.com TXT [Id: /hostedzone/Z07039751Q2TM9MKHPKU5]"
time="2021-01-22T07:35:34Z" level=error msg="Failure in zone app.aws.company.com. [Id: /hostedzone/Z07039751Q2TM9MKHPKU5]"
time="2021-01-22T07:35:34Z" level=error msg="InvalidChangeBatch: [Duplicate Resource Record: '10.160.0.188']\n\tstatus code: 400, request id: 7205efc0-0a63-41e8-a9a4-66a739ac77b8"
time="2021-01-22T07:35:34Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/Z07039751Q2TM9MKHPKU5]"

What you expected to happen:

external-dns should deduplicate the list of changes that it submits in a single batch.
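
For illustration, a minimal Go sketch of the kind of batch-level deduplication meant here. The change type and every name in it are assumptions made for this example, not external-dns's actual types; the real plan/provider code is more involved.

package main

import "fmt"

// change is a simplified stand-in for one Route53 change entry; the real
// external-dns code works with the AWS SDK's ResourceRecordSet types instead.
type change struct {
    Action string // e.g. "CREATE" or "UPSERT"
    Name   string // record name, e.g. "loginputs.app.aws.company.com"
    Type   string // record type, e.g. "A" or "TXT"
    Value  string // record value, e.g. "10.160.0.188"
}

// dedupeChanges drops exact duplicates from a batch, so two services that
// share one IP and hostname yield a single CREATE instead of two.
func dedupeChanges(changes []change) []change {
    seen := make(map[change]struct{}, len(changes))
    out := make([]change, 0, len(changes))
    for _, c := range changes {
        if _, ok := seen[c]; ok {
            continue // exact duplicate: skip it instead of sending it to Route53
        }
        seen[c] = struct{}{}
        out = append(out, c)
    }
    return out
}

func main() {
    batch := []change{
        {"CREATE", "loginputs.app.aws.company.com", "A", "10.160.0.188"}, // from graylog-tcp
        {"CREATE", "loginputs.app.aws.company.com", "A", "10.160.0.188"}, // from graylog-udp
    }
    fmt.Println(dedupeChanges(batch)) // only one change survives
}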

How to reproduce it (as minimally and precisely as possible):

Ensure a valid Route 53 configuration, then create two services with the same loadBalancerIP and identical external-dns.alpha.kubernetes.io/hostname annotations. When testing with MetalLB, the metallb.universe.tf/allow-shared-ip annotation needs to be set on both services, as shown in the docs.

Anything else we need to know?:

More details:

NAMESPACE              NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                                                                                                                       AGE
log                    graylog-tcp                          LoadBalancer   100.66.166.38    10.160.0.188   5044:32071/TCP,12201:30325/TCP,12203:30399/TCP                                                                                10h
log                    graylog-udp                          LoadBalancer   100.66.142.96    10.160.0.188   5410:32120/UDP                                                                                                                10h

Service graylog-udp:

apiVersion: v1
kind: Service
metadata:
  name: graylog-udp
  namespace: log
  annotations:
    external-dns.alpha.kubernetes.io/hostname: loginputs.app.aws.company.com
    metallb.universe.tf/allow-shared-ip: graylog-inputs
[…]
spec:
[…]
  type: LoadBalancer
  loadBalancerIP: 10.160.0.188

Service graylog-tcp:

apiVersion: v1
kind: Service
metadata:
  name: graylog-tcp
  namespace: log
  annotations:
    external-dns.alpha.kubernetes.io/hostname: loginputs.app.aws.company.com
    metallb.universe.tf/allow-shared-ip: graylog-inputs
[…]
spec:
[…]
  type: LoadBalancer
  loadBalancerIP: 10.160.0.188

This issue has been mentioned in #1015 before, but that issue is closed and lacks details.

(By the way, just in case anybody wonders why there is MetalLB on an AWS instance: this is an easy way to manage multiple IP addresses on a single-node cluster.)

Environment:

  • External-DNS version (use external-dns --version): v0.7.5
  • DNS provider: AWS Route53
  • Others:

stephan2012 avatar Jan 22 '21 11:01 stephan2012

We are also seeing the same issue with v0.7.6, with multiple services running on the same EKS cluster. The issue does not occur with v0.7.4.

FlowColwyn avatar Apr 15 '21 11:04 FlowColwyn

We are experiencing the issue today, and I can confirm that switching to v0.7.4 seems to have resolved it. We are not using MetalLB, though, just the standard external-dns pod. The bug showed up in the latest tag, which is what prompted our debugging.

robwithhair avatar Apr 29 '21 08:04 robwithhair

We encountered this issue today using v0.7.6. We tested v0.8.0 and it still has the issue. Rolling back to v0.7.4 fixed it.

miguelgmalpha avatar May 14 '21 10:05 miguelgmalpha

Records are also not being deduplicated with the rfc2136 provider: external-dns constantly removes and re-adds a resource record because the same hostname is assigned to two different services sharing the same IP address. A sketch of the endpoint merging that could avoid this churn follows the manifests below.

Logs:

time="2021-05-14T10:19:03Z" level=info msg="Removing RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:19:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:19:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:19:03Z" level=info msg="Removing RR: loginput.lab.company.com 0 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:19:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:20:03Z" level=info msg="Removing RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:20:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:20:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:20:03Z" level=info msg="Removing RR: loginput.lab.company.com 0 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:20:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:21:03Z" level=info msg="Removing RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:21:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:21:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:21:03Z" level=info msg="Removing RR: loginput.lab.company.com 0 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:21:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:22:03Z" level=info msg="Removing RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:22:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:22:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:22:03Z" level=info msg="Removing RR: loginput.lab.company.com 0 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:22:03Z" level=info msg="Adding RR: loginput.lab.company.com 60 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:23:04Z" level=info msg="Removing RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:23:04Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:23:04Z" level=info msg="Adding RR: loginput.lab.company.com 60 A 192.168.100.118"
time="2021-05-14T10:23:04Z" level=info msg="Removing RR: loginput.lab.company.com 0 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""
time="2021-05-14T10:23:04Z" level=info msg="Adding RR: loginput.lab.company.com 60 TXT \"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/log/graylog-tcp\""

Service 1:

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: loginput.lab.company.com
    meta.helm.sh/release-name: graylog
    meta.helm.sh/release-namespace: log
    metallb.universe.tf/allow-shared-ip: graylog-inputs
[…]

Service 2:

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: loginput.lab.company.com
    meta.helm.sh/release-name: graylog
    meta.helm.sh/release-namespace: log
    metallb.universe.tf/allow-shared-ip: graylog-inputs
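
For reference, a minimal Go sketch of merging endpoints that share a hostname, which is the kind of fix hinted at above. The endpoint struct and function names are illustrative assumptions, not external-dns's actual code:

package main

import (
    "fmt"
    "sort"
)

// endpoint is a simplified stand-in for external-dns's endpoint model.
type endpoint struct {
    DNSName    string
    RecordType string
    Targets    []string
}

// mergeEndpoints collapses endpoints with the same (name, type) pair into a
// single endpoint holding the deduplicated union of their targets, so two
// services sharing an IP no longer produce the remove/add churn seen above.
func mergeEndpoints(eps []endpoint) []endpoint {
    type key struct{ name, rtype string }
    sets := map[key]map[string]struct{}{}
    for _, ep := range eps {
        k := key{ep.DNSName, ep.RecordType}
        if sets[k] == nil {
            sets[k] = map[string]struct{}{}
        }
        for _, t := range ep.Targets {
            sets[k][t] = struct{}{}
        }
    }
    out := make([]endpoint, 0, len(sets))
    for k, targets := range sets {
        ep := endpoint{DNSName: k.name, RecordType: k.rtype}
        for t := range targets {
            ep.Targets = append(ep.Targets, t)
        }
        sort.Strings(ep.Targets) // stable order keeps sync cycles idempotent
        out = append(out, ep)
    }
    return out
}

func main() {
    eps := []endpoint{
        {"loginput.lab.company.com", "A", []string{"192.168.100.118"}}, // graylog-tcp
        {"loginput.lab.company.com", "A", []string{"192.168.100.118"}}, // graylog-udp
    }
    fmt.Println(mergeEndpoints(eps)) // one endpoint, one target
}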

stephan2012 avatar May 14 '21 10:05 stephan2012

We have a similar scenario where two different DNS entries point to the same IPs:

time="2021-05-18T12:55:01Z" level=info msg="Desired change: UPSERT external-brokers.tenant.eu.stg.data.company.com A [Id: /hostedzone/XXXXXXXXXXXXXXXXX]"
time="2021-05-18T12:55:01Z" level=info msg="Desired change: UPSERT external-brokers.tenant.eu.stg.data.company.com TXT [Id: /hostedzone/XXXXXXXXXXXXXXXXX]"
time="2021-05-18T12:55:01Z" level=info msg="Desired change: UPSERT external-brokers.stg.data.company.com A [Id: /hostedzone/XXXXXXXXXXXXXXXXX]"
time="2021-05-18T12:55:01Z" level=info msg="Desired change: UPSERT external-brokers.stg.data.company.com TXT [Id: /hostedzone/XXXXXXXXXXXXXXXXX]"
time="2021-05-18T12:55:01Z" level=error msg="Failure in zone stg.data.company.com. [Id: /hostedzone/XXXXXXXXXXXXXXXXX]"
time="2021-05-18T12:55:01Z" level=error msg="InvalidChangeBatch: [Duplicate Resource Record: '10.221.8.236', Duplicate Resource Record: '10.221.8.236', Duplicate Resource Record: '10.221.3.179', Duplicate Resource Record: '10.221.7.87']\n\tstatus code: 400, request id: 925592cf-4805-4f16-b288-9269848b8649"

Service 1:

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: external-brokers.stg.data.company.com
spec:
  type: NodePort
...

Service 2:

apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: external-brokers.tenant.eu.stg.data.company.com
spec:
  type: NodePort
...

With external-dns:0.7.4 this works fine.
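
The duplicates here appear to sit inside a single record set (the same node IP resolved more than once), so the values list itself needs deduplicating. A minimal Go sketch under that assumption; uniqueTargets is an illustrative name, not external-dns's API:

package main

import "fmt"

// uniqueTargets removes exact duplicates from a record set's value list,
// preserving first-seen order so the record stays stable across syncs.
func uniqueTargets(targets []string) []string {
    seen := make(map[string]struct{}, len(targets))
    out := make([]string, 0, len(targets))
    for _, t := range targets {
        if _, ok := seen[t]; ok {
            continue
        }
        seen[t] = struct{}{}
        out = append(out, t)
    }
    return out
}

func main() {
    // The node IPs rejected by Route53 in the log above.
    fmt.Println(uniqueTargets([]string{
        "10.221.8.236", "10.221.8.236", "10.221.3.179", "10.221.7.87",
    })) // [10.221.8.236 10.221.3.179 10.221.7.87]
}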

miguelgmalpha avatar May 18 '21 13:05 miguelgmalpha

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 16 '21 13:08 k8s-triage-robot

/remove-lifecycle stale

stephan2012 avatar Aug 16 '21 13:08 stephan2012

Issue is not yet resolved.

stephan2012 avatar Aug 16 '21 13:08 stephan2012

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 14 '21 14:11 k8s-triage-robot

/remove-lifecycle stale

stephan2012 avatar Nov 14 '21 16:11 stephan2012

The issue does not disappear just because it is not addressed …

stephan2012 avatar Nov 14 '21 16:11 stephan2012

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 12 '22 17:02 k8s-triage-robot

Still an issue.

/remove-lifecycle stale

stephan2012 avatar Feb 14 '22 17:02 stephan2012

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 15 '22 17:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 14 '22 18:06 k8s-triage-robot

/remove-lifecycle rotten

flokli avatar Jun 23 '22 09:06 flokli

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 21 '22 10:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 21 '22 10:10 k8s-triage-robot

/remove-lifecycle rotten

flokli avatar Oct 21 '22 11:10 flokli

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 19 '23 12:01 k8s-triage-robot

/remove-lifecycle stale

flokli avatar Jan 20 '23 10:01 flokli

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 20 '23 10:04 k8s-triage-robot

/remove-lifecycle stale

flokli avatar Apr 20 '23 12:04 flokli

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 19 '23 13:07 k8s-triage-robot

/remove-lifecycle stale

flokli avatar Jul 19 '23 13:07 flokli

This issue is still occurring occasionally for us. Our workaround is to delete the duplicated service:

# Find the service carrying the duplicated external name:
kubectl get service -n your-namespace | grep duplicated-external-name-service-here

# Then delete it:
kubectl delete service -n your-namespace duplicated-external-name-service-here

sam-som avatar Aug 28 '23 23:08 sam-som

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 27 '24 05:01 k8s-triage-robot

/remove-lifecycle stale

flokli avatar Jan 27 '24 08:01 flokli

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 26 '24 09:04 k8s-triage-robot

/remove-lifecycle stale

❤️

flokli avatar Apr 26 '24 10:04 flokli