Fails to gracefully swap CNAME for A records.

Open kevincox opened this issue 3 years ago • 1 comments

What happened: I was trying to replace a CNAME record with a set of A records controlled by external-dns. I added the required ownership records and ran external-dns. However, at the end I was left with no A record and no CNAME record: a broken state!

What you expected to happen: The CNAME record is deleted and the A records are created in its place.

How to reproduce it (as minimally and precisely as possible):

  1. Set up some A records (like for an Ingress or for nodes) inside external-dns.
  2. Replace the records with a CNAME (to anything) outside of external-dns. Leave the ownership records in place.
  3. Run external-dns again.

Log output from that run:
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=error msg="failed to create record: HTTP status 400: A CNAME record with that host already exists. (81054)" action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=error msg="failed to create record: HTTP status 400: A CNAME record with that host already exists. (81054)" action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=DELETE record=helo.feedmail.org ttl=1 type=CNAME zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=UPDATE record=external-dns-aeb5uf6o.helo.feedmail.org ttl=1 type=TXT zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:55Z" level=info msg="Changing record." action=UPDATE record=external-dns-aeb5uf6o.a-helo.feedmail.org ttl=1 type=TXT zone=d3ae40d8d5a9540c3d93561fede8f86b

Anything else we need to know?:

This appears to occur because external-dns attempts to create the A record before removing the CNAME, and then goes on to remove the CNAME anyway. In general create-then-delete is a good order, but in this case the deletion needs to happen first.
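To make the ordering problem concrete, here is a minimal sketch in Go. It is not external-dns code: the `Change` and `Zone` types and the `Run` helper are hypothetical, and the toy zone only models the one rule visible in the log above (an A record cannot be created while a CNAME exists at the same name). It applies the same two changes in both orders and reports what is left in the zone.

```go
package main

import (
	"errors"
	"fmt"
)

// Change is a hypothetical, simplified stand-in for one planned DNS change.
type Change struct {
	Action string // "CREATE" or "DELETE"
	Name   string
	Type   string // "A", "CNAME", ...
}

// Zone is a toy in-memory zone: record name -> set of record types present.
type Zone map[string]map[string]bool

var errConflict = errors.New("a CNAME record with that host already exists")

// Apply mimics the rule seen in the log: a non-CNAME record cannot be
// created while a CNAME exists at the same name.
func (z Zone) Apply(c Change) error {
	if z[c.Name] == nil {
		z[c.Name] = map[string]bool{}
	}
	switch c.Action {
	case "CREATE":
		if c.Type != "CNAME" && z[c.Name]["CNAME"] {
			return errConflict
		}
		z[c.Name][c.Type] = true
	case "DELETE":
		delete(z[c.Name], c.Type)
	}
	return nil
}

// Run applies every change in order, carrying on after failures (the
// behaviour described in this issue), and returns the number of failures.
func Run(z Zone, changes []Change) int {
	failed := 0
	for _, c := range changes {
		if err := z.Apply(c); err != nil {
			failed++
		}
	}
	return failed
}

func main() {
	name := "helo.feedmail.org"
	create := Change{"CREATE", name, "A"}
	remove := Change{"DELETE", name, "CNAME"}

	// Create first: the create is rejected, then the CNAME is removed anyway.
	z1 := Zone{name: {"CNAME": true}}
	f1 := Run(z1, []Change{create, remove})
	fmt.Println("create-then-delete: failures =", f1, "records left =", len(z1[name]))

	// Delete first: the CNAME is gone by the time the A record is created.
	z2 := Zone{name: {"CNAME": true}}
	f2 := Run(z2, []Change{remove, create})
	fmt.Println("delete-then-create: failures =", f2, "records left =", len(z2[name]))
}
```

In the first ordering the zone ends up with neither record, which matches the broken state reported here; in the second the A record is in place.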

Mitigation: Run external-dns again. Now that the CNAME is gone it should successfully create the A records.

Steps to fix:

  1. If creating a record in the batch fails, don't continue with deletions in that batch. Continuing is generally risky because the failure indicates an unexpected situation, and it is safer to abort the batch than to continue into the unknown. This would have mitigated this bug.
  2. If a batch deletes an existing CNAME record, order that deletion just before the creation of records with the same name. This will cause slight downtime but is the best that is possible without transactional updates. (Do any providers support transactional updates?)
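The two steps above could be sketched roughly as follows. This is an illustration only, not the external-dns implementation and not the change that was eventually merged: `Change`, `Order`, `ApplyBatch` and `ErrConflict` are hypothetical names, and the `apply` callback stands in for whatever call a provider makes. Note that `ApplyBatch` stops at the first failure of any kind, which is a stricter variant of step 1.

```go
package main

import (
	"errors"
	"fmt"
)

// Change is a hypothetical, simplified stand-in for one planned DNS change.
type Change struct {
	Action string // "CREATE", "UPDATE" or "DELETE"
	Name   string
	Type   string // "A", "CNAME", "TXT", ...
}

// ErrConflict imitates the provider error shown in the log.
var ErrConflict = errors.New("a CNAME record with that host already exists")

// Order returns the changes with each CNAME deletion moved to just before
// the first creation for the same name (step 2). Other changes keep their
// relative order.
func Order(changes []Change) []Change {
	// Remember where the CNAME deletion for each name sits in the batch.
	cnameDelete := map[string]int{}
	for i, c := range changes {
		if c.Action == "DELETE" && c.Type == "CNAME" {
			cnameDelete[c.Name] = i
		}
	}
	emitted := make([]bool, len(changes))
	out := make([]Change, 0, len(changes))
	for i, c := range changes {
		if emitted[i] {
			continue // already pulled forward
		}
		if c.Action == "CREATE" {
			if j, ok := cnameDelete[c.Name]; ok && !emitted[j] {
				out = append(out, changes[j])
				emitted[j] = true
			}
		}
		out = append(out, c)
		emitted[i] = true
	}
	return out
}

// ApplyBatch applies changes in order and aborts at the first failure
// (step 1), returning how many changes were applied successfully.
func ApplyBatch(changes []Change, apply func(Change) error) (int, error) {
	for i, c := range changes {
		if err := apply(c); err != nil {
			return i, fmt.Errorf("aborting batch at %s %s %s: %w", c.Action, c.Type, c.Name, err)
		}
	}
	return len(changes), nil
}

func main() {
	name := "helo.feedmail.org"
	planned := []Change{
		{"CREATE", name, "A"},
		{"DELETE", name, "CNAME"},
	}

	// Fake provider: rejects creations while the CNAME still exists.
	hasCNAME := true
	apply := func(c Change) error {
		switch {
		case c.Action == "DELETE" && c.Type == "CNAME":
			hasCNAME = false
		case c.Action == "CREATE" && hasCNAME:
			return ErrConflict
		}
		return nil
	}

	applied, err := ApplyBatch(Order(planned), apply)
	fmt.Println("applied:", applied, "error:", err)
}
```

With the reordering in place the fake provider never sees the conflicting create. Without it, `ApplyBatch` would stop before the CNAME deletion, leaving the old CNAME intact rather than an empty name.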

Environment:

  • External-DNS version (use external-dns --version): k8s.gcr.io/external-dns/external-dns:v0.12.0
  • DNS provider: Cloudflare.

kevincox avatar Jun 17 '22 12:06 kevincox

I have the same issue. In my case, I can safely remove the a- prefixed TXT record manually.

isac322 avatar Jul 29 '22 09:07 isac322

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 27 '22 09:10 k8s-triage-robot

Fuck off bot. Problems don't go away just because you close the ticket.

/remove-lifecycle stale

kevincox avatar Oct 27 '22 10:10 kevincox

@kevincox I believe this can now be closed as reordering the operations has been merged in https://github.com/kubernetes-sigs/external-dns/pull/3094#event-7783242311

Evesy avatar Nov 15 '22 10:11 Evesy