external-dns
Fails to gracefully swap CNAME for A records.
What happened: I was trying to replace a CNAME record with a set of A records controlled by external-dns. I added the required ownership records and ran external-dns. However, at the end I was left with no A record and no CNAME record: a broken state!
What you expected to happen: The CNAME record was deleted and the A records were created in its place.
How to reproduce it (as minimally and precisely as possible):
- Set up some A records (like for an Ingress or for nodes) inside external-dns.
- Replace the records with a CNAME (to anything) outside of external-dns. Leave the ownership records in place.
- Run external-dns again.
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=error msg="failed to create record: HTTP status 400: A CNAME record with that host already exists. (81054)" action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=error msg="failed to create record: HTTP status 400: A CNAME record with that host already exists. (81054)" action=CREATE record=helo.feedmail.org ttl=1 type=A zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=DELETE record=helo.feedmail.org ttl=1 type=CNAME zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:54Z" level=info msg="Changing record." action=UPDATE record=external-dns-aeb5uf6o.helo.feedmail.org ttl=1 type=TXT zone=d3ae40d8d5a9540c3d93561fede8f86b
time="2022-06-17T12:39:55Z" level=info msg="Changing record." action=UPDATE record=external-dns-aeb5uf6o.a-helo.feedmail.org ttl=1 type=TXT zone=d3ae40d8d5a9540c3d93561fede8f86b
Anything else we need to know?:
This appears to occur because external-dns attempts to create the A record before removing the CNAME, and then goes on to remove the CNAME anyway. In general create-then-delete is good, but in this case the deletion needs to come first.
Mitigation: Run external-dns again. Now that the CNAME is gone it should successfully create the A records.
Steps to fix:
- If creating a record in the batch fails, don't continue with deletions in that batch. Continuing is generally risky because the failure indicates an unexpected situation, and it is safer to abort that batch than to continue into the unknown. This would have mitigated this bug.
- If a batch deletes an existing CNAME record, order that deletion just before the creation of records with the same name. This will cause slight downtime but is the best that is possible without transactional updates. (Do any providers support transactional updates?)
Environment:
- External-DNS version (use external-dns --version): k8s.gcr.io/external-dns/external-dns:v0.12.0
- DNS provider: Cloudflare.
I have the same issue. In my case, I can safely remove the a- prefixed TXT record manually.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Fuck off bot. Problems don't go away just because you close the ticket.
/remove-lifecycle stale
@kevincox I believe this can now be closed as reordering the operations has been merged in https://github.com/kubernetes-sigs/external-dns/pull/3094#event-7783242311