
Creating dns records for apex / root domains

Open soosap opened this issue 7 years ago • 73 comments

Not sure if this is a bug or just not possible by design or other limitations.

I am trying to automatically create a DNS record for the apex / root domain. Unfortunately, those records are not being created. Manually, it's possible to point the apex to an AWS ELB. I also do not get any feedback from the ExternalDNS pod except: time="2018-01-26T09:07:31Z" level=info msg="All records are already up to date".

Can you clarify how I could achieve this? I also have not seen any examples where the apex is being set through ExternalDNS.

soosap avatar Jan 26 '18 12:01 soosap

I'm seeing something similar with the CloudFlare provider. I have an ingress defined with two rule sets, like this:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: ignota.media
  name: rupertsberg
  namespace: prod-green
spec:
  rules:
    - host: ignota.media
      http:
        paths:
          - backend:
              serviceName: rupertsberg
              servicePort: http
    - host: www.ignota.media
      http:
        paths:
          - backend:
              serviceName: rupertsberg
              servicePort: http
  tls:
    - secretName: ignota-media-ssl
      hosts:
        - ignota.media
        - www.ignota.media

...and it looks like the External DNS watcher detects them both, based on a dry debug run:

/ # external-dns --source=ingress --provider=cloudflare --cloudflare-proxied --once --dry-run --metrics-address=:9797 --log-level=debug
INFO[0000] config: &{Master: KubeConfig: Sources:[ingress] Namespace: AnnotationFilter: FQDNTemplate: Compatibility: PublishInternal:false Provider:cloudflare GoogleProject: DomainFilter:[] AWSZoneType: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:true InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InMemoryZones:[] Policy:sync Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:true DryRun:true LogFormat:text MetricsAddress::9797 LogLevel:debug}
INFO[0000] running in dry-run mode. No changes to DNS records will be made.
INFO[0000] Connected to cluster at https://100.64.0.1:443
DEBU[0002] Endpoints generated from ingress: prod-green/rupertsberg: [ignota.media 0 IN CNAME af9f52be105f911e89e8306637a410d9-1940002126.us-east-1.elb.amazonaws.com www.ignota.media 0 IN CNAME af9f52be105f911e89e8306637a410d9-1940002126.us-east-1.elb.amazonaws.com]

In CloudFlare, however, only the www subdomain shows up.

Adding the apex domain manually isn't the worst thing, but it'd be great if you folks had any suggestions! 😬

phyllisstein avatar Jan 31 '18 00:01 phyllisstein

With DNSimple we register .mydomain.example.com., which gets created but somehow can't be updated. This seems more related to the driver itself though, or it's a fluke it works at all :)

time="2018-04-13T22:40:26Z" level=debug msg="Endpoints generated from service: example/example-ing-public-traefik: [.exampleapp.nl 60 IN A 1.2.3.4]" 
time="2018-04-13T22:40:26Z" level=debug msg="No endpoints could be generated from service example/webapp" 
time="2018-04-13T22:40:26Z" level=debug msg="No endpoints could be generated from service example/zipkin" 
time="2018-04-13T22:40:26Z" level=debug msg="No endpoints could be generated from ingress example/public" 
time="2018-04-13T22:40:26Z" level=info msg="Changing records: UPDATE {0  0 A .exampleapp.nl 1.2.3.4 0 0 false []  } in zone: exampleapp.nl" 
time="2018-04-13T22:40:27Z" level=error msg="PATCH https://api.dnsimple.com/v2/123456/zones/exampleapp.nl/records/123456123456: 400 System records cannot be updated" 

The-Loeki avatar Apr 13 '18 22:04 The-Loeki

  • GKE k8s v1.9.6-gke.1
  • nginx-ingress v0.14.0
  • cert-manager v0.2.5
  • external-dns v0.5.0

Noticed the same in Google CloudDNS! Apex domains are not auto set up.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-challenge-type: dns01
    certmanager.k8s.io/acme-dns01-provider: clouddns
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
  labels:
    app: website
    chart: static-website-0.1.0
    heritage: Tiller
    release: website
  name: website
  namespace: default
spec:
  rules:
  - host: 0x.se
    http:
      paths:
      - backend:
          serviceName: website
          servicePort: 80
        path: /
  - host: www.0x.se
    http:
      paths:
      - backend:
          serviceName: website
          servicePort: 80
        path: /
  tls:
  - hosts:
    - 0x.se
    - www.0x.se
    secretName: website-prod

$ kubectl logs -l app=external-dns -n kube-system

time="2018-05-08T21:04:53Z" level=info msg="All records are already up to date"
time="2018-05-08T21:05:54Z" level=info msg="Change zone: default-zone-0xse"
time="2018-05-08T21:05:54Z" level=info msg="Add records: www.0x.se. A [35.195.241.48] 300"
time="2018-05-08T21:05:54Z" level=info msg="Add records: www.0x.se. TXT [\"heritage=external-dns,external-dns/owner=extdns-main-cluster,external-dns/resource=ingress/default/website\"] 300"
time="2018-05-08T21:06:55Z" level=info msg="All records are already up to date"

:(

mandrean avatar May 08 '18 21:05 mandrean

Same on AWS with v0.5.0. Maybe it's caused by the apex having NS and SOA records set? Weird that the debug output doesn't say anything about it, though.

joekohlsdorf avatar May 15 '18 16:05 joekohlsdorf

On a whim, I tried setting the external-dns.alpha.kubernetes.io/hostname annotation to @ (which CloudFlare uses to signify the apex domain), but no dice.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    # No bueno.
    external-dns.alpha.kubernetes.io/hostname: '@'

phyllisstein avatar May 17 '18 15:05 phyllisstein

Ran into the same issue, with these log entries:

time="2018-08-07T15:39:09Z" level=debug msg="Endpoints generated from ingress: default/xxx: [my.domain 0 IN A xxx.xxx.xxx.xxx my.domain 0 IN A xxx.xxx.xxx.xxx]"
time="2018-08-07T15:39:09Z" level=debug msg="Removing duplicate endpoint my.domain 0 IN A xxx.xxx.xxx.xxx"
time="2018-08-07T15:39:09Z" level=debug msg="Skipping endpoint my.domain 0 IN A xxx.xxx.xxx.xxx because owner id does not match, found: \"\", required: \"default\""
time="2018-08-07T15:39:09Z" level=debug msg="Skipping endpoint my.domain 0 IN TXT \"google-site-verification=abcde\" because owner id does not match, found: \"\", required: \"default\""
time="2018-08-07T15:39:09Z" level=info msg="All records are already up to date"

env:

  • GKE 1.10.5-gke.3
  • nginx-ingress 0.23.0(chart) 0.15.0(app)
  • external-dns 0.7.1(chart) v0.5.4(app)

ZigZagT avatar Aug 07 '18 15:08 ZigZagT

I have the same issue: trying to write to my root domain pfefferundfrost.com with the DNSimple provider results in a strange error. External-dns writes the A record for pfefferundfrost.com.pfefferundfrost.com and later complains:

time="2018-11-05T20:44:44Z" level=info msg="Changing records: CREATE {0 0 A pfefferundfrost.com 176.9.XXX.XXX 3600 0 false [] } in zone: pfefferundfrost.com"
5.11.2018 21:44:44 time="2018-11-05T20:44:44Z" level=error msg="POST https://api.dnsimple.com/v2/71763/zones/pfefferundfrost.com/records: 400 Zone record already exists"

How to fix this?

JannikZed avatar Nov 05 '18 20:11 JannikZed

Not sure when this landed (there's nothing of note in the CHANGELOG), but this appears to be fixed as of v0.5.9-15-gf25f90db (which I'm pulling into my cluster from the latest tag).

An ingress that looks like this will now correctly assign a record to the apex in CloudFlare:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: site
spec:
  rules:
    - host: site.tld
      http:
        paths:
          - backend:
              serviceName: site
              servicePort: http
    - host: www.site.tld
      http:
        paths:
          - backend:
              serviceName: site
              servicePort: http

phyllisstein avatar Dec 03 '18 04:12 phyllisstein

It is indeed fixed, as @phyllisstein said, but it raises another issue: apex domains are usually associated with a lot of TXT records (Google verification, SPF, and other stuff), which seem to prevent external-dns from creating its own entries:

time="2018-12-12T15:28:03Z" level=info msg="Change zone: REDACTED"
time="2018-12-12T15:28:03Z" level=info msg="Add records: REDACTED A [REDACTED] 300"
time="2018-12-12T15:28:03Z" level=info msg="Add records: REDACTED TXT [\"heritage=external-dns,external-dns/owner=REDACTED,external-dns/resource=ingress/REDACTED/REDACTED\"] 300"
time="2018-12-12T15:28:03Z" level=error msg="googleapi: Error 409: The resource 'entity.change.additions[1]' named 'REDACTED (TXT)' already exists, alreadyExists"

Related to #443 #692

sylver avatar Dec 12 '18 15:12 sylver

It works with www.domain.tld, but not with domain.tld. Even with the newest version I still get something like A | domain.tld.domain.tld | 176.XX.XX.XX for DNSimple.

JannikZed avatar Dec 12 '18 20:12 JannikZed

Same here for DNSimple with:

external-dns.alpha.kubernetes.io/hostname: .example.nl.,www.example.nl.,.example.be.,www.example.be.

bkleef avatar Dec 14 '18 16:12 bkleef

Any update here? Not being able to set our domain apex along with other entries because of TXT records is a real issue. Maybe there is a workaround I haven't thought of?

I'm pretty busy, but if I can help with something to move this forward, let me know.

sylver avatar Jan 14 '19 10:01 sylver

Tested this again today on Route53 using external-dns v0.5.12 and it works 🎉!

Edit: Only if there is no other TXT record on the apex domain (in my case I have some Google site verification records): time="2019-04-05T18:27:27Z" level=error msg="InvalidChangeBatch: [Tried to create resource record set [name='domain.name.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: 76e454b6-57d0-11e9-9d8e-19e9906e17ea"

joekohlsdorf avatar Apr 05 '19 18:04 joekohlsdorf

Creation of records for root/apex works fine for us, but we're running into the same issue others are reporting with managing TXT records. We manage our Route 53 records using Terraform, and that breaks as we start deploying stage/prod services to our new EKS clusters: external-dns has already created the TXT record for the root zone, so the Terraform apply for the other root TXT records fails.

echoboomer avatar Apr 09 '19 01:04 echoboomer

We solved our issue by using the prefix option. We added it to the values file of the Helm chart as such: txtPrefix: "${prefixName}."

This would generate prefixName.foobar.com, for example.

echoboomer avatar Apr 09 '19 15:04 echoboomer

I'm still having the problem on v0.5.13. My zone has NS and SOA records pointing to example.com (redacted). www.example.com records are created with success but example.com records (part of the same Ingress rule) aren't. Here's the log I'm getting:

{"level":"debug","msg":"Skipping endpoint example.com 0 IN CNAME REDACTED.eu-west-3.elb.amazonaws.com [] because owner id does not match, found: \"\", required: \"k8s-prod-public\"","time":"2019-04-24T13:02:36Z"}

Any help would be highly appreciated :)

elafarge avatar Apr 24 '19 13:04 elafarge

@elafarge that error happens when external-dns skips a record because there's no tag on it signifying that it should be managing it. It will always create a TXT record with a value similar to this:

"heritage=external-dns,external-dns/owner=kube-cluster-name,external-dns/resource=service/namespace/service-name"

If you look at one of the other records that external-dns is managing, you'll see a tag similar to that one, and you'll want the cluster name to match k8s-prod-public which is what it is looking for. You can get around this by manually tagging the records it's complaining about, but removing the existing records and letting it recreate them is also a viable option depending on environment and tolerance for that type of change.
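
To make the ownership check concrete, here is a rough Python sketch of the idea. It is not external-dns's actual code: the function names are made up for illustration, and the only thing taken from this thread is the "heritage=...,external-dns/owner=..." label format shown above.

```python
def parse_registry_labels(txt_value: str) -> dict:
    """Parse a 'key=value,key=value' TXT value into a dict (hypothetical helper)."""
    labels = {}
    for part in txt_value.strip('"').split(","):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that are not key=value pairs
            labels[key.strip()] = value.strip()
    return labels


def is_owned_by(txt_value: str, owner_id: str) -> bool:
    """True if the TXT value marks the record as managed by owner_id."""
    labels = parse_registry_labels(txt_value)
    return (
        labels.get("heritage") == "external-dns"
        and labels.get("external-dns/owner") == owner_id
    )


managed = (
    '"heritage=external-dns,external-dns/owner=k8s-prod-public,'
    'external-dns/resource=ingress/default/site"'
)
unrelated = '"google-site-verification=abcde"'

print(is_owned_by(managed, "k8s-prod-public"))
print(is_owned_by(unrelated, "k8s-prod-public"))
```

A record whose TXT value carries no matching owner label (or that has no such TXT record at all) would be treated as "not mine", which matches the "owner id does not match, found: \"\"" message in the log.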

Hopefully that made sense.

echoboomer avatar Apr 24 '19 13:04 echoboomer

@echoboomer We're probably going to do the same. Did you run into any gotchas with having the TXT prefix being a subzone? (Right now it's just a prefix in the same zone.) I can't think of anything that would be an issue, but you never know.

Evesy avatar Jul 05 '19 13:07 Evesy

We ran into the issue @joekohlsdorf describes:

Having external-dns manage an apex record is no issue in itself. Just on AWS / Route53 you may not simply add another TXT record if one (like one for DKIM / SPF) exists, but apparently you have to "add" yourself to the existing record:

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/ResourceRecordTypes.html#TXTFormat
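
For illustration only, this is roughly what "adding yourself to the existing record" could look like: one TXT record set at the apex that carries both the existing values and the new one. The dict below follows the general shape of a Route 53 change batch, but the helper names are made up, the values are assumed to contain no embedded double quotes, and nothing here reflects what external-dns itself does. Merging by hand like this can break existing records (SPF, for example) if done carelessly.

```python
def quote_txt(value: str) -> str:
    """Wrap a TXT value in double quotes unless it is already quoted."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value
    return '"' + value + '"'


def merged_txt_change(name, existing_values, new_value, ttl=300):
    """Build an UPSERT that keeps existing TXT values and appends new_value."""
    values = [quote_txt(v) for v in existing_values]
    quoted_new = quote_txt(new_value)
    if quoted_new not in values:  # avoid adding the same value twice
        values.append(quoted_new)
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "TXT",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": v} for v in values],
                },
            }
        ]
    }


change = merged_txt_change(
    "example.com.",
    ["v=spf1 -all"],  # placeholder for whatever already lives on the apex
    "heritage=external-dns,external-dns/owner=default",
)
for record in change["Changes"][0]["ResourceRecordSet"]["ResourceRecords"]:
    print(record["Value"])
```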

frittentheke avatar Aug 02 '19 14:08 frittentheke

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Oct 31 '19 15:10 fejta-bot

/remove-lifecycle stale

frittentheke avatar Oct 31 '19 17:10 frittentheke

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jan 30 '20 00:01 fejta-bot

/remove-lifecycle stale

frittentheke avatar Feb 14 '20 06:02 frittentheke

We encountered an issue similar to this one.

We're using a txt-prefix like staging-01-, so all our registry TXT records look like staging-01-test.example.com or staging-01-help.example.com. The problem happens when we try to add an apex DNS entry. The code that calculates the TXT entry (https://github.com/kubernetes-sigs/external-dns/blob/5fc6adf36d83dd99e00121399841046001d9b44a/registry/txt.go#L222) simply concatenates the prefix and the DNS endpoint: staging-01- + example.com. That fails because the resulting name, staging-01-example.com, does not belong to a proper zone.

One solution is to completely change all our prefixes from staging-01- to staging-01. (making them a subdomain). But that might cause issues with current values.

Should the code that generates the TXT entry be fixed to check if it's this edge case? Since every time you do "apex + prefix" this will fail.
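
A small Python sketch of the edge case, assuming the plain "prefix + endpoint" concatenation described above. The helper names are hypothetical and this is not the code from txt.go; it only shows why a prefix ending in a hyphen leaves the zone at the apex while one ending in a period does not.

```python
def txt_record_name(prefix: str, dns_name: str) -> str:
    """Plain concatenation of the TXT prefix and the endpoint name."""
    return prefix + dns_name


def in_zone(name: str, zone: str) -> bool:
    """True if name is the zone apex or a subdomain of it (label-boundary match)."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


zone = "example.com"
for prefix in ("staging-01-", "staging-01."):
    for endpoint in ("test.example.com", "example.com"):
        name = txt_record_name(prefix, endpoint)
        print(f"{prefix!r} + {endpoint!r} -> {name} (in zone: {in_zone(name, zone)})")
```

Only the combination of the hyphen-terminated prefix with the apex yields a name outside example.com; a check like in_zone could be one way to detect that case before trying to create the record.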

apolloFER avatar Feb 24 '20 12:02 apolloFER

Same issue here. Our apex / root domain already has a TXT record for SPF, and we are getting this error:

InvalidChangeBatch: [Tried to create resource record set [name='domain.tld.', type='TXT'] but it already exists]

What if external-dns, when this error happens, would just UPDATE the TXT record and append the owned text to it? Then it's solved, right?

ruudk avatar Apr 08 '20 12:04 ruudk

@ruudk I think that is what the txt-prefix is for, since it is difficult to merge existing records 100% accurately, i.e. without breaking your DNS, and in the case of an SPF TXT record, probably your email as well.

In our case I chose a prefix of -external-dns- because it looked nice (🤦‍♂) in our DNS control panel and automatically sorted out of the way. But as @apolloFER mentioned, at the apex / root, you end up with -external-dns-example.com, which is an illogical name.

So until this is fixed, the answer is probably to use a prefix that ends with a period?

Edit: This has already been confirmed working higher up: https://github.com/kubernetes-sigs/external-dns/issues/449#issuecomment-481298355 Thanks :)

mariusmarais avatar Apr 08 '20 12:04 mariusmarais

I'm using "bitnami/external-dns:0.5.18-r0" and have the same issue.

meodemsao avatar Apr 12 '20 04:04 meodemsao

@mariusmarais Thanks, that worked for me :)

ruudk avatar Apr 13 '20 06:04 ruudk

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jul 12 '20 06:07 fejta-bot

/remove-lifecycle stale

frittentheke avatar Jul 13 '20 16:07 frittentheke