Problem Statement

Configuring DNS correctly can be challenging. How DNS works with kops is documented, but a lot of tribal knowledge is not, and I think we can get that documented as well. We should also mention that kops gossip is an option for running without DNS.

Purpose

Document how to diagnose and troubleshoot DNS configuration with Route53 and Google Cloud DNS.

Components involved

DNS / Cloud

Domain Registrars
DNS Providers - Route53 and Google Cloud

kops ecosystem

kops - creates placeholder DNS entries with the 203.0.113.123 IP address
protokube - configures etcd DNS records
dns-controller k8s deployment - configures cluster API endpoint DNS record

Validating Provider setup

At the end of the day, dig ns subdomain.example.com has to work. The Route53 or Google Cloud IDs may need to be used with kops.
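A minimal sketch of that check, assuming dig is installed and using subdomain.example.com as a stand-in for the real zone. The helper names lookup_ns and has_delegation are hypothetical and only wrap the dig call so the result can be tested in a script:

```shell
# Print the NS records that public DNS returns for a zone, one per line.
lookup_ns() {
    dig +short NS "$1"
}

# Succeed only if the given text (the output of lookup_ns) contains
# something other than whitespace, i.e. at least one name server came back.
has_delegation() {
    [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Usage (needs network access):
#   ns="$(lookup_ns subdomain.example.com)"
#   if has_delegation "$ns"; then
#       echo "delegated to:"; echo "$ns"
#   else
#       echo "no NS records returned for subdomain.example.com"
#   fi
```

An empty answer here usually means the subdomain is not delegated from the parent zone, which is worth fixing before looking at kops itself.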

how kops does DNS

flags on kops to set domain
user runs kops and route53 domain entries are created
master node(s) are created and protokube container is started
protokube creates dns records for etcd
once the k8s cluster master(s) are stable dns-controller deployment is started
dns-controller deployment starts and updates api endpoint DNS record

Diagnosis Tools

Cloud consoles - Route53 and Google Cloud
dig
logs from protokube
logs from dns-controller
aws cli
gcloud cli
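The tools above could be wrapped roughly as follows. This is a sketch under assumptions: dns-controller is assumed to run as a Deployment named dns-controller in kube-system, the zone identifiers are supplied by the caller, and the run helper with its DRY_RUN switch is hypothetical, added only so the commands can be previewed without touching a cluster or cloud account. protokube logs are left out because where they live depends on how protokube is run on the master.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Last 100 log lines from dns-controller (assumed Deployment in kube-system).
dns_controller_logs() {
    run kubectl -n kube-system logs deployment/dns-controller --tail=100
}

# All record sets in a Route53 hosted zone; $1 is the hosted zone ID.
route53_records() {
    run aws route53 list-resource-record-sets --hosted-zone-id "$1"
}

# All record sets in a Google Cloud DNS managed zone; $1 is the zone name.
clouddns_records() {
    run gcloud dns record-sets list --zone="$1"
}
```

Running with DRY_RUN=1 first, for example `DRY_RUN=1 route53_records Z123EXAMPLE`, shows the exact command before it is executed against the account.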

kops current documentation

aws tutorial https://github.com/kubernetes/kops/blob/master/docs/aws.md#configure-dns

dns-controller documentation https://github.com/kubernetes/kops/blob/9c1ef822ab9766091491826bcdea162261bc3bdd/dns-controller/README.md

creating a sub-domain https://github.com/kubernetes/kops/blob/master/docs/creating_subdomain.md

external documentation

http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/CreatingNewSubdomain.html
https://cloud.google.com/dns/quickstart
https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
https://aws.amazon.com/route53/faqs/

Related Issues / Comments / PRs

https://github.com/kubernetes/kops/issues/762#issuecomment-257365957
https://github.com/kubernetes/kops/issues/1230
https://github.com/kubernetes/kops/issues/3273
https://github.com/kubernetes/kops/issues/1386

Lastly a PR that got closed

https://github.com/justinsb/kops/blob/a09edc22d6b3a070f828e2e69ac9d4bde0cfe534/docs/tour/dns.md

Gaps

We do not have ANY documentation for google DNS.

Nov 17 '17 22:11 chrislovecnm

/area dns /area documentation

Nov 17 '17 22:11 chrislovecnm

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Feb 15 '18 22:02 fejta-bot

/lifecycle frozen /remove-lifecycle stale

Feb 16 '18 01:02 chrislovecnm

Problem: I was trying to set up kubernetes using kops on AWS and got the following error:

unexpected error during validation: unable to resolve Kubernetes cluster API URL dns: lookup example.com on 127.0.0.53:53: server misbehaving

My A records started with 203.0.113.123, the kubernetes placeholder record. After ~10 minutes all my A records were updated except for the first two:

Route53 IP addresses:
api.example.com: 203.0.113.123
api.internal.example.com: 203.0.113.123
etcd-a.internal.example.com: 172.20.39.255
etcd-b.internal.example.com: 172.20.68.11
...
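A small sketch for spotting records that are still stuck on the placeholder, assuming dig is available. The name is_placeholder is hypothetical, and api.example.com stands in for the real API record:

```shell
# The placeholder address mentioned above, written before the real IPs are known.
PLACEHOLDER_IP="203.0.113.123"

# Succeed if any line of the given text (e.g. dig +short output)
# is exactly the placeholder address.
is_placeholder() {
    printf '%s\n' "$1" | grep -qxF "$PLACEHOLDER_IP"
}

# Usage (needs network access):
#   if is_placeholder "$(dig +short A api.example.com)"; then
#       echo "api.example.com still points at the placeholder"
#   fi
```

The match is on whole lines with a fixed string, so an address that merely starts with the same digits is not reported as the placeholder.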

Root problem: Route53 configured my Hosted zones and Registered domains with different name servers.

Solution: Change the four name servers under Registered domains to match the name servers of your root domain under Hosted zones, e.g., ns-25.awsdns-1.com, ns-12.awsdns-09.co.uk, ns-12.awsdns-09.net, ns-67.awsdns-88.org.

The change takes about 3 hours to propagate. You can see the progress at https://dnschecker.org/ or use the command line "dig ns example.com". Once it's done, delete the cluster and create it again. You will see the same error message for the first 10 minutes, but after that you'll see the A records update for api.example.com and api.internal.example.com.

Note: You will also see this error even if you have the correct name servers. If you have the correct name servers, wait 10-15 min and it will work.
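To compare the two sets of name servers without eyeballing the console, something like the following could work. It is a sketch: normalize_ns and same_ns are hypothetical helper names, and the aws commands in the usage comment assume the domain is registered through Route53 and that the hosted zone ID is known.

```shell
# Normalize a list of name servers: lowercase, one per line,
# trailing dots removed, sorted, duplicates dropped.
normalize_ns() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -s ' \t' '\n\n' \
        | sed -e 's/\.$//' -e '/^$/d' \
        | sort -u
}

# Succeed if both lists name the same set of servers.
same_ns() {
    [ "$(normalize_ns "$1")" = "$(normalize_ns "$2")" ]
}

# Usage (needs AWS credentials; example.com and ZONE_ID are placeholders):
#   registrar="$(aws route53domains get-domain-detail --region us-east-1 \
#       --domain-name example.com --query 'Nameservers[].Name' --output text)"
#   zone="$(aws route53 get-hosted-zone --id "$ZONE_ID" \
#       --query 'DelegationSet.NameServers' --output text)"
#   same_ns "$registrar" "$zone" || echo "name servers differ"
```

Normalizing first means ordering, letter case, and trailing dots do not cause a false mismatch.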

Oct 31 '18 13:10 emilwallner

I have nearly the same problem, but I couldn't find a solution for it and I'm not sure what causes it.

Here are the steps that I have used:

I created a cluster using the following command:

kops create cluster \
--cloud=aws \
--node-count=2 \
--node-size=t2.medium \
--zones=eu-west-1a \
--master-size=t2.medium \
--master-zones=eu-west-1a \
--dns-zone=k8s.domain.com \
--name=dev.k8s.domain.com \
--topology=private \
--networking=weave \
--cloud-labels="Env=Dev" \
--state=s3://domain-dev-kops-state-store \
--ssh-public-key=~/.ssh/id_rsa.pub \
--yes

I have created 2 hosted zones on AWS Route53:
domain.com. ---> parent domain
k8s.domain.com ---> child domain, used as the cluster domain, with the NS records from the subdomain added to the parent domain

The cluster is up, but any resource within the cluster that is exposed with the LoadBalancer type doesn't work: the URL given by the cluster, something like a5df3af45733411e99464025095a7ed8-300168090.eu-west-1.elb.amazonaws.com, gives page not found.

I have tried the above steps multiple times without success. Any advice?

May 11 '19 09:05 engmsaleh

/assign

May 11 '20 06:05 olemarkus

Document Route53 and Google DNS troubleshooting
