[BUG] Non-standard cluster domain (other than cluster.local) in MicroShift breaks internal cluster DNS
What happened:
For example, with a microshift.home cluster domain, resolution of service names such as service.namespace.svc.microshift.home fails.
What you expected to happen:
Resolution to work.
How to reproduce it (as minimally and precisely as possible):
Install MicroShift with a cluster domain different from cluster.local.
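For illustration, a minimal sketch of the relevant /etc/microshift/config.yaml setting, assuming the .cluster.domain field mentioned later in this thread (the exact schema may differ between MicroShift versions):

cluster:
  domain: microshift.home   # any value other than cluster.local triggers the problem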
Anything else we need to know?:
It can be fixed by updating the dns-default ConfigMap so that the kubernetes plugin block in the Corefile uses $DOMAIN instead of cluster.local.
Workaround
For now, stick to the default cluster domain name, or edit the dns-default ConfigMap to replace the domain name.
I confirm the problem and the workaround:
data:
  Corefile: |
    .:5353 {
        bufsize 512
        errors
        health {
            lameduck 20s
        }
        ready
        kubernetes in-addr.arpa ip6.arpa MY_CUSTOM_DOMAIN_HERE {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
kind: ConfigMap
metadata:
  name: dns-default
  namespace: openshift-dns
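To check that resolution works again after editing the ConfigMap, a throwaway Pod along these lines can be used (the image and the microshift.home domain are examples only, not part of the original report):

apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  restartPolicy: Never
  containers:
  - name: dnsutils
    # any image that ships nslookup/dig will do
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    # replace microshift.home with your configured cluster domain
    command: ["nslookup", "kubernetes.default.svc.microshift.home"]

With the default Corefile, queries under the custom domain do not match the kubernetes plugin's cluster.local zone and are forwarded to the upstream resolver, which cannot answer them; with the edited ConfigMap the lookup should return the Service's ClusterIP.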
After hours of trying to understand what I was doing wrong with Services, I found out that it wasn't exactly my fault 🙃
For information, this ^^^ has the negative side effect of breaking domain name resolution from inside the Pods, which is required, for instance, by openshift-acme to pre-validate that the domain certificate challenge is configured correctly...
Long story short, I removed the domain configuration from /etc/microshift/config.yaml, restarted MicroShift, deleted my Pods relying on Services, and I think everything is back to normal now!
I ran into this issue as well. It seems a bit weird that the same .cluster.domain field from the configuration file ends up being used for both:
- the internal cluster domain used by the internal DNS for communication between pods
- the external cluster DNS name used by the OpenShift Router
These are not the same (at least in my case), so I can only get one of them working at a time (see the sketch below).
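A hypothetical sketch of the two different uses of the same domain (all names and the Route host below are made-up examples, not taken from this report):

# Internal: resolved by CoreDNS inside the cluster
#   my-service.my-namespace.svc.microshift.home
#
# External: a Route host that must be resolvable by clients outside the cluster
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
  namespace: my-namespace
spec:
  host: my-app.microshift.home   # assumed to be served by the OpenShift Router
  to:
    kind: Service
    name: my-service

Because the kubernetes plugin is authoritative for the whole configured zone, external hostnames under that same domain (such as Route hosts) stop resolving from inside Pods, which appears to be what the earlier openshift-acme comment describes.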
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.