1.28 OS image has a conflicting DNS server running on port 53, preventing node-local-dns from running
Platform I'm building on: x86_64 (AMD)
What I expected to happen: it should be possible to run host-port services like node-local-dns without a conflict.
What actually happened: node-local-dns crashes with
Listen: listen tcp 0.0.0.0:53: bind: address already in use
How to reproduce the problem: run the bottlerocket-aws-k8s-1.28-x86_64-v1.17.0-53f322c2 image and deploy node-local-dns from the Delivery Hero chart.
The previous 1.27 Bottlerocket worked fine, and switching from Bottlerocket to AL2 works with 1.28.
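For reference, the install is roughly the following (a sketch; I'm assuming the Delivery Hero chart repository URL and an arbitrary release name, adjust for your setup):

helm repo add deliveryhero https://charts.deliveryhero.io/
helm install node-local-dns deliveryhero/node-local-dns \
  --namespace kube-system \
  --values values.yaml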
Hello @ami-descope, thanks for opening this issue. The relevant difference between k8s 1.27 and k8s 1.28 on Bottlerocket is that we moved from wicked to systemd-networkd, which includes using systemd-resolved (#3366). I believe resolved is creating this conflict. I'll need to take a deeper look, but you might need to see if you can work around this by using a different port than 53 or by reusing resolved. Can you post more of your node-local-dns config so I can try to replicate your error and see if I can find a workaround?
FWIW, most search results for this error related to systemd-resolved are going to say "disable systemd-resolved", but that isn't really a great solution. We are using resolved for DNS resolution, so stopping it may cause more problems.
I've done some more investigation, and following some basic guidance on node-local-dns I don't see the same issue you are seeing on 1.28. This might be due to your specific configuration around what you are binding on. You might need to be more restrictive than binding 0.0.0.0 on port 53. If you can confirm what binding you are using, @ami-descope, that might help us figure out what the issue is.
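One quick way to confirm what the rendered Corefile binds to is to grep the ConfigMap (a sketch; substitute the actual name of your node-local-dns ConfigMap):

kubectl -n kube-system get configmap <node-local-dns-configmap> -o yaml | grep 'bind '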
Hi @yeazelm, sorry for missing this. My config for node-local-dns is (values.yaml):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: NotIn
              values:
                - fargate
config:
  localDns: 169.254.20.25
prometheusScraping:
  enabled: false
I don't do any binds myself. Here is my entire chart rendered:
---
# Source: node-local-dns/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sandbox-euw1-node-local-dns-72376ac0
  namespace: kube-system
  labels:
    helm.sh/chart: node-local-dns-2.0.3
    app.kubernetes.io/name: node-local-dns
    app.kubernetes.io/instance: sandbox-euw1-node-local-dns-72376ac0
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
    app.kubernetes.io/version: "1.22.23"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "node-local-dns"
    app.kubernetes.io/created-by: "node-local-dns"
    app.kubernetes.io/part-of: "node-local-dns"
---
# Source: node-local-dns/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sandbox-euw1-node-local-dns-72376ac0
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    helm.sh/chart: node-local-dns-2.0.3
    app.kubernetes.io/name: node-local-dns
    app.kubernetes.io/instance: sandbox-euw1-node-local-dns-72376ac0
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
    app.kubernetes.io/version: "1.22.23"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "node-local-dns"
    app.kubernetes.io/created-by: "node-local-dns"
    app.kubernetes.io/part-of: "node-local-dns"
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
        health
    }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__UPSTREAM__SERVERS__
        prometheus :9253
    }
---
# Source: node-local-dns/templates/service-upstream.yaml
apiVersion: v1
kind: Service
metadata:
  name: sandbox-euw1-node-local-dns-72376ac0-upstream
  namespace: kube-system
  labels:
    helm.sh/chart: node-local-dns-2.0.3
    app.kubernetes.io/name: node-local-dns
    app.kubernetes.io/instance: sandbox-euw1-node-local-dns-72376ac0
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
    app.kubernetes.io/version: "1.22.23"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "node-local-dns"
    app.kubernetes.io/created-by: "node-local-dns"
    app.kubernetes.io/part-of: "node-local-dns"
spec:
  ports:
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
  selector:
    k8s-app: kube-dns
---
# Source: node-local-dns/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: sandbox-euw1-node-local-dns-72376ac0
  namespace: kube-system
  labels:
    helm.sh/chart: node-local-dns-2.0.3
    app.kubernetes.io/name: node-local-dns
    app.kubernetes.io/instance: sandbox-euw1-node-local-dns-72376ac0
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
    app.kubernetes.io/version: "1.22.23"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "node-local-dns"
    app.kubernetes.io/created-by: "node-local-dns"
    app.kubernetes.io/part-of: "node-local-dns"
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9253
      targetPort: 9253
  selector:
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
---
# Source: node-local-dns/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sandbox-euw1-node-local-dns-72376ac0
  namespace: kube-system
  labels:
    helm.sh/chart: node-local-dns-2.0.3
    app.kubernetes.io/name: node-local-dns
    app.kubernetes.io/instance: sandbox-euw1-node-local-dns-72376ac0
    k8s-app: sandbox-euw1-node-local-dns-72376ac0
    app.kubernetes.io/version: "1.22.23"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: "node-local-dns"
    app.kubernetes.io/created-by: "node-local-dns"
    app.kubernetes.io/part-of: "node-local-dns"
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
      annotations:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: NotIn
                    values:
                      - fargate
      priorityClassName: system-node-critical
      serviceAccountName: sandbox-euw1-node-local-dns-72376ac0
      hostNetwork: true
      dnsPolicy: Default # Don't use cluster DNS.
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
        - effect: "NoExecute"
          operator: "Exists"
        - effect: "NoSchedule"
          operator: "Exists"
      containers:
        - name: node-cache
          image: "registry.k8s.io/dns/k8s-dns-node-cache:1.22.23"
          resources:
            limits:
              memory: 128Mi
            requests:
              cpu: 25m
              memory: 128Mi
          args:
            - "-localip"
            - "169.254.20.25,172.20.0.10"
            - "-conf"
            - "/etc/Corefile"
            - "-upstreamsvc"
            - "sandbox-euw1-node-local-dns-72376ac0-upstream"
            - "-skipteardown=false"
            - "-setupinterface=true"
            - "-setupiptables=true"
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            - containerPort: 9253
              name: metrics
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            timeoutSeconds: 5
          volumeMounts:
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - name: config-volume
              mountPath: /etc/coredns
            - name: kube-dns-config
              mountPath: /etc/kube-dns
      volumes:
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        - name: kube-dns-config
          configMap:
            name: sandbox-euw1-node-local-dns-72376ac0
            optional: true
        - name: config-volume
          configMap:
            name: sandbox-euw1-node-local-dns-72376ac0
            items:
              - key: Corefile
                path: Corefile.base
From looking at this configuration, I'm pretty sure it's the bind 0.0.0.0 lines in the Corefile config. The main problem is that systemd-resolved is already listening on the loopback interface, so 0.0.0.0 will conflict. Is there a way you can scope these binds down to not include 0.0.0.0 and still get your node-local-dns to work (I admit I'm still ramping up on how this is used, so I don't quite have a working setup to confirm this myself)? We might also be able to use an exclude to avoid the conflict, but I haven't found the exact format to make that work.
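For reference, the CoreDNS bind plugin docs describe an except block that might express that exclusion. A rough, untested sketch of the shape (it would also depend on the k8s-dns-node-cache image bundling a CoreDNS recent enough to support it):

.:53 {
    errors
    bind lo eth0 {
        except 127.0.0.53 127.0.0.54   # the loopback addresses systemd-resolved listens on
    }
    forward . __PILLAR__UPSTREAM__SERVERS__
    prometheus :9253
}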
It's supposed to be a DNS cache server; nodes are configured via kubelet to reach it at that address (no port config). Queries first go through the local DNS cache: if it's a hit, the response is returned; if it's a miss, node-local-dns goes to CoreDNS to fetch the values and caches them. Since it's a DNS server, I don't think I can run it on a port other than 53; I think resolv.conf only takes a DNS address.
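For illustration, in a setup where kubelet's clusterDNS points at the node-local address, a pod's /etc/resolv.conf ends up with only a nameserver IP and no way to express a port, roughly like this (the search line depends on the pod's namespace):

nameserver 169.254.20.25
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5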
Thanks for the clarification @ami-descope. I think I have the right understanding for the local DNS caching. I believe the docs call out that the preference for what to bind to is a local scope address:
It's recommended to use an address with a local scope, for example, from the 'link-local' range '169.254.0.0/16' for IPv4 or from the 'Unique Local Address' range in IPv6 'fd00::/8'.
I'm not sure if you can set that via the helm chart you are using, but when I set this to 169.254.0.1 on my aws-k8s-1.28 instance, the node-local DNS did seem to be working (but I didn't confirm every use case, so that is why I asked if there is something I might be missing). Can you see if using a specific local scope address works for your configuration?
But the issue is that the port is taken, no?
The port is only taken on the loopback interface, which conflicts with 0.0.0.0 because that attempts to bind on all interfaces. So if you use ss to look at bound ports (specifically :53) for a default configuration on Bottlerocket running systemd-resolved (currently aws-k8s-1.28+ and aws-ecs-2), you get the following:
ss -tulpn | grep :53
udp UNCONN 0 0 127.0.0.54:53 0.0.0.0:* users:(("systemd-resolve",pid=1516,fd=15))
udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=1516,fd=13))
tcp LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=1516,fd=14))
tcp LISTEN 0 4096 127.0.0.54:53 0.0.0.0:* users:(("systemd-resolve",pid=1516,fd=16))
If you avoid attempting to bind on those addresses (the 127.0.0.0/8 loopback range), then it should work.
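As a quick check once node-local-dns binds only to the link-local address, something like this from the node (or a host-network debug pod) should show both listeners coexisting; the address here assumes the 169.254.20.25 from the posted values:

dig @169.254.20.25 kubernetes.default.svc.cluster.local +short
ss -tulpn | grep ':53 '   # node-cache on 169.254.20.25, systemd-resolve on 127.0.0.53/127.0.0.54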
Just to confirm, I have been running with an explicit locally scoped address across 5 clusters for the past year or so and have had no issues. As per the recommendation, I chose 169.254.20.10 in the link-local range. We're not running 1.28 as yet, but I was lurking on this issue as that upgrade is just around the corner.
The config was generated using the sed method from the docs...
kubedns=`kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}`
domain=cluster.local
localdns=169.254.20.10
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
Our full config for clarity...
cluster.local:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.16.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
    prometheus :9253
    health 169.254.20.10:8080
}
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.16.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
    prometheus :9253
}
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.16.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
    prometheus :9253
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.16.0.10
    forward . __PILLAR__UPSTREAM__SERVERS__
    prometheus :9253
}
Thanks for the extra info @theplatformer, I think this confirms that the local address works. I don't anticipate issues with upgrading to k8s 1.28 and beyond while binding on addresses such as 169.254.20.10, since that won't conflict with the systemd-resolved binding.
But the issue is that the port is taken, no? My local DNS address is 169.254.20.25. Is that conflicting with resolved?
The problem is that the actual config ends up trying to bind on 0.0.0.0 rather than just 169.254.20.25, so the helm chart you are using appears not to take that input and adjust the bind configuration accordingly. From your posted configuration:
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 0.0.0.0    <---- This is the problem config
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
        health
    }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0    <---- This is the problem config
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0    <---- This is the problem config
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0    <---- This is the problem config
        forward . __PILLAR__UPSTREAM__SERVERS__
        prometheus :9253
    }
---
You might try editing this in place to replace 0.0.0.0 with 169.254.20.25 via kubectl edit, and if that works, then it might be worth following up with the helm chart owners to see if they can support this change.
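Something along these lines, using the ConfigMap name from your rendered chart (node-cache should pick up the change via the reload plugin, or you can restart the DaemonSet pods):

kubectl -n kube-system edit configmap sandbox-euw1-node-local-dns-72376ac0
# change each "bind 0.0.0.0" line to "bind 169.254.20.25", then:
kubectl -n kube-system rollout restart daemonset sandbox-euw1-node-local-dns-72376ac0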
Hi @yeazelm, thanks in advance. Does systemd-resolved serve the same purpose as node-local-dns in caching pod DNS queries? Any docs about how systemd-resolved works?
systemd-resolved is for host resolution, whereas node-local-dns sets up a DNS cache server as a DaemonSet intended to handle DNS resolution for the orchestrated containers. The configs mentioned in this thread run a DNS cache server in a pod for this purpose; the problem is that when one tries to bind to port 53 on all addresses, this conflicts with the existing host DNS resolver. Ideally, you shouldn't need to think about systemd-resolved at all on Bottlerocket. Nonetheless, https://www.freedesktop.org/software/systemd/man/latest/systemd-resolved.service.html is a good place to start to understand how systemd-resolved works.
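If you do want to poke at the host resolver, one way (assuming resolvectl is available on the host, reached from the admin container) is roughly:

sudo sheltie                   # from the admin container, enter the host's root filesystem
resolvectl status              # show the DNS servers systemd-resolved is using per link
resolvectl query example.com   # resolve a name through the host resolver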
@ami-descope What Helm chart are you using? There is no official one on the project GitHub, just the YAML. If you are using a Helm chart from any source, you might create a PR to incorporate a change making the bind address configurable, with a good default of the link-local address 169.254.20.10 for IPv4 plus the CoreDNS "kube-dns" service IP.
It seems to work for me after using the most popular (but unofficial) Helm chart for node-local-dns and setting the value bindIp: true. This value was introduced because of this issue. See this PR: https://github.com/deliveryhero/helm-charts/pull/563
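For anyone landing here later, a values sketch of that fix; I'm assuming bindIp sits under config alongside localDns, so double-check against the chart version you're using:

config:
  localDns: 169.254.20.25
  bindIp: true   # render "bind <localDns>" in the Corefile instead of "bind 0.0.0.0"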