DNS not configured correctly on a Raspberry Pi cluster
I'm having some trouble setting up Kubernetes with coredns and Flannel, on a cluster of 4x Raspberry Pis. After installing kubeadm on my master node and pulling images, I initialized it with this command:
sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.194
And then I installed Flannel v0.13.0 using this command:
kubectl apply -f https://rawgit.com/coreos/flannel/v0.13.0/Documentation/kube-flannel.yml
So far, so good. It spawns the Flannel daemonset on all nodes (although incidentally, I sometimes have to run sudo ip link delete flannel.1 on each node to get it working), and I can launch containers. Unfortunately, DNS does not work in my containers. I checked the /etc/resolv.conf file in each one and they all point to 10.96.0.10, but this doesn't seem to work. If I kubectl exec into a running pod and from there run dig google.com, it just times out. (Whereas if I run dig @8.8.8.8 google.com it immediately returns a result, so at least I have Internet connectivity! And that narrows it down to a cluster DNS problem.)
I was reading that you have to pass --pod-network-cidr=10.244.0.0/16 during setup in order for Flannel to work. And as far as I can tell it is working. I'm just wondering if there is an additional parameter I'm missing that will get DNS to finally start working as well?
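A first check for a symptom like this is whether the CoreDNS pods are running and whether the 10.96.0.10 address from resolv.conf really belongs to the cluster DNS service. The sketch below assumes a kubeadm-installed cluster, where the CoreDNS pods carry the label k8s-app=kube-dns and the service is named kube-dns in the kube-system namespace. The function name check_cluster_dns is made up for illustration.

```shell
# Diagnostic sketch; nothing runs until the function is called.
# Assumption: kubeadm defaults (namespace kube-system, label k8s-app=kube-dns).
check_cluster_dns() {
  # Are the CoreDNS pods Running, and on which nodes?
  kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
  # Does the service ClusterIP match the nameserver in the pod's resolv.conf?
  kubectl -n kube-system get svc kube-dns
  # Recent CoreDNS log lines (errors reaching upstream resolvers show up here).
  kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20
}
```

If the pods are Running and the service IP matches but queries still time out, the problem is more likely in the network path between nodes than in CoreDNS itself.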
I have the same issue.
My environment is:
- Raspberry Pi 4 Model B, 2 nodes
- Ubuntu for Raspberry Pi
  $ cat /etc/os-release
  NAME="Ubuntu"
  VERSION="20.04.2 LTS (Focal Fossa)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 20.04.2 LTS"
  --- truncated ---
- Kubernetes version is v1.21.1
- Using cri-o as the container runtime
  $ sudo crictl version
  Version: 0.1.0
  RuntimeName: cri-o
  RuntimeVersion: 1.21.0
  RuntimeApiVersion: v1alpha2
- flannel from master branch
  - image: quay.io/coreos/flannel:v0.14.0-rc1
- kubeadm init command is
  - sudo kubeadm init --pod-network-cidr=10.244.0.0/16
Current behavior

dig google.com fails:
$ dig google.com
; <<>> DiG 9.11.6-P1 <<>> google.com
;; global options: +cmd
;; connection timed out; no servers could be reached

dig @8.8.8.8 google.com succeeds:
$ dig @8.8.8.8 google.com
google.com. 47 IN A 172.217.26.46

Expected behavior

dig google.com works fine.
Because of this, I cannot run apt-get install foo or other commands that need name resolution in pods.
Is there any information on how to make it work?
In my case, because it was a development environment, I ended up turning off all of the firewall rules entirely, using sudo ufw disable. This means that every port is open on all of my nodes, so obviously this is not an approach that would work for a production environment. But I never did figure out which internal port Kubernetes/Flannel uses to handle DNS resolution. I can tell you that it's not using 53, as adding firewall rules for that port specifically had no effect.
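One possible explanation for why a rule for port 53 had no effect: with flannel's vxlan backend, pod traffic between nodes (including DNS queries to CoreDNS pods on another node) is encapsulated, and the packet that crosses between nodes is UDP on port 8472, flannel's default VXLAN port on Linux. The sketch below opens only that port. It is an untested guess for this particular cluster, not a confirmed fix, and the function name is made up for illustration.

```shell
# Sketch: allow flannel's VXLAN traffic through ufw (run on every node).
# Assumption: flannel uses the vxlan backend with its default port, 8472/udp.
allow_flannel_vxlan() {
  sudo ufw allow 8472/udp
}
```

A cluster typically needs other ports open between nodes as well (for example the API server and kubelet ports), so this alone may not be enough to re-enable the firewall.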
@soapergem The CoreDNS pod should probably log something about why it fails when the firewall is on, and an iptables trace can be your friend for tracking down firewall problems: https://youtu.be/9HNKRP7x57M
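As a rough sketch of the iptables trace idea, the functions below add and remove a TRACE rule for UDP packets with destination port 53 in the raw table. The function names are made up for illustration, and where the trace output appears depends on the iptables backend in use on the node.

```shell
# Sketch: trace DNS packets through netfilter on a node.
# Nothing runs until a function is called.
trace_dns_start() {
  # Mark UDP packets to port 53 for tracing as they enter the node.
  sudo iptables -t raw -A PREROUTING -p udp --dport 53 -j TRACE
}

trace_dns_stop() {
  # Remove the same rule again; tracing is noisy, so do not leave it on.
  sudo iptables -t raw -D PREROUTING -p udp --dport 53 -j TRACE
}
```

With the nftables-based iptables, the trace can be read with sudo xtables-monitor --trace; with the legacy backend it goes to the kernel log. Run dig from a pod while tracing and look for the rule at which the packet is dropped.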
Finally, I resolved this issue with three steps.

- Add --resolv-conf=/run/systemd/resolve/resolv.conf to KUBELET_EXTRA_ARGS
  - Ref: kubeadm issue
  - Kubernetes official doc
  - This is needed because my Ubuntu uses systemd-resolved.
- Use host-gw mode instead of vxlan for flannel
  - Ref: flannel issue
  - I don't know why this is needed.

After these two steps, I still could not access CoreDNS. The final step fixed everything.

- Add a route on the worker nodes for accessing IPs on the control plane node.
  - ex. sudo ip route add 10.96.0.0/16 via $CONTROL_PLANE_NODE_IP dev eth0
  - 10.96.0.0/16 is the serviceSubnet on my cluster.
  - Before this, ip route on my worker node showed no route to 10.96.0.0/16. Now the route is there and the subnet is reachable.
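The three steps above can be sketched as shell functions. This is an illustration under assumptions, not the exact procedure used in this thread: it assumes a Debian/Ubuntu node where the kubeadm-packaged kubelet reads /etc/default/kubelet, a kube-flannel.yml manifest that contains the line "Type": "vxlan", and eth0 as the node's interface. All function names are made up.

```shell
# Sketch of the three steps; nothing runs until a function is called.

# Step 1: make kubelet give pods the resolv.conf maintained by systemd-resolved.
# Note: this overwrites /etc/default/kubelet; merge by hand if it has content.
set_kubelet_resolv_conf() {
  echo 'KUBELET_EXTRA_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf' |
    sudo tee /etc/default/kubelet >/dev/null
  sudo systemctl restart kubelet
}

# Step 2: rewrite a kube-flannel.yml manifest (stdin to stdout) so that the
# backend in net-conf.json is host-gw instead of vxlan.
use_host_gw() {
  sed 's/"Type": "vxlan"/"Type": "host-gw"/'
}

# Step 3: add a route to the service subnet via the control plane node,
# unless `ip route` already lists one.
ensure_service_route() {
  subnet="$1"; gateway="$2"; dev="${3:-eth0}"
  if ip route show | grep -qF "${subnet} "; then
    echo "route to ${subnet} already present"
  else
    sudo ip route add "${subnet}" via "${gateway}" dev "${dev}"
  fi
}
```

For example, ensure_service_route 10.96.0.0/16 "$CONTROL_PLANE_NODE_IP" on each worker node, and use_host_gw < kube-flannel.yml | kubectl apply -f - for the manifest. The flannel pods will likely need a restart to pick up the new backend, and a route added with ip route add does not survive a reboot.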
Thanks for all of your support.
FYI: I'm using flannel v0.13.0 instead of v0.14.0-rc1 now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.