
DNS not configured correctly on a Raspberry Pi cluster

Open soapergem opened this issue 3 years ago • 4 comments

I'm having some trouble setting up Kubernetes with coredns and Flannel, on a cluster of 4x Raspberry Pis. After installing kubeadm on my master node and pulling images, I initialized it with this command:

sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.194

And then I installed Flannel v0.13.0 using this command:

kubectl apply -f https://rawgit.com/coreos/flannel/v0.13.0/Documentation/kube-flannel.yml

So far, so good. It spawns the Flannel daemonset on all nodes (although, incidentally, I sometimes have to run sudo ip link delete flannel.1 on each node to get it working), and I can launch containers. Unfortunately, however, DNS does not work in my containers. I checked the /etc/resolv.conf files and they all point to 10.96.0.10, but that address doesn't respond. If I kubectl exec into a running pod and from there run dig google.com, it just times out. (Whereas if I run dig @8.8.8.8 google.com it immediately returns a result, so at least I have Internet connectivity! That narrows it down to a cluster DNS problem.)
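A few diagnostic commands can help narrow this kind of failure down (the pod name dnstest is hypothetical; busybox:1.28 is suggested because later busybox builds have a flaky nslookup, and k8s-app=kube-dns is the label CoreDNS carries in kubeadm clusters):

```shell
# Test cluster DNS from a throwaway pod (10.96.0.10 is the kube-dns service IP)
kubectl run dnstest --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default

# Query the CoreDNS service IP explicitly, to separate resolv.conf problems
# from reachability problems
kubectl run dnstest --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup google.com 10.96.0.10

# Check that the CoreDNS pods themselves are healthy, and read their logs
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns
```

If lookups fail from pods on worker nodes but succeed from pods scheduled on the master, the problem is usually the overlay network between nodes rather than CoreDNS itself.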

I was reading that you have to pass --pod-network-cidr=10.244.0.0/16 during set up in order for Flannel to work. And as far as I can tell it is working. I'm just wondering if there is an additional parameter I'm missing that will get DNS to finally start working as well?

soapergem avatar Nov 19 '20 03:11 soapergem

I have the same issue.

My environment is:

  • Raspberry Pi 4 Model B * 2 nodes
  • Ubuntu for Raspberry Pi

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
--- truncated ---

  • kubernetes version is v1.21.1
  • using cri-o as container runtime

$ sudo crictl version
Version: 0.1.0
RuntimeName: cri-o
RuntimeVersion: 1.21.0
RuntimeApiVersion: v1alpha2

  • flannel from master branch
    • image: quay.io/coreos/flannel:v0.14.0-rc1
  • kubeadm init command is
    • sudo kubeadm init --pod-network-cidr=10.244.0.0/16

current behavior

dig google.com fails

$ dig google.com

; <<>> DiG 9.11.6-P1 <<>> google.com
;; global options: +cmd
;; connection timed out; no servers could be reached

dig @8.8.8.8 google.com succeeded

$ dig @8.8.8.8 google.com
google.com.    47    IN    A    172.217.26.46

expected behavior

dig google.com works fine.

Because of this, I cannot run apt-get install foo or other commands inside pods. Is there any information on how to make this work?

pinfort avatar May 16 '21 14:05 pinfort

In my case because it was a development environment, I ended up turning off all of the firewall rules entirely, using sudo ufw disable. This meant that every port is open on all of my nodes. So obviously this is not an approach that would work for a production environment. But I never did figure out which internal port Kubernetes/Flannel is using to handle DNS resolution. I can tell you that it's not using 53, as adding firewall rules for that port specifically had no effect.
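For anyone who prefers not to disable ufw entirely: with Flannel's default vxlan backend, pod-to-CoreDNS queries are encapsulated in VXLAN packets on UDP 8472 before they cross between nodes, which would explain why opening port 53 alone had no effect. A minimal rule set might look like the following sketch (port numbers are taken from the Kubernetes and Flannel documentation; it assumes the default vxlan backend):

```shell
# Control-plane and kubelet traffic
sudo ufw allow 6443/tcp    # kube-apiserver
sudo ufw allow 10250/tcp   # kubelet API

# Flannel vxlan overlay: pod traffic, including DNS queries to 10.96.0.10,
# travels inside this UDP tunnel between nodes
sudo ufw allow 8472/udp

# ufw also drops forwarded packets by default, which breaks routed pod
# traffic even when the inbound ports are open
sudo ufw default allow routed
```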

soapergem avatar May 16 '21 15:05 soapergem

@soapergem the coredns pod should probably log something about why it failed while the firewall is on, and an iptables trace can be your friend for tracking down firewall problems: https://youtu.be/9HNKRP7x57M

vincentmli avatar May 16 '21 19:05 vincentmli

Finally, I resolved this issue with three steps.

  • Add --resolv-conf=/run/systemd/resolve/resolv.conf to KUBELET_EXTRA_ARGS

  • Use host-gw mode instead of vxlan for flannel

    After these two steps I still could not access CoreDNS. The next, final step fixed everything.

  • Add route to worker nodes for accessing IPs on control plane node.

    • ex. sudo ip route add 10.96.0.0/16 via $CONTROL_PLANE_NODE_IP dev eth0
    • 10.96.0.0/16 is serviceSubnet on my cluster.
    • Before adding it, ip route on my worker node showed no route to 10.96.0.0/16; now the route is there and the subnet is reachable.
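The three steps above might be condensed into something like this sketch (the kube-flannel-cfg ConfigMap name and app=flannel label match the v0.13.0 manifest; the /etc/default/kubelet path is specific to Debian/Ubuntu kubeadm installs):

```shell
# 1. Point kubelet at the real resolv.conf instead of systemd-resolved's
#    127.0.0.53 stub, which pods cannot use
echo 'KUBELET_EXTRA_ARGS="--resolv-conf=/run/systemd/resolve/resolv.conf"' \
  | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet

# 2. Switch the Flannel backend from vxlan to host-gw, then recreate the pods
kubectl -n kube-system edit cm kube-flannel-cfg   # set "Backend": {"Type": "host-gw"}
kubectl -n kube-system delete pod -l app=flannel

# 3. On each worker, route the service subnet via the control-plane node
#    (10.96.0.0/16 is this cluster's serviceSubnet)
sudo ip route add 10.96.0.0/16 via $CONTROL_PLANE_NODE_IP dev eth0
```

Note that host-gw only works when all nodes share the same layer-2 network, which typically holds for a Raspberry Pi cluster plugged into one switch.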

Thanks for all of your support.

FYI: I'm using flannel v0.13.0 instead of v0.14.0-rc1 now.

pinfort avatar May 20 '21 17:05 pinfort

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 25 '23 22:01 stale[bot]