kubectl install: Failed to create SubnetManager: ... getsockopt: network is unreachable
I'm following the steps detailed at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network including:
- adding --pod-network-cidr=10.244.0.0/16 to the kubeadm config
- setting /proc/sys/net/bridge/bridge-nf-call-iptables to 1 on all nodes
- running kubectl apply with your latest kube-flannel.yml (commands below)
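Concretely, the commands were along these lines (the manifest URL is the one the docs pointed to at the time; adjust for your setup):
kubeadm init --pod-network-cidr=10.244.0.0/16        # on the first master
sysctl net.bridge.bridge-nf-call-iptables=1          # on every node
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml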
"kubectl get pods -n kube-system" then shows kube-flannel-ds-* pods in a CrashLoopBackOff restart loop and kube-dns is still waiting to start.
"kubectl logs -n kube-system kube-flannel-ds-4kkbh -c kube-flannel" shows main.go finding the node's external interface and IP and then: E0619 21:21:11.983684 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-4kkbh': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-4kkbh: dial tcp 10.96.0.1:443: getsockopt: network is unreachable
I found a troubleshooting guide that includes checking for the flannel binaries at /opt/cni/bin and config files at /etc/cni/net.d; they're all there.
It also suggests checking ifconfig for the "real" interface, the docker interface, and one for flannel.
lo, eth0 and docker0 are there, but nothing flannel-related.
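For what it's worth, with the vxlan backend I'd expect a flannel.1 device, and a quick way to check for one is:
ip link show type vxlan
which comes back empty on my nodes, consistent with the ifconfig output above.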
What am I missing to install and run flannel on an HA Kubernetes cluster?
Your Environment
- Flannel version: flannel:v0.10.0-amd64
- Backend used (e.g. vxlan or udp): vxlan
- Etcd version: 3.1.12 (but I installed it for k8s; is flannel using it by default, or is flannel using the K8S API?)
- Kubernetes version (if used): 1.10.4
- Operating System and version: Ubuntu 18.04 LTS
- Link to your project (optional):
I had the same symptoms, and it was caused by my machine not having a default route set. You can see whether you have a default route by running:
route
and checking that a default (or 0.0.0.0) entry is present. If not, run:
route add default gw <gateway ip> netmask 0.0.0.0
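If your system doesn't have net-tools, the iproute2 equivalents should be:
ip route show default
ip route add default via <gateway ip>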
I don't understand why I had to do this because I thought flannel was supposed to configure the routing table. Can anyone explain?
It turned out I was trying to use kubectl apply to make changes to the flannel install, but it was failing to overwrite an earlier, wrong subnet entry. I ripped out flannel, reinstalled with the correct entry, and made progress.
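For anyone else doing the same rip-and-reinstall, it was roughly (assuming the stock manifest file):
kubectl delete -f kube-flannel.yml
kubectl apply -f kube-flannel.yml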
Still not there yet, however. All appears to be well, and traffic is routing among pods on the pod subnet and to the real world. But the pods and nodes don't have any routing entries for the service subnet, the nodes don't have any virtual IPs on the service subnet, and pods can't reach things like kube-dns.
In your yaml file, try setting --iface=eth0 in the flanneld args:
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth0
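To get that into a running cluster, you can edit the DaemonSet in place (name taken from the pod prefix above; assuming the stock manifest):
kubectl -n kube-system edit daemonset kube-flannel-ds
Depending on the DaemonSet's update strategy, you may also need to delete the existing kube-flannel pods so they restart with the new args.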
I've made some more progress, but it's clear there are things about kubernetes networking I don't understand, and it's still not working 100%.
From a busybox pod I was unable to use nslookup to resolve any hostnames from my on-premise DNS or the outside world. Instead it would return a cluster.local name for the kube-dns pod and a 10.96 IP. But I eventually realized if it was reporting a PTR lookup for the nameserver it had to be reaching a nameserver somehow. The problem was that nameserver wasn't forwarding lookups for anything outside cluster.local.
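For reference, the sort of test I was running, from a throwaway busybox pod (the pod name and image are just examples):
kubectl run busybox -it --rm --restart=Never --image=busybox -- nslookup google.com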
I added a configmap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  upstreamNameservers: |-
    ["x.x.x.x", "y.y.y.y", "z.z.z.z"]
with the IPs of my on-premise DNS servers. Now nslookup on my pod resolves hostnames in my own zones and outside names like google.com.
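You can double-check that the configmap landed with:
kubectl -n kube-system get configmap kube-dns -o yaml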
But for every lookup there's almost exactly a 20s pause: 20 seconds plus a few milliseconds. For google.com I get both an IPv4 and an IPv6 address returned, and it takes 40s.
This smells like kube-dns is timing out and then forwarding the lookup. Where is that 20s timeout defined?
And a separate question, how is traffic reaching the 10.96 service subnet and kube-dns with no interfaces or routing entries anywhere?
In my case, I needed to set isDefaultGateway to true in the cni-conf.json file to allow the pods to reach the internet.
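For context, that setting lives in the cni-conf.json key of the kube-flannel ConfigMap; a minimal sketch of the v0.10-era layout (your file may have more fields):
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}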