
kubectl install: Failed to create SubnetManager: ... getsockopt: network is unreachable

Open randyrue opened this issue 6 years ago • 5 comments

I'm following the steps detailed at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network, including:

  • adding --pod-network-cidr=10.244.0.0/16 to the kubeadm config
  • setting /proc/sys/net/bridge/bridge-nf-call-iptables to 1 on all nodes
  • running kubectl apply with your latest kube-flannel.yml (roughly the commands sketched below)
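For reference, approximately the command sequence (a sketch; the manifest URL is the usual coreos/flannel one from that era and may not match your setup):

    # Initialize the control plane with the pod CIDR flannel's manifest expects
    kubeadm init --pod-network-cidr=10.244.0.0/16

    # Let bridged traffic traverse iptables (run on every node)
    sysctl net.bridge.bridge-nf-call-iptables=1

    # Install flannel from its manifest
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml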

"kubectl get pods -n kube-system" then shows kube-flannel-ds-* pods in a CrashLoopBackOff restart loop and kube-dns is still waiting to start.

"kubectl logs -n kube-system kube-flannel-ds-4kkbh -c kube-flannel" shows main.go finding the node's external interface and IP and then: E0619 21:21:11.983684 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-4kkbh': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-4kkbh: dial tcp 10.96.0.1:443: getsockopt: network is unreachable

Found a troubleshooting guide that includes checking for the flannel binaries at /opt/cni/bin and the config files at /etc/cni/net.d; they're all there.

It also suggests checking ifconfig for the "real" interface, the docker interface and one for flannel.

lo, eth0 and docker0 are there but nothing flannel-related.
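Concretely, the checks look something like this (a sketch; flannel's vxlan backend would normally create an interface named flannel.1):

    # The CNI binaries and flannel config are all present
    ls /opt/cni/bin
    ls /etc/cni/net.d

    # ...but no flannel interface ever shows up (vxlan would create flannel.1)
    ip -o link show | grep -i flannel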

What am I missing to install and run flannel on an HA kubernetes cluster?

Your Environment

  • Flannel version: flannel:v0.10.0-amd64
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.1.12 (but I installed it for k8s; is flannel using it by default, or is it using the K8s API?)
  • Kubernetes version (if used): 1.10.4
  • Operating System and version: Ubuntu 18.04 LTS
  • Link to your project (optional):

randyrue avatar Jun 19 '18 21:06 randyrue

I had the same symptoms, and they were caused by my machine not having a default route set. You can check whether you have a default route by running route -n and looking for a destination of default or 0.0.0.0. If it's missing, run: route add default gw <gateway ip>
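In iproute2 terms (a sketch, with 192.168.1.1 standing in for your gateway; a route added this way does not persist across reboots):

    # Show the routing table; a default route appears as "default via ..."
    ip route show

    # Add one if it's missing
    ip route add default via 192.168.1.1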

I don't understand why I had to do this because I thought flannel was supposed to configure the routing table. Can anyone explain?

sdedwards avatar Jul 18 '18 12:07 sdedwards

It turned out I was trying to use kubectl apply to change the flannel install, but it wasn't overwriting an earlier, incorrect subnet entry. I ripped flannel out, reinstalled with the correct entry (roughly as below), and made progress.
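In case it helps anyone, that amounted to something like this (a sketch; note that flanneld reads net-conf.json from its ConfigMap at startup, so editing the ConfigMap alone doesn't take effect until the pods are recreated):

    # Tear out the old install, then re-apply the corrected manifest
    kubectl delete -f kube-flannel.yml
    kubectl apply -f kube-flannel.yml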

Still not there yet, however. Everything appears healthy, and traffic routes between pods on the pod subnet and to the outside world. But neither the pods nor the nodes have any routing entries for the service subnet, the nodes have no virtual IPs on the service subnet, and pods can't reach things like kube-dns.

randyrue avatar Jul 18 '18 14:07 randyrue

In your yaml file, try setting --iface=eth0 in the flanneld args:

    command:
    - /opt/bin/flanneld
    args:
    - --ip-masq
    - --kube-subnet-mgr
    - --iface=eth0

pytomtoto avatar Jul 19 '18 01:07 pytomtoto

I've made some more progress, but it's clear there are things about kubernetes networking I don't understand, and things still aren't working 100%.

From a busybox pod I was unable to use nslookup to resolve any hostnames, whether from my on-premise DNS or the outside world; it would only return a cluster.local name for the kube-dns pod and a 10.96 IP. But I eventually realized that if it was reporting a PTR lookup for the nameserver, it had to be reaching a nameserver somehow. The problem was that the nameserver wasn't forwarding lookups for anything outside cluster.local.

I added a configmap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kube-dns
      namespace: kube-system
      labels:
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      upstreamNameservers: |-
        ["x.x.x.x", "y.y.y.y", "z.z.z.z"]

with the IPs of my on-premise DNS servers. Now nslookup on my pod resolves hostnames in my own zones and outside names like google.com.

But every lookup stalls for almost exactly 20 seconds, 20s plus a few milliseconds. For google.com I get both an IPv4 and an IPv6 address back, and it takes 40s.

This smells like kube-dns is timing out and then forwarding the lookup. Where is that 20s timeout defined?
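A few places to look (a sketch; the busybox pod name and the common cluster-DNS ClusterIP 10.96.0.10 are assumptions here, not values from this cluster):

    # Check the resolver config kubelet handed the pod; multiple nameservers plus
    # the default 5s resolver timeout and retries can stack into long pauses
    kubectl exec -it busybox -- cat /etc/resolv.conf

    # Time a lookup against the cluster DNS service directly
    time kubectl exec -it busybox -- nslookup google.com 10.96.0.10

    # Look for upstream timeouts in kube-dns's dnsmasq container
    kubectl logs -n kube-system -l k8s-app=kube-dns -c dnsmasq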

And a separate question, how is traffic reaching the 10.96 service subnet and kube-dns with no interfaces or routing entries anywhere?
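For anyone digging into that last question: as far as I understand it, service IPs like the 10.96 ones aren't interfaces or routes at all; kube-proxy (in iptables mode) implements them as DNAT rules in netfilter, which is why nothing appears in the routing table. On a node you can inspect them with something like:

    # Service VIPs live in kube-proxy's NAT rules, not the routing table
    sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96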

randyrue avatar Jul 19 '18 14:07 randyrue

In my case, I needed to set isDefaultGateway to true in the cni-conf.json file to let the pods reach the internet.
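For reference, that setting lives in flannel's ConfigMap in kube-flannel.yml; a sketch (ConfigMap name and surrounding fields as in the stock manifest, so double-check against your version):

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kube-flannel-cfg
      namespace: kube-system
    data:
      cni-conf.json: |
        {
          "name": "cbr0",
          "plugins": [
            {
              "type": "flannel",
              "delegate": {
                "hairpinMode": true,
                "isDefaultGateway": true
              }
            },
            {
              "type": "portmap",
              "capabilities": {
                "portMappings": true
              }
            }
          ]
        }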

nooop3 avatar Apr 27 '22 08:04 nooop3

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 25 '23 20:01 stale[bot]