Missing routes with many nodes on vxlan

Open svenwltr opened this issue 6 years ago • 9 comments

Expected Behavior

When adding a new instance, flannel should always set up all routes between the new and the existing instances.

Current Behavior

When there are many nodes (> 30), there are occasionally missing routes between some instances.

For each missing route we get this error:

vxlan_network.go:145] AddFDB failed: no buffer space available

A missing route means that an entry like this is absent from ip route:

10.10.27.0/24 via 10.10.27.0 dev flannel.1 onlink 
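
Besides the route, flannel also programs an ARP entry and a VXLAN forwarding database (FDB) entry for every remote node, and since it is the AddFDB call that fails, the FDB entry for the affected node is missing as well. Both can be inspected with iproute2:

$ bridge fdb show dev flannel.1
$ ip neigh show dev flannel.1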

Possible Solution

Besides fixing the underlying problem in the OS or network settings, it might be a good idea to retry failed operations like this, or even to let flannel fail completely (see Context).
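
A crude way to get the "fail completely" behaviour ourselves would be a systemd drop-in that fails the unit when a route check does not pass shortly after startup, so that the existing Restart=always in our unit effectively retries the whole setup. This is only a sketch: /opt/bin/check-flannel-routes is a hypothetical script (outlined under Context below) and the 30 second settle time is arbitrary.

# /etc/systemd/system/flanneld.service.d/50-verify-routes.conf
[Service]
# If the check exits non-zero, the unit is considered failed and
# Restart=always starts it again from scratch.
ExecStartPost=/bin/sh -c 'sleep 30; /opt/bin/check-flannel-routes'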

Steps to Reproduce

  1. Set up an autoscaling group with instances that use flannel.
  2. Scale up to 50 nodes in one step, without ramping up gradually. I am not sure if the parallel booting is a problem here.
  3. Run journalctl -u flanneld | grep AddFDB on each instance and see some errors. There are around 4 missing routes at that scale.

systemd unit

$ systemctl cat flanneld
# /usr/lib/systemd/system/flanneld.service
[Unit]
Description=flannel - Network fabric for containers (System Application Container)
Documentation=https://github.com/coreos/flannel
After=etcd.service etcd2.service etcd-member.service
Requires=flannel-docker-opts.service

[Service]
Type=notify
Restart=always
RestartSec=10s
TimeoutStartSec=300
LimitNOFILE=40000
LimitNPROC=1048576

Environment="FLANNEL_IMAGE_TAG=v0.9.0"
Environment="FLANNEL_OPTS=--ip-masq=true"
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/lib/coreos/flannel-wrapper.uuid"
EnvironmentFile=-/run/flannel/options.env

ExecStartPre=/sbin/modprobe ip_tables
ExecStartPre=/usr/bin/mkdir --parents /var/lib/coreos /run/flannel
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid
ExecStart=/usr/lib/coreos/flannel-wrapper $FLANNEL_OPTS
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/lib/coreos/flannel-wrapper.uuid

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/flanneld.service.d/40-pod-network.conf
[Service]
Environment="FLANNELD_ETCD_ENDPOINTS=http://0.etcd.k8s.rebuy.loc:2379,http://1.etcd.k8s.rebuy.loc:2379,http://2.etcd.k8s.rebuy.loc:2379"
ExecStartPre=/usr/bin/etcdctl --endpoints=http://0.etcd.k8s.rebuy.loc:2379,http://1.etcd.k8s.rebuy.loc:2379,http://2.etcd.k8s.rebuy.loc:2379 set /coreos.com/network/config \
    '{"Network":"10.10.0.0/16", "Backend": {"Type": "vxlan"}}'

logs

$ journalctl -u flanneld | cat
-- Logs begin at Tue 2018-03-06 09:18:56 CET, end at Tue 2018-03-06 10:46:55 CET. --
Mar 06 09:19:26 localhost systemd[1]: Starting flannel - Network fabric for containers (System Application Container)...
Mar 06 09:19:28 ip-172-20-202-32.eu-west-1.compute.internal rkt[859]: rm: unable to resolve UUID from file: open /var/lib/coreos/flannel-wrapper.uuid: no such file or directory
Mar 06 09:19:28 ip-172-20-202-32.eu-west-1.compute.internal rkt[859]: rm: failed to remove one or more pods
Mar 06 09:19:29 ip-172-20-202-32.eu-west-1.compute.internal etcdctl[932]: {"Network":"10.10.0.0/16", "Backend": {"Type": "vxlan"}}
Mar 06 09:19:29 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/flannel-wrapper.uuid --trust-keys-from-https --mount volume=coreos-notify,target=/run/systemd/notify --volume coreos-notify,kind=host,source=/run/systemd/notify --set-env=NOTIFY_SOCKET=/run/systemd/notify --net=host --volume coreos-run-flannel,kind=host,source=/run/flannel,readOnly=false --volume coreos-etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true --volume coreos-usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true --volume coreos-etc-hosts,kind=host,source=/etc/hosts,readOnly=true --volume coreos-etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true --mount volume=coreos-run-flannel,target=/run/flannel --mount volume=coreos-etc-ssl-certs,target=/etc/ssl/certs --mount volume=coreos-usr-share-certs,target=/usr/share/ca-certificates --mount volume=coreos-etc-hosts,target=/etc/hosts --mount volume=coreos-etc-resolv,target=/etc/resolv.conf --inherit-env --stage1-from-dir=stage1-fly.aci quay.io/coreos/flannel:v0.9.0 -- --ip-masq=true
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: pubkey: prefix: "quay.io/coreos/flannel"
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: key: "https://quay.io/aci-signing-key"
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]:         Quay.io ACI Converter (ACI conversion signing key) <[email protected]>
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/flannel" without fingerprint review.
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Added key for prefix "quay.io/coreos/flannel" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/flannel/bff313cdaa560b16a8987b8f72abf5f6799d33bc"
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading signature:  0 B/473 B
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading signature:  473 B/473 B
Mar 06 09:19:33 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading signature:  473 B/473 B
Mar 06 09:19:34 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading ACI:  0 B/18.4 MB
Mar 06 09:19:34 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading ACI:  8.19 KB/18.4 MB
Mar 06 09:19:34 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: Downloading ACI:  18.4 MB/18.4 MB
Mar 06 09:19:35 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: image: signature verified:
Mar 06 09:19:35 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]:   Quay.io ACI Converter (ACI conversion signing key) <[email protected]>
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.559691     950 main.go:470] Determining IP address of default interface
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.559900     950 main.go:483] Using interface with name eth0 and address 172.20.202.32
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.559912     950 main.go:500] Defaulting external address to interface address (172.20.202.32)
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.559977     950 main.go:235] Created subnet manager: Etcd Local Manager with Previous Subnet: 0.0.0.0/0
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.559984     950 main.go:238] Installing signal handlers
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.567211     950 main.go:348] Found network config - Backend type: vxlan
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.567251     950 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.919205     950 local_manager.go:234] Picking subnet in range 10.10.1.0 ... 10.10.255.0
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.923001     950 local_manager.go:220] Allocated lease (10.10.122.0/24) to current node (172.20.202.32)
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.939560     950 main.go:295] Wrote subnet file to /run/flannel/subnet.env
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.939578     950 main.go:299] Running backend.
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.939878     950 vxlan_network.go:56] watching for new subnet leases
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal systemd[1]: Started flannel - Network fabric for containers (System Application Container).
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:37.943885     950 main.go:391] Waiting for 23h0m0.085901025s to renew lease
Mar 06 09:19:37 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: E0306 08:19:37.952090     950 vxlan_network.go:145] AddFDB failed: no buffer space available
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.665323     950 ipmasq.go:75] Some iptables rules are missing; deleting and recreating rules
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.665356     950 ipmasq.go:97] Deleting iptables rule: -s 10.10.0.0/16 -d 10.10.0.0/16 -j RETURN
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.666565     950 ipmasq.go:97] Deleting iptables rule: -s 10.10.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.667897     950 ipmasq.go:97] Deleting iptables rule: ! -s 10.10.0.0/16 -d 10.10.122.0/24 -j RETURN
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.668949     950 ipmasq.go:97] Deleting iptables rule: ! -s 10.10.0.0/16 -d 10.10.0.0/16 -j MASQUERADE
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.670394     950 ipmasq.go:85] Adding iptables rule: -s 10.10.0.0/16 -d 10.10.0.0/16 -j RETURN
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.672491     950 ipmasq.go:85] Adding iptables rule: -s 10.10.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.758989     950 ipmasq.go:85] Adding iptables rule: ! -s 10.10.0.0/16 -d 10.10.122.0/24 -j RETURN
Mar 06 09:19:38 ip-172-20-202-32.eu-west-1.compute.internal flannel-wrapper[950]: I0306 08:19:38.761328     950 ipmasq.go:85] Adding iptables rule: ! -s 10.10.0.0/16 -d 10.10.0.0/16 -j MASQUERADE

We tried to adjust some sysctl settings, but none of them worked:

net.ipv4.tcp_rmem = 10240 87380 12582912
net.ipv4.tcp_wmem = 10240 87380 12582912

net.ipv4.tcp_rmem = 102400 873800 125829120
net.ipv4.tcp_wmem = 102400 873800 125829120

net.core.wmem_max = 125829120
net.core.rmem_max = 125829120

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

net.ipv4.tcp_sack = 1
net.core.netdev_max_backlog = 5000

net.ipv4.udp_mem = 102400 873800 125829120

net.ipv4.udp_rmem_min = 10240
net.ipv4.udp_wmem_min = 10240
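
In hindsight these are all TCP/UDP buffer knobs, while the failing AddFDB call goes over a netlink socket. If I understand it correctly, a socket that never calls setsockopt(SO_RCVBUF) gets net.core.rmem_default rather than rmem_max, so raising the defaults might be the more relevant experiment (an assumption on our part, not verified):

# rmem_max/wmem_max only cap what a socket may request; sockets that
# never ask explicitly get the defaults:
$ sysctl -w net.core.rmem_default=8388608
$ sysctl -w net.core.wmem_default=8388608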

Context

We are scaling our Kubernetes cluster inside an AWS ASG. When adding new nodes, we rely on a working network. Even a completely broken network would be better than a few missing routes, because with missing routes the cluster behaves flaky in rare cases and it is not evident where that comes from. For example, we had DNS problems: a very small subset of our applications had a high error rate when resolving domain names, and for a long time we did not know why. Now we know it was caused by a missing route between the instance the faulty application ran on and the instance the DNS server ran on.

Currently we have to manually grep the journal logs and replace broken instances, because it is hard to figure out automatically whether a route is missing.
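
For reference, this is roughly the check we would want to automate, and what the hypothetical /opt/bin/check-flannel-routes from the drop-in sketched under Possible Solution could look like. It assumes the etcd v2 layout flannel uses (/coreos.com/network/subnets), /24 leases as in our config, and that FLANNELD_ETCD_ENDPOINTS is set as in our unit:

#!/bin/sh
# Compare flannel's subnet leases in etcd with the routes installed on
# this node; exit non-zero if any remote lease has no matching route.
. /run/flannel/subnet.env   # provides FLANNEL_SUBNET, e.g. 10.10.122.1/24
own_prefix=$(echo "$FLANNEL_SUBNET" | cut -d. -f1-3)

missing=0
for lease in $(etcdctl --endpoints="$FLANNELD_ETCD_ENDPOINTS" ls /coreos.com/network/subnets); do
    # keys look like /coreos.com/network/subnets/10.10.27.0-24
    subnet=$(basename "$lease" | tr '-' '/')
    [ "$(echo "$subnet" | cut -d. -f1-3)" = "$own_prefix" ] && continue   # our own lease
    if ! ip route show dev flannel.1 | grep -q "^$subnet "; then
        echo "missing route: $subnet"
        missing=1
    fi
done
exit $missing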

Your Environment

  • Flannel version: v0.9.0 and v0.10.0
  • Backend used (e.g. vxlan or udp): vxlan (with and without DirectRouting)
  • Etcd version: 3.2.11
  • Kubernetes version (if used): v1.8.5+coreos.0
  • Operating System and version: Container Linux by CoreOS 1576.5.0 (Ladybug)

svenwltr · Mar 06 '18 14:03

Looks like this might be related to https://github.com/coreos/flannel/issues/779

jpiper · Mar 10 '18 16:03

We already tried the proposed solution, but it didn't work.

If I understand correctly, we have to change net.core.rmem_max and net.core.wmem_max. These are the values on a failed node:

# sysctl -a | grep [wr]mem_max
net.core.rmem_max = 125829120
net.core.wmem_max = 125829120

svenwltr · Mar 12 '18 09:03

Related to "get/set receive buffer size" in netlink: https://github.com/vishvananda/netlink/commit/ef84ebb87b6680303340b51684666c363c79763e

chestack · Sep 16 '21 12:09

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Jan 25 '23 21:01

This bug is still happening, so I'll leave this open. There is a workaround to avoid it; we'll update the docs with it until it's fixed.

rbrtbnfgl · Jan 26 '23 14:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Jul 25 '23 17:07

We ran into what seems to be the same issue: some routes were missing after a network outage of a few hours. I would expect flannel to reconcile these routes. Am I right to expect this?

aslafy-z · Dec 01 '23 11:12

Which version of flannel are you using? Maybe your issue is not directly related to this one; this issue was about rules missing when flannel starts with multiple nodes. In your case it seems that the rules were somehow removed and aren't recreated.
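
If you want to see when the routes disappear, you can leave iproute2 watching the routing table on one of the affected nodes; it prints every route add/delete as it happens:

$ ip monitor route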

rbrtbnfgl · Dec 01 '23 11:12

I'm using v0.17.0 as shipped with RKE1. I understand this is a rather old version and my issue may have been fixed in the meantime, but it looks like Rancher is still shipping this version with the latest releases of RKE1.

aslafy-z · Dec 01 '23 12:12