[BUG] MTU larger than 1500 bytes does not work over Linux bridges
We're wasting tons of time arguing about MTU settings and ignoring the elephant in the room: none of it will work unless we adjust the underlying virtualization infrastructure.
FWIW, it would be "great fun" troubleshooting BGP sessions or large OSPF networks :((
Lab topology
You can use any two devices as long as one of them is a host (forcing the Linux bridge to be used).
nodes:
  r:
    device: iosv
  h:
    device: linux
links:
- r:
  h:
  mtu: 1600
To Reproduce
Start the lab and try to run ping -s 1500 r from h. It fails :(
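For reference, ping -s sets the ICMP payload size, not the packet size, so the resulting IP packet is larger than 1500 bytes. A quick back-of-the-envelope check, assuming IPv4 without options (20-byte header) and the 8-byte ICMP echo header; the function name is made up for illustration:

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_packet_size(payload: int) -> int:
    """Size of the IP packet produced by 'ping -s <payload>' over IPv4."""
    return payload + ICMP_HEADER + IPV4_HEADER


size = ping_packet_size(1500)
# Compare against the default 1500-byte MTU and the 1600-byte link MTU from the topology
print(size, size <= 1500, size <= 1600)  # prints: 1528 False True
```

So the test packet fits the 1600-byte link MTU from the lab topology, but not a bridge interface left at 1500.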
The "oversized" packets get dropped at the ingress interface of the Linux bridge. Once that MTU is adjusted, ping works.
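To find which interface is the culprit, something like the following sketch could list the bridge and its ports with an MTU below the link MTU. It only reads /sys/class/net/<interface>/mtu and the bridge's brif directory; the function name and the bridge name in the usage note are hypothetical.

```python
from pathlib import Path


def undersized_ports(bridge: str, required_mtu: int, sysfs: str = "/sys/class/net") -> dict:
    """Return {interface: mtu} for the bridge and its ports with MTU below required_mtu."""
    root = Path(sysfs)
    names = [bridge]
    brif = root / bridge / "brif"  # one entry per bridge port
    if brif.is_dir():
        names += sorted(port.name for port in brif.iterdir())

    too_small = {}
    for name in names:
        mtu = int((root / name / "mtu").read_text().strip())
        if mtu < required_mtu:
            too_small[name] = mtu
    return too_small
```

Calling undersized_ports("virbr1", 1600) on the lab host (virbr1 is a placeholder for whatever bridge libvirt created for the link) should return every interface that would drop the 1528-byte test packet.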
It looks like we have to set the ":libvirt__mtu" setting in Vagrantfile. I have no idea what happens with vrnetlab-based containers.
See https://github.com/srl-labs/containerlab/blob/main/docs/manual/network.md#link-mtu
The containerlab link MTU defaults to 9500, and the bridge "should" inherit the minimum MTU of its ports.
Update(s):
- Libvirt LAN links definitely have a problem
- UDP tunnels (libvirt P2P links) seem to be OK
- Pure containers are OK as the change in MTU changes the MTU of the underlying Linux interface
- vrnetlab containers are OK -- containerlab changes the MTU to 9500, and the corresponding QEMU tap interface has the MTU 65000
The truly bizarre part: it seems like the Linux bridge is doing IPv4 fragmentation while bridging packets. I can't decide whether to be amazed or disgusted.
https://www.spinics.net/lists/netdev/msg596072.html
When the "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, bridge will do defragment at PREROUTING and re-fragment at POSTROUTING. At the re-fragment bridge will check if the max frag size is larger than the bridge's MTU in br_nf_ip_fragment(), if it is true packets will be dropped. And this patch use the outdev's MTU instead of the bridge's MTU to do the br_nf_ip_fragment.
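The check described in that patch message can be restated as a toy model. This is not kernel code, only a paraphrase of the quoted description; the function name, parameters, and the numbers in the example are illustrative assumptions.

```python
def refragment_verdict(max_frag_size: int, bridge_mtu: int, outdev_mtu: int, patched: bool) -> str:
    """Toy model of the re-fragmentation check described in the quoted patch message."""
    # Before the patch the limit is the bridge MTU, after it the outgoing device MTU
    limit = outdev_mtu if patched else bridge_mtu
    return "drop" if max_frag_size > limit else "forward"


# Illustrative numbers: a 1528-byte fragment, bridge at 1500, outgoing port at 1600
print(refragment_verdict(1528, bridge_mtu=1500, outdev_mtu=1600, patched=False))  # prints: drop
print(refragment_verdict(1528, bridge_mtu=1500, outdev_mtu=1600, patched=True))   # prints: forward
```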
Could it be br_netfilter doing the (de)fragmentation, in support of firewall filters? See https://github.com/torvalds/linux/blob/master/net/bridge/br_netfilter_hooks.c#L807
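One way to test that theory on the lab host is to look at the sysctl mentioned in the netdev thread. A minimal sketch, assuming that a missing file means br_netfilter is most likely not loaded; the function name is made up:

```python
from pathlib import Path
from typing import Optional

SYSCTL = "/proc/sys/net/bridge/bridge-nf-call-iptables"


def bridge_nf_call_iptables(path: str = SYSCTL) -> Optional[bool]:
    """True/False for a sysctl value of 1/0, None when the file does not exist."""
    sysctl = Path(path)
    if not sysctl.exists():
        return None  # assumption: br_netfilter is not loaded
    return sysctl.read_text().strip() == "1"
```

If this returns True, bridged IPv4 packets pass through the iptables hooks, which would be consistent with the defragment and re-fragment behavior described in the thread.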