bridged networking does not inherit mtu (or provide option to do so)
Issue Description
When running in Google Cloud on the new C3 instance types, the eth0 MTU is 1460. However, with rootful (bridged) networking, the podman default is 1500. It would be convenient if the MTU were autodetected, or if it could be changed using podman network update.
Steps to reproduce the issue
Steps to reproduce the issue (given fresh installation):
- sudo ifconfig eth0 mtu 1460 (not needed on e.g. Google Cloud, where this is the default)
- sudo podman run --rm -it alpine ifconfig eth0
The second step will show that the rootful interface mtu is 1500.
Describe the results you received
1500 mtu.
Describe the results you expected
1460 mtu.
podman info output
host:
arch: amd64
buildahVersion: 1.31.2
cgroupControllers:
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.7-2.fc38.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.7, commit: '
cpuUtilization:
idlePercent: 99.6
systemPercent: 0.16
userPercent: 0.24
cpus: 22
databaseBackend: boltdb
distribution:
distribution: fedora
variant: cloud
version: "38"
eventLogger: journald
freeLocks: 2048
hostname: mstenber-jumpbox
idMappings:
gidmap:
- container_id: 0
host_id: 1002
size: 1
- container_id: 1
host_id: 524288
size: 65536
uidmap:
- container_id: 0
host_id: 1002
size: 1
- container_id: 1
host_id: 524288
size: 65536
kernel: 6.3.12-200.aiven1.fc38.x86_64
linkmode: dynamic
logDriver: journald
memFree: 66659168256
memTotal: 92774498304
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns-1.7.0-1.fc38.x86_64
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.7.0
package: netavark-1.7.0-1.fc38.x86_64
path: /usr/libexec/podman/netavark
version: netavark 1.7.0
ociRuntime:
name: crun
package: crun-1.9-1.fc38.x86_64
path: /usr/bin/crun
version: |-
crun version 1.9
commit: a538ac4ea1ff319bcfe2bf81cb5c6f687e2dc9d3
rundir: /run/user/1002/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
os: linux
pasta:
executable: /usr/bin/pasta
package: passt-0^20230908.g05627dc-1.fc38.x86_64
version: |
pasta 0^20230908.g05627dc-1.fc38.x86_64
Copyright Red Hat
GNU General Public License, version 2 or later
<https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
remoteSocket:
path: /run/user/1002/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.1-1.fc38.x86_64
version: |-
slirp4netns version 1.2.1
commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.3
swapFree: 101351153664
swapTotal: 101351153664
uptime: 2h 49m 36.00s (Approximately 0.08 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /home/mstenber/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/mstenber/.local/share/containers/storage
graphRootAllocated: 1055719055360
graphRootUsed: 21407711232
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /stmp
imageStore:
number: 2
runRoot: /run/user/1002/containers
transientStore: false
volumePath: /home/mstenber/.local/share/containers/storage/volumes
version:
APIVersion: 4.6.2
Built: 1693251588
BuiltTime: Mon Aug 28 19:39:48 2023
GitCommit: ""
GoVersion: go1.20.7
Os: linux
OsArch: linux/amd64
Version: 4.6.2
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
Google Cloud c3 family instance (but should really apply everywhere)
Additional information
We do not auto-detect the MTU, and I don't think it makes sense to do so; it would be hard to guess what the right MTU is.
You can create networks with the proper mtu with --opt mtu=<NUM>. You can edit the default network config with something like this: https://github.com/containers/podman/discussions/13488#discussioncomment-2358133
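For anyone landing here, the per-network option mentioned above can be sketched roughly as follows. The network name lowmtu and the helper create_mtu_network are made up for illustration, and the PODMAN variable exists only to allow a dry run; --opt mtu= is the option referred to above.

```shell
#!/bin/sh
# Command used to invoke podman; override (e.g. PODMAN=echo) for a dry run.
PODMAN=${PODMAN:-podman}

# Create a bridge network with an explicit MTU.
# $1 = network name (hypothetical, e.g. "lowmtu"), $2 = MTU in bytes.
create_mtu_network() {
    $PODMAN network create --opt "mtu=$2" "$1"
}
```

Run it as root for a rootful network, e.g. create_mtu_network lowmtu 1460. A container started with --network lowmtu should then report that MTU on its eth0.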
podman network update could be an option, but I would rather not implement that, because it would mean updating all running container interfaces with a new MTU, which is far more complex.
Unfortunately, there is currently no good way to recreate the default network, as e.g. podman network rm [-f] does not work on it.
Assuming you don't want to reboot, what is the way to update default network configuration?
> We do not auto-detect the MTU, and I don't think it makes sense to do so; it would be hard to guess what the right MTU is.
A conservative guess (the smallest non-loopback MTU) is pretty safe: too small an MTU leads at worst to performance problems, but given how badly PMTU discovery works, too large an MTU leads to TCP not working.
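As a rough illustration of that conservative guess, the sketch below picks the smallest MTU among non-loopback interfaces from `ip -o link show` output on stdin. min_nonloopback_mtu is a hypothetical helper, not anything podman ships, and it assumes the usual one-line-per-interface format where the third field holds the flags. A real implementation would probably also need to skip podman's own bridges and veth devices.

```shell
#!/bin/sh
# Print the smallest MTU among non-loopback interfaces.
# Input: output of `ip -o link show` on stdin, one interface per line, e.g.
#   2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP ...
min_nonloopback_mtu() {
    awk '
        # Field 3 holds the interface flags; skip loopback devices.
        $3 !~ /LOOPBACK/ {
            for (i = 4; i < NF; i++) {
                if ($i == "mtu") {
                    m = $(i + 1) + 0
                    if (!found || m < min) { min = m; found = 1 }
                }
            }
        }
        END { if (found) print min }
    '
}
```

Usage would be something like ip -o link show | min_nonloopback_mtu.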
The problem is that this has the potential to break existing workloads. If one interface has a lower MTU, this does not mean the user wants us to use this MTU. I mean, yeah, it is probably fine in almost all cases, but there is still some risk.
And then when would we check what the lowest MTU is? Given that we are not a daemon, this likely means looking it up for each container setup, which adds overhead I would like to avoid.
> Assuming you don't want to reboot, what is the way to update default network configuration?
As written above: You can edit the default network config with something like this: https://github.com/containers/podman/discussions/13488#discussioncomment-2358133
> And then when would we check what the lowest MTU is? Given that we are not a daemon, this likely means looking it up for each container setup, which adds overhead I would like to avoid.
Probably just using the MTU of the interface that the default route points to would be enough.
The following Makefile target does what I'd propose as a sane default (possibly using min(1500, detected MTU) if worried about jumbo frames, PMTU and the internet):
# default podman network bridge mtu (1500) is too high on Google C3
# instance types' network which is 1460; this can be used to fix that
# ( do it before starting any nodes )
#
# This will also work the other way around - if jumboframes are
# supported, it will increase the MTU (but potentially introduce
# problems if PMTU is broken)
.PHONY: fix-mtu
fix-mtu:
	sudo mkdir -p /etc/containers/networks
	podman network inspect podman | \
	  jq '.[] + {options: {mtu: "'`ip -j addr show dev \`ip -j -4 route show default | jq -r 'sort_by(.metric) | .[] | .dev' | head -1\` | jq -r '.[0].mtu'`'"}}' | \
	  sudo tee /etc/containers/networks/podman.json
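A rough jq-free variant of the detection part, assuming plain `ip -4 route show default` output and treating routes without an explicit metric as metric 0. default_route_dev and clamp_mtu are hypothetical helper names, not part of podman; clamp_mtu is the min(1500, detected MTU) safeguard mentioned above.

```shell
#!/bin/sh
# Print the interface of the lowest-metric default route.
# Input: output of `ip -4 route show default` on stdin, e.g.
#   default via 10.128.0.1 dev eth0 proto dhcp metric 100
default_route_dev() {
    awk '
        $1 == "default" {
            dev = ""; metric = 0   # no "metric" keyword means metric 0
            for (i = 2; i < NF; i++) {
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1) + 0
            }
            if (dev != "" && (!found || metric < best_metric)) {
                best = dev; best_metric = metric; found = 1
            }
        }
        END { if (found) print best }
    '
}

# Cap a detected MTU at 1500 so jumbo frames are never picked automatically.
# $1 = detected MTU in bytes.
clamp_mtu() {
    if [ "$1" -gt 1500 ]; then
        echo 1500
    else
        echo "$1"
    fi
}
```

For example, dev=$(ip -4 route show default | default_route_dev) followed by clamp_mtu "$(cat "/sys/class/net/$dev/mtu")" would give the value to put into the network's mtu option.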
Maybe we should add a default_mtu option to containers.conf; this MTU would then be used for all networks unless --opt mtu=... was explicitly set. Given that we support drop-in config files with .d there, that would make changes like this much easier. I think that would be a good middle ground for the time being.
Also, it looks like for the macvlan and ipvlan drivers the kernel already uses the proper MTU from the connected host device instead of the default 1500. So this is likely a problem affecting only the bridge driver.
Yes, the other two drivers are fine (I opened this issue when I noticed that the default case was not working while my macvlan was fine). A config file option could be a good compromise (and better than overriding only the default network, as I am doing in the Makefile above).
sudo mkdir -p /etc/containers/networks
podman network inspect podman | \
  jq '.[] + {options: {mtu: "'`ip -j addr show dev \`ip -j -4 route show default | jq -r 'sort_by(.metric) | .[] | .dev' | head -1\` | jq -r '.[0].mtu'`'"}}' | \
  sudo tee /etc/containers/networks/podman.json
I was having exactly the same issue under kubevirt: the MTU of the VM's interface is 1400 and the default podman network was 1500, so connections sometimes didn't work, hung, etc.
I applied the workaround and it helped. What do you think about:
- Detecting the issue on podman run and warning with the possible fix (2), maybe suggesting the default route MTU? We know this is not always the right one, but it would be right in 99-ish percent of cases.
- Providing an option to change the default network MTU?
I understand there is no golden path here, but MTU issues are hard to diagnose and frustrating to users because they are not evident; in most cases the result would be users running away because "it hangs", "it doesn't work", "it's unstable"...
What do you think?
I think using the default route MTU certainly makes sense in almost all cases. In the meantime, the opposite use case was also reported (wanting a higher default MTU to improve performance, https://github.com/containers/podman/issues/23883); if we detect the default gateway MTU, it will just work in both directions. Doing this by default would be a net benefit for users IMO. Throwing warnings is always a bit odd, because if we know what is wrong, then why don't we just fix it?
For cases where our logic might not choose the right default, we still have the per-network mtu option to override it. Adding a second default_mtu option may be useful as well, but it would need to take https://github.com/containers/podman/issues/23883#issuecomment-2338491674 into account, so the implementation is not as simple as I would like.
I don't have much time to work on this anytime soon, though. If you need/want this prioritized, please file an RFE in the Red Hat Jira against podman (linking the upstream issue is fine) so our team can consider this during planning.
@Luap99 what would be the project in Jira? (I see PODMAND but not sure if it's that one) thanks
@mangelajo use the RHEL project and select podman as component
> @mangelajo use the RHEL project and select podman as component
done https://issues.redhat.com/browse/RHEL-67298
Thank you!
Popping in here to say that three senior engineers just spent four hours debugging this exact issue, eventually ending up here. The RHEL issue has been closed as "Cannot Reproduce" on the 26th of Feb without comment or elaboration, despite clear reproduction instructions in the report. (Well, they're clear to me anyway - @Luap99 is there more information that could be provided that would help the Red Hat folks reproduce?)
For what it's worth, @fingon's workaround works perfectly, but one has to debug as far as finding this issue page first :-/
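For anyone else debugging this, one way to confirm an MTU mismatch before hunting further is a ping with fragmentation prohibited. The sketch below only does the arithmetic, assuming IPv4 without options (20-byte header) plus the 8-byte ICMP header; max_ping_payload is a made-up helper name.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits into one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
# $1 = MTU in bytes.
max_ping_payload() {
    echo $(( $1 - 28 ))
}
```

With iputils ping (busybox ping, as in alpine, lacks this flag), something like ping -M do -c 3 -s "$(max_ping_payload 1500)" HOST run from inside the container should fail or hang when the path MTU is really 1460, while the payload for 1460 should go through.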