Configurable MTU
What I'd like: I'd like to be able to configure the MTU of EC2 instances running bottlerocket.
I'm using Cilium on AWS EKS with cluster mesh across multiple AWS VPCs through a Transit Gateway. Unfortunately, Transit Gateways support a maximum MTU of 8500, while the EKS-optimized AMIs use jumbo frames.
Any alternatives you've considered: None
Related discussion: https://github.com/bottlerocket-os/bottlerocket/discussions/3338
I'd be happy to take a stab at implementing this. I looked around a little bit and gathered this much:
- The aws EC2 variants use the `wicked` systemd service to configure network interfaces
- The systemd pre-network hook calls `netdog generate-net-config`
- `netdog` looks for a file or kernel parameters as input to generate the wicked config
So how would adding this feature fit into this setup? Should the API server interact with netdog in some way, or is netdog updated to support MTU config, which is then added to kernel parameters or the net.toml file? Maybe a bit of everything...
Thanks for looking into this @tskinner-oppfi!
Things are complicated a bit by timing here, in that there is active work being done to migrate away from wicked to systemd-networkd. So while it might be possible to get something in to support this with wicked, that code is very close to going away.
It looks like Zach and Matt have added a lot of good detail to that issue. If there's anything you are able to add or contribute there to get MTU support, that would be awesome!
Now that the networkd work looks to be mostly done (awesome work!), I started looking at this again, and here are a couple of questions:
- Would we add another subcommand to `netdog` to configure just the MTU and expose it as a setting similar to the hostname network setting?
- Do we restart the networkd service after the config change, or run `networkctl reload` or something similar? I believe `networkctl reload` won't bring down the network but just reconfigure it with the new settings. It gets kind of dicey messing with network settings.
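For reference, if the change does land in a systemd-networkd config file, the reload path being discussed would look roughly like this (a sketch; the interface name is illustrative):

```sh
# Ask systemd-networkd to re-read its .network/.netdev files without
# restarting the service or bouncing links
networkctl reload
# Re-apply configuration to a specific link if needed (name is illustrative)
networkctl reconfigure eth0
```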
In the meantime I'll experiment a little and try to get some changes working so I can have something concrete to talk about.
@tskinner-oppfi Thanks! Contributions are always welcome!
As @stmcginnis mentioned above, we've done the work to integrate systemd-networkd, but currently it is only used in aws-k8s-1.28, the *-dev variants, and aws-ecs-2. The other variants continue to use wicked as their networking backend. Because of this, the process of adding new network config settings is admittedly a bit more work since we need to support config file generation for multiple backends. We want to make sure that the network backend is all but invisible and all of our variants support the same settings.
At a high level, this is how network config is generated:
- A systemd service `generate-network-config` is required by the `network-pre` target; it runs `netdog generate-net-config`
- `netdog` reads network config from a file (`net.toml`) or kernel parameters
- This network config is validated and deserialized into a set of Rust structs
- Depending on the network backend, `netdog` converts these structs into different structs that represent the actual config files `wicked`/`systemd-networkd` use.
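To make the input side concrete, here is a rough sketch of a net.toml with a possible MTU key. The overall shape (a version plus per-interface tables) follows the existing net.toml format; the `mtu` key itself is hypothetical and does not exist today:

```toml
version = 2

[eth0]
dhcp4 = true
primary = true
# Hypothetical new key -- not an existing netdog setting
mtu = 8500
```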
In order to add new settings, there are a few considerations:
- The creation of a new version of net config.
- Ensure that `wicked` supports MTU via config file (I have not looked into this yet)
- `systemd-networkd` has a few settings related to MTU; currently Bottlerocket defaults to using the MTU from DHCP. We'd want to make sure we handle this correctly with a custom setting.
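For context on the systemd-networkd side, these are the kinds of settings involved (a sketch of a .network file; the interface match and the value 8500 are illustrative):

```ini
[Match]
Name=eth0

[Link]
# Explicit MTU for the link
MTUBytes=8500

[DHCPv4]
# Don't let the DHCP-provided MTU override the explicit value
UseMTU=false
```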
From the code perspective, these would be the additions off the top of my head:
- Add the new "v4" net config structs and their associated validation and unit tests
- Add the `wicked` struct members and the associated logic to convert net config structs -> `wicked` structs
- Add the `systemd-networkd` config struct members, as well as the additional builder methods to add values to the config structs. Add the small amount of logic to drive the builders, calling these new methods.
- Unit test all the things.
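As a rough illustration of the first bullet, a new optional MTU field on the net config structs might look something like this; the struct and field names here are a hypothetical sketch, not netdog's actual types:

```rust
use serde::Deserialize;

/// Hypothetical per-interface config in a new "v4" net config version.
/// Names and shapes are illustrative, not netdog's real definitions.
#[derive(Debug, Deserialize)]
pub struct InterfaceConfigV4 {
    #[serde(default)]
    pub dhcp4: bool,
    #[serde(default)]
    pub primary: bool,
    /// New optional MTU setting; `None` means "leave it to DHCP/driver defaults".
    pub mtu: Option<u32>,
}

impl InterfaceConfigV4 {
    /// Basic validation: reject obviously invalid MTU values.
    pub fn validate(&self) -> Result<(), String> {
        if let Some(mtu) = self.mtu {
            // 68 is the IPv4 minimum; 9001 is the EC2 jumbo-frame maximum.
            if !(68..=9001).contains(&mtu) {
                return Err(format!("invalid MTU {}, expected 68..=9001", mtu));
            }
        }
        Ok(())
    }
}
```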
Happy to provide additional direction and answer questions!
Also interested in this feature. It's common for enterprises to use AWS Transit Gateway to connect their enterprise networks to AWS VPCs, so we really need some consistent way to configure a lower MTU.
Recently we've also seen some cases where some Bottlerocket nodes randomly get MTU 1480 on eth0 and other cases where they get the full 9001. Having an explicit way to configure it is needed.
Hey @jcmcken, thanks for the renewed interest in this old issue! We will discuss within the team how we want to move forward with this feature and then get back to you.
Getting back: In a normal scenario, we expect this value to be configurable at the network level (DHCP settings) rather than OS, so that is our current recommendation.
However, we can potentially add this setting, so leaving this issue open for contribution.
Do you have an example of configuring it in the DHCP settings? In AWS, the DHCP settings are configured through DHCP option sets, which have no parameter for MTU. I don't think there's any other location where you can configure DHCP network settings within the environment.
Or are you suggesting configuring the systemd-networkd DHCP settings? (Although you said "...rather than OS..." so I don't think this is what you mean). If this is what you mean, do you have a working example / workflow?
For clarification: In our case, we're using EKS with AWS VPC CNI. Our corporate network setup has a "shared services" VPC with VPC endpoints (VPCEs), one of those endpoints being ECR. This VPC endpoint is accessed over an AWS TGW from connected VPCs. So we need to be able to pull AWS VPC CNI images from ECR VPCE over the TGW. Thus we need the MTU to fall below 8500 (in our case, we prefer 1480 because of some other details), otherwise the connection just hangs.
AWS VPC CNI itself has some logic to set the host's MTU to the same value as the pod interface MTU you configure in the CNI settings. But to do this, we need to be able to pull the image in the first place. There's kind of a chicken-and-egg problem here.
From the ENA driver docs:
> The driver supports an arbitrarily large MTU with a maximum that is negotiated with the device. The driver configures MTU using the SetFeature command (ENA_ADMIN_MTU property). The user can change MTU via ip(8) and similar legacy tools.
A workaround in the absence of a dedicated MTU setting could be to use bootstrap containers to configure the MTU; however, I think you will likely hit the same chicken/egg problem of needing your AWS TGW to access the container.
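For anyone trying that route anyway, the command such a bootstrap container would end up running is just the plain ip(8) invocation the ENA docs mention (interface name and value are illustrative):

```sh
# Lower the MTU on the primary interface early in boot; eth0 and 1480 are
# examples, adjust for your interface naming and desired MTU
ip link set dev eth0 mtu 1480
```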
Not immediately helpful to solving the issue, but another point to note is that our Bottlerocket variants that use wicked are approaching their end-of-life (aws-ecs-1, aws-k8s-1.27). Only needing to support systemd-networkd here would greatly simplify the implementation.
> A workaround in the absence of a dedicated MTU setting could be to use bootstrap containers to configure the MTU; however, I think you will likely hit the same chicken/egg problem of needing your AWS TGW to access the container.
Right, this was brought up in the original discussion (on which this issue is based -- https://github.com/bottlerocket-os/bottlerocket/discussions/3338). The issue is that there's some kind of race condition, so it's not as trivial as just running a bootstrap script that sets the device MTU.
We are also facing this issue. As a workaround, I was able to successfully change the MTU on the Bottlerocket host using a k8s DaemonSet that reconfigures systemd-networkd.
Obviously this is not an ideal solution, so it would be great to have a proper config option for this exposed in Bottlerocket.
In case someone else needs it, the DaemonSet workaround approach is described here, on the discussion page: https://github.com/bottlerocket-os/bottlerocket/discussions/3338
I would intuitively expect this to be a net.toml setting - would this be an acceptable approach for everyone involved (to just add it to netdog directly and not make it settings configurable)?
> I would intuitively expect this to be a net.toml setting - would this be an acceptable approach for everyone involved (to just add it to netdog directly and not make it settings configurable)?
It's most correct to add to net.toml but for EC2 we'll also need to expose some way to set it via user-data, otherwise the AMI has to be re-built / re-registered to drop in that file (not fun).
What that might look like is:
- Ensuring there's a way to override settings like `netdog.default-interface` on the kernel command line via `settings.boot.init-parameters`. (Right now, trying to do this will brick the instance.)
- Coming up with a way to specify MTU, e.g. `netdog.default-interface=eth0:dhcp4,mtu@8500`
- Parsing the new input into a `NetConfigV4` struct instead of `NetConfigV1`.
- Using bootstrap commands to change the parameters and then reboot.
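Sketching what the user-data side of that could look like, assuming the proposed mtu@8500 syntax and the existing settings.boot.init-parameters mechanism (none of this exists today; it's just the proposal above written out):

```toml
# Hypothetical user-data snippet: pass the proposed netdog parameter through
# boot settings. The mtu@8500 syntax is the proposal from the list above,
# not an existing feature.
[settings.boot]
reboot-to-reconcile = true

[settings.boot.init-parameters]
"netdog.default-interface" = ["eth0:dhcp4,mtu@8500"]
```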