
FlatCar Beta 3913.1.0 with systemd 255 enables DHCP rapid commit by default

Open daMupfel opened this issue 10 months ago • 7 comments

Description

The new Flatcar Beta release 3913.1.0 updates systemd to version 255. This version adds support for DHCPv4 RapidCommit, which seems to be enabled by default:

RapidCommit=

    Takes a boolean. The DHCPv4 client can obtain configuration parameters from a DHCPv4 server through a rapid two-message exchange (discover and ack). When the rapid commit option is set by both the DHCPv4 client and the DHCPv4 server, the two-message exchange is used. Otherwise, the four-message exchange (discover, offer, request, and ack) is used. The two-message exchange provides faster client configuration. See [RFC 4039](https://tools.ietf.org/html/rfc4039) for details. Defaults to true when Anonymize=no and neither AllowList= nor DenyList= is specified, and false otherwise.

    Added in version 255.

Our cloud provider (CloudSigma) seems to have a faulty implementation of DHCPv4 rapid commit, which means that we no longer get an IP address.

For existing servers, this can be worked around by copying the default config /usr/lib/systemd/network/zz-default.network into a custom config file and changing its DHCPv4 section as follows:

[DHCPv4]
RoutesToDNS=false
RapidCommit=false
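
As a possible alternative to copying the whole file, systemd-networkd also merges drop-in files from a directory named after the .network file with a .d suffix. The following is a minimal sketch, assuming the default file is still named zz-default.network and that a drop-in under /etc/systemd/network/ is acceptable on the affected servers. The drop-in file name is hypothetical, and this has not been tested against CloudSigma:

```ini
# /etc/systemd/network/zz-default.network.d/10-disable-rapid-commit.conf
# (hypothetical file name; any *.conf file in this directory is merged
# into zz-default.network by systemd-networkd)
[DHCPv4]
# Force the four-message exchange (discover, offer, request, ack)
RapidCommit=false
```

Only the RapidCommit= setting is overridden here; the rest of the default config stays in effect. Whether this takes effect early enough in boot for the CloudInit process is not verified here.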

Impact

The server does not get an IP address. Because the CloudInit process for CloudSigma requires an assigned lease, the whole setup no longer works.

Environment and steps to reproduce

  1. Upload current beta FlatCar CloudSigma vendor image to CloudSigma
  2. Create a new machine
  3. No public IP is assigned and the CloudInit process never runs

Expected behavior

The server is set up correctly with an IP address and the CloudInit config.

Additional information

We are also in discussions with CloudSigma about fixing their DHCP implementation, but it is unclear when or how that will be resolved.

This is not really a bug on Flatcar's side but rather a breaking change for us, because the network config now differs in the new version.

The question is how this could be fixed (if you are open to doing it on the Flatcar side). I currently see the following options:

  • Update the default network config to disable rapid commit
  • Add a custom network config file to the vendored CloudSigma image

I would like to get some feedback on this and can probably provide a PR if you are fine with one of the proposed solutions :).

daMupfel avatar Apr 25 '24 11:04 daMupfel

Add a custom network config file to the vendored CloudSigma image

This would definitely be a good idea if the default does not cause widespread problems for other platforms.

jepio avatar Apr 25 '24 13:04 jepio

@jepio if added only to oem-cloudsigma it shouldn't affect other platforms, should it? And it potentially affects all CloudSigma deployments the way I read the summary.

@daMupfel I would argue that this should be implemented as an OEM sysext so the change is also distributed to existing nodes when they update (@pothos please keep me honest). Using an OEM sysext would also allow changing the config in future updates if required. As sysexts cover /usr, the config should go to /usr/lib/systemd/network/. This is slightly (but only slightly) more complicated than just dropping a config file into the oem-cloudsigma provider. The biggest challenge is introducing an OEM sysext to the CloudSigma image, as this image is currently not using OEM sysexts afaict. But that shouldn't keep you from working on a PR: OEM sysexts are used for most other images, and the concept should be easily portable to CloudSigma.

t-lo avatar Apr 25 '24 15:04 t-lo

I think the OEM sysext might get loaded too late? For most clouds the small network config files are part of the base image because they need to be in bootengine and in init.

pothos avatar Apr 25 '24 15:04 pothos

Hmmm, good point, re-reading the summary it states that bootstrap configuration fails, so this is required in the initrd. No sysext then.

t-lo avatar Apr 25 '24 15:04 t-lo

Hi, thanks for the feedback so far :).

If it is added to the OEM image, it won't be updated on existing installations (the OEM partition seems to keep the state of the original install). Is that correct? At least that was my observation so far. If so, are there any options to make this work for existing installations when they update?

daMupfel avatar May 02 '24 08:05 daMupfel

I opened a PR for this issue in flatcar/scripts. It probably won't fix existing installations (during update), but we can manually fix those in our system quite easily. Please let me know if you think this is a good solution.

daMupfel avatar Jun 11 '24 06:06 daMupfel

Hi,

I created the PR more than 2 months ago and after a first review I haven't received any feedback yet. I don't want to rush anyone, but I'd appreciate an idea of the expected timeline for this review. This information is important for my company to decide whether to invest in a custom build job on our CI system or wait for it to be integrated upstream. Currently, we are building the images manually.

Thank you very much for your work.

Best regards, David

daMupfel avatar Aug 28 '24 05:08 daMupfel