Dracut network-manager, RHEL 8, and Clevis prevents iface setting changes

Open superteece opened this issue 4 years ago • 26 comments

I'm using RHEL 8 in a scenario where I provision a device in a DHCP network and later deploy it into a static IP network. When I use nmtui or nmcli to change the interface from dynamic to static and then reboot, it is dynamic again. This seems to be due to the existence of a file in /run/NetworkManager/system-connections, which is a characteristic of Dracut's network-manager module that became the default over network-legacy in RHEL 8.3.

I see that this use case was doable in network-legacy by passing the omit_dracutmodules+=" ifcfg " parameter. Is there an equivalent for network-manager?

There's a related bug report here: https://bugzilla.redhat.com/show_bug.cgi?id=1818579

Distribution used: RHEL 8.3

Dracut version: 049-95.git20200804.el8

Init system Which init system is being used?

To Reproduce

  1. Install Clevis, bind to Tang, set up Dracut as the unlocker
  2. Reboot to see that LUKS auto unlocks
  3. Use nmtui or nmcli to configure a static IP
  4. Reboot
  5. Host is reverted to DHCP
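
For reference, step 3 can be sketched with nmcli roughly as follows. This is a minimal sketch, not the reporter's exact commands: the profile name, addresses, and the DRY_RUN switch are placeholder assumptions. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of step 3: switch a profile from DHCP to a static address.
# CON, ADDR, GW and DNS are placeholder values, not taken from the report.
CON="${CON:-Wired Connection}"
ADDR="${ADDR:-192.0.2.10/24}"
GW="${GW:-192.0.2.1}"
DNS="${DNS:-192.0.2.53}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_static() {
    # Set the IPv4 method to manual and assign address, gateway and DNS.
    run nmcli connection modify "$CON" \
        ipv4.method manual \
        ipv4.addresses "$ADDR" \
        ipv4.gateway "$GW" \
        ipv4.dns "$DNS"
    # Re-activate the profile so the change takes effect immediately.
    run nmcli connection up "$CON"
}

set_static
```

With the behavior described in this report, these settings apply until the next reboot, at which point the initrd-generated profile takes over again.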

Something at boot is creating a profile for Wired Connection. If this is deleted or modified with a static address, a new Wired Connection is generated to replace the deleted or modified one at the next boot. If a new profile is created for the static settings, the auto-generated Wired Connection is used as the default at the next boot.

Expected behavior: I should be able to change network settings and have them stick.

Additional context: I also posted this issue on the Clevis GitHub, as I do not know where a modification should be made or where the solution I'm overlooking lies. https://github.com/latchset/clevis/issues/290

superteece avatar Feb 05 '21 16:02 superteece

@superteece what happens if you boot once and rebuild your initramfs, do configurations "stick" then?

johannbg avatar Feb 05 '21 18:02 johannbg

Also what happens if you simply add rd.neednet=1 to the kernel command line?

johannbg avatar Feb 05 '21 19:02 johannbg

rd.neednet=1 is already set by Clevis

As for rebuilding the initramfs after first boot, I'm not in a position to try this at the moment. For now, reimaging with RHEL 8.2, adding the omit_dracutmodules+=" ifcfg " parameter, and creating a systemd service to flush the interface's IP at boot resolves the issue. But that only works because RHEL 8.2 uses network-legacy.
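
The flush step mentioned above could look something like the sketch below. The actual service was not posted, so this is only an assumption about its core command: the interface name is a placeholder, and the systemd unit that would run it (ordered before NetworkManager) is not shown. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Sketch: drop all addresses from an interface so the real root starts clean.
# IFACE is a placeholder; the original service was not shown in the thread.
IFACE="${IFACE:-enp1s0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

flush_iface() {
    # "ip addr flush dev X" removes every address configured on device X.
    run ip addr flush dev "$1"
}

flush_iface "$IFACE"
```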

superteece avatar Feb 05 '21 19:02 superteece

@superteece Do I understand correctly that you would like to use DHCP in the initramfs but a static address in the real root?

tyll avatar Apr 29 '21 10:04 tyll

@superteece Do I understand correctly that you would like to use DHCP in the initramfs but a static address in the real root?

Yes this is accurate. The network on which these devices are provisioned is DHCP due to PXE. After provisioning they are transferred to remote sites which use static addresses.

However, if initramfs remains DHCP, Dracut panics when an address fails to be assigned.

superteece avatar Apr 29 '21 11:04 superteece

However, if initramfs remains DHCP, Dracut panics when an address fails to be assigned.

I don't follow. If you provide kernel arguments for DHCP, then the initramfs will use DHCP in 8.2 as well, regardless of any changes to the profile on disk, wouldn't it?

tyll avatar Apr 29 '21 14:04 tyll

The panic happens when they are transferred to remote sites which use static addresses as in the dhcp kernel parameter is not removed in the same process as it should be thus dracut picks it up and halts when there is no response to the dhcp request.

johannbg avatar Apr 29 '21 14:04 johannbg

The panic happens when they are transferred to remote sites which use static addresses as in the dhcp kernel parameter is not removed in the same process as it should be thus dracut picks it up and halts when there is no response to the dhcp request.

Can you maybe rephrase this? What do you mean by "remote sites" here?

@superteece Did you try this with the network-legacy module and omit_dracutmodules+=" ifcfg " or do you just assume that this worked for your use case?

tyll avatar Apr 30 '21 09:04 tyll

It means, as he mentioned: "The network on which these devices are provisioned is DHCP due to PXE. After provisioning they are transferred to remote sites which use static addresses." (which I'm probably misunderstanding).

What release of dracut is currently in RHEL 8? The bug report mentions 50 on Fedora (and the underlying cause was probably this change: https://github.com/dracutdevs/dracut/commit/5965710e018989b02a56e8d190b71740ca3b5463#diff-fcc4de97e69e134f38b5bc2cd5222466881d89f9ad189c075dbb20b34d88ee6e), but there are changes in 51 which might address this: https://github.com/dracutdevs/dracut/commit/faea4e4ddb10f697590b80f8f17181341c537262#diff-fcc4de97e69e134f38b5bc2cd5222466881d89f9ad189c075dbb20b34d88ee6e and https://github.com/dracutdevs/dracut/commit/3dcaa97ca4dcfa8092252a22df62c60941e59ce3#diff-fcc4de97e69e134f38b5bc2cd5222466881d89f9ad189c075dbb20b34d88ee6e

So this really needs to be tested on a more recent dracut release (51+) to see if this is still an issue.

johannbg avatar Apr 30 '21 11:04 johannbg

It means, as he mentioned: "The network on which these devices are provisioned is DHCP due to PXE. After provisioning they are transferred to remote sites which use static addresses." (which I'm probably misunderstanding).

Thank you, I understand now. From what I understand, as long as there are dracut/kernel ip command line options, they determine whether the system uses DHCP or static IP. So if the switch from DHCP->static only happens with nmcli but without changing the kernel cmdline, then the system will not boot (if there is no DHCP). If the kernel cmdline is adjusted to static IP, NM should not do any DHCP afterwards. But if the kernel cmdline is static, then there is no need to change from an initrd network config to a different one in the real root.

tyll avatar Apr 30 '21 13:04 tyll

Yeah, that, or the initramfs is not being rebuilt to incorporate the changes on the real root. There could also be a bug along these lines: the tool rebuilds the initramfs and removes the kernel command line from the boot loader config (as it should), but NM in the initrd ignores that/those file(s). That might be the case here, hence this needs to be tested with dracut 51+.

johannbg avatar Apr 30 '21 14:04 johannbg

@superteece is this still an issue?

johannbg avatar Dec 11 '21 23:12 johannbg

It seems we've encountered this as well.

  • Wouldn't nmcli device disconnect $DEVICE; dracut -f work as a workaround for preserving the state of the device? Is there any way of not activating the device on boot?
  • Alternatively, if some connection is to be used, but with different config, the following: nmcli connection modify $CONNECTION connection.autoconnect-priority 10; and then dracut -f?

pvalena avatar Jul 14 '22 20:07 pvalena

It seems we've encountered this as well.

  • Wouldn't nmcli device disconnect $DEVICE; dracut -f workaround preserving the state of the device? Is there any way of not activating the device on boot?

Probably no info about connections is copied to the initramfs, so dracut -f seems to be redundant. But it would be nice to have the state preserved across boots.

  • Alternatively, if some connection is to be used, but with different config, the following: nmcli connection modify $CONNECTION connection.autoconnect-priority 10; and then dracut -f?

pvalena avatar Jul 14 '22 21:07 pvalena

What I've found is that NetworkManager saves the devices which should not be used/autocreated in /var/lib/NetworkManager/no-auto-default.state. dracut should probably respect that.


EDIT: on my system it seems to be /var/lib/NetworkManager/NetworkManager.state

With sample entry from manpage:

no-auto-default=00:22:68:5c:5d:c4,00:1e:65:ff:aa:ee
no-auto-default=eth0,eth1
no-auto-default=*

pvalena avatar Jul 14 '22 21:07 pvalena

  • Meaning the file should probably be copied and read by /usr/libexec/nm-initrd-generator

pvalena avatar Jul 14 '22 21:07 pvalena

CC @bengal

LaszloGombos avatar Feb 02 '23 16:02 LaszloGombos

Hi, when NM runs in initrd, it saves its state in /run/NetworkManager. After switch root, the new NM instance reads the content of that directory (which includes initrd-generated connection profiles and information on which profile was active on which device), so that the state is propagated from initrd to real root.

The reason to do this propagation is that there are cases (e.g. boot from a network share) requiring that the network is not changed on switch root. At the moment there isn't any automatic mechanism in NM to avoid that propagation, and the only workaround is to either:

  • create a new configuration in real root specifying keep-configuration=no and allowed-connections=except:origin:nm-initrd-generator for the device. See NM commit https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/bace14fe1f374db26e49e4e7d61d2fbfce4241cc .
  • add a new service in real root that runs before NM, clears the state directory and flushes interfaces.
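
The first workaround could be sketched as a NetworkManager drop-in like the one written below. This is a minimal sketch under stated assumptions: the interface name, the section name, and the file name are placeholders, and the script writes to a temporary directory by default. On a real system the file would go in /etc/NetworkManager/conf.d.

```shell
#!/bin/sh
# Sketch: write a NetworkManager drop-in that makes the real root ignore
# the configuration and profiles created by nm-initrd-generator.
# IFACE, the section name and the file name are placeholders.
IFACE="${IFACE:-enp1s0}"
# Defaults to a scratch directory; use /etc/NetworkManager/conf.d for real.
NM_CONF_DIR="${NM_CONF_DIR:-$(mktemp -d)}"
NM_CONF_FILE="$NM_CONF_DIR/90-ignore-initrd.conf"

cat > "$NM_CONF_FILE" <<EOF
[device-ignore-initrd]
match-device=interface-name:$IFACE
keep-configuration=no
allowed-connections=except:origin:nm-initrd-generator
EOF

# Show the result for review.
cat "$NM_CONF_FILE"
```

NetworkManager reads conf.d drop-ins at startup, so the setting takes effect on the next boot or after a NetworkManager restart.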

Since those solutions are inconvenient, probably there should be a kernel command line argument that makes the NM dracut module drop all the state before switch root (and possibly, set keep-configuration=no). What do you think?

bengal avatar Feb 07 '23 10:02 bengal

CC @thom311

bengal avatar Feb 07 '23 10:02 bengal

Since those solutions are inconvenient, probably there should be a kernel command line argument that makes the NM dracut module drop all the state before switch root (and possibly, set keep-configuration=no). What do you think?

I am not against a keep-configuration=no boot option, but I wonder how useful it is.

For keep-configuration=no to make sense, you probably need to configure the real root anyway. It's not as if you could boot a machine the first time and it's going to work (is it?). If you already configure the real root, then keep-configuration=no + allowed-connections=except:origin:nm-initrd-generator seems to be the solution (not merely a workaround). Doesn't seem inconvenient to me...

thom311 avatar Feb 07 '23 12:02 thom311

Hi, when NM runs in initrd, it saves its state in /run/NetworkManager. After switch root, the new NM instance reads the content of that directory (which includes initrd-generated connection profiles and a information on what profile was active on what device), so that the state is propagated from initrd to real root.

AFAIU, this is caused by configuration NOT propagating the other way. For example, when 'generating' the connections in the initrd, interfaces which were supposed to be disabled are re-enabled, going against the configuration saved in the root. Is there a right way to propagate such configuration into the initrd?

pvalena avatar Feb 08 '23 09:02 pvalena

AFAIU, this is caused by configuration NOT propagating the other way. For example, when 'generating' the connections in the initrd, interfaces which were supposed to be disabled are re-enabled, going against the configuration saved in the root. Is there a right way to propagate such configuration into the initrd?

Normally, no interface is supposed to be configured in the initrd. Only if there is a kernel command line specifying some network configuration, then nm-initrd-generator creates the corresponding connection profiles. Therefore, I don't understand what you mean by "propagating" the state from real root to initrd: in initrd the only state comes from the kernel command line.

Actually, that's not completely true: users should be able to include connection profiles in the initrd image, and they will be honored. What I meant is that propagating the disabled state from real root doesn't make much sense, because in initrd interfaces are disabled by default.

Ok, reading the thread again, probably you are referring to the scenario where the command line contains only rd.neednet. In absence of other networking options, the generator considers rd.neednet in the same way as ip=dhcp and generates a wildcard connection profile that does DHCP on all available interfaces.
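
To check which case a given machine falls into, the rule described above can be approximated with a small helper. This is only an illustration, not the generator's actual logic: it looks at ip= and rd.neednet only and ignores other networking options (bond=, vlan=, and so on), and the function name and output labels are made up for this sketch.

```shell
#!/bin/sh
# Illustration only: approximates the behaviour described above.
# This is NOT nm-initrd-generator's real implementation.
# Runs in a subshell so that "set -f" does not leak into the caller.
classify_cmdline() (
    set -f          # avoid glob expansion while splitting the command line
    has_ip=0
    has_neednet=0
    for arg in $1; do
        case "$arg" in
            ip=*) has_ip=1 ;;
            rd.neednet|rd.neednet=1) has_neednet=1 ;;
        esac
    done
    if [ "$has_ip" = 1 ]; then
        echo "explicit"        # ip= options decide what gets configured
    elif [ "$has_neednet" = 1 ]; then
        echo "implicit-dhcp"   # treated like ip=dhcp on all interfaces
    else
        echo "none"            # no network configuration in initrd
    fi
)

# Classify the running kernel's command line.
classify_cmdline "$(cat /proc/cmdline 2>/dev/null)"
```

A result of implicit-dhcp corresponds to the Clevis setup in this issue, where rd.neednet=1 is the only networking argument.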

If activating all devices bothers you and you are willing to rebuild the initrd image, you can decide which interfaces to activate by adding a built-in kernel command line that specifies exactly what to configure (for example, ip=enp1s0:dhcp or ip=[2001:db8::2]:::56::enp2s0). There is no automatic mechanism to include inside the initrd information about which interfaces should be disabled.
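
One way to add such a built-in command line is a dracut configuration drop-in using the kernel_cmdline setting, followed by a rebuild. The sketch below assumes a placeholder interface name and file name, and writes to a temporary directory by default. On a real system the file would go in /etc/dracut.conf.d, and the initrd would then be rebuilt with dracut -f.

```shell
#!/bin/sh
# Sketch: embed an explicit ip= argument in the initrd so that only one
# interface is activated there. IFACE and the file name are placeholders.
IFACE="${IFACE:-enp1s0}"
# Defaults to a scratch directory; use /etc/dracut.conf.d for real.
DRACUT_CONF_DIR="${DRACUT_CONF_DIR:-$(mktemp -d)}"
DRACUT_CONF_FILE="$DRACUT_CONF_DIR/90-initrd-net.conf"

# dracut.conf values are space-separated, hence the padding inside the quotes.
printf 'kernel_cmdline+=" ip=%s:dhcp "\n' "$IFACE" > "$DRACUT_CONF_FILE"

# Show the result for review; afterwards rebuild the image with: dracut -f
cat "$DRACUT_CONF_FILE"
```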

bengal avatar Feb 08 '23 14:02 bengal

For keep-configuration=no to make sense, you probably need to configure the real root anyway. It's not as if you could boot a machine the first time and it's going to work (is it?). If you already configure the real root, then keep-configuration=no + allowed-connections=except:origin:nm-initrd-generator seems to be the solution (not merely a workaround). Doesn't seem inconvenient to me...

Right, you probably need some configuration in the real root.

On the other hand, having to modify the persistent configuration on the real root to say "override what was configured from initrd" seems backwards to me. The persistent configuration should stay as it is, and the kernel command line that adds the initrd configuration should say "this doesn't propagate to real root".

That said, I don't have specific use cases in mind where one way or the other is clearly better.

bengal avatar Feb 08 '23 14:02 bengal