dracut icon indicating copy to clipboard operation
dracut copied to clipboard

nm-run.sh should check interfaces repeatedly until they are up

Open scottcwang opened this issue 2 years ago • 19 comments

Currently, nm-wait-online-initrd.service only waits for NetworkManager to activate all devices; it doesn't wait for NetworkManager to reach a connected state. Because the settled hook nm-run.sh only provides one opportunity for the connection to be up, when DHCP loses to nm-run.sh, the online hooks are not run and the boot fails.

Here's an example of the journal illustrating the issue. I've patched warn "$(nmcli)" ; warn "$(cat /run/NetworkManager/system-connections/default_connection.nmconnection))" into nm-run.sh to show that DHCP loses out to nm-run.sh:

  1. nm-wait-online-initrd.service executes nm-online -s (line 5967).
  2. nm-online -s waits for NetworkManager client startup, skipping the CONNECTED_* state check.
  3. The device activates, so NetworkManager reports startup complete (line 15681) and nm-online -s returns; but DHCP isn't done yet (in fact it hasn't actually started).
  4. dracut-initqueue begins and runs the nm-run.sh settled hook (line 17173).
  5. nm-run.sh sees that the device has no UUID yet, so skips executing the online hooks for the device (line 17278). (NB setting may-fail=false, line 17219, doesn't affect -s.)
  6. nm-run.sh writes /tmp/nm.done (line 17287).
  7. DHCP finally finishes (line 20510, interspersed NetworkManager logs).
  8. On every subsequent run, nm-run.sh sees /tmp/nm.done and exits without running the online hooks for the now online device (line 20628).

Changes

There are several possible solutions:

  • (This PR) Remove the -s option from the nm-online invocation in nm-wait-online-initrd.service, so that nm-online waits for CONNECTED_*, including DHCP being done. Here's an example of the journal with this PR in place, showing the nm-run.sh waiting for DHCP and successfully running the online hooks.
  • Make nm-run.sh not write /tmp/nm.done unless it has confirmed NetworkManager's state is CONNECTED_*
  • Make nm-run.sh keep track of which devices it hasn't yet run the online hooks for, and retry them on the next run

Checklist

  • [x] I have tested it locally
  • [ ] I have reviewed and updated any documentation if relevant
  • [ ] I am providing new code and test(s) for it

Fixes #

scottcwang avatar Jul 01 '22 18:07 scottcwang

JFYI, previous discussion here: #1682

aafeijoo-suse avatar Jul 05 '22 10:07 aafeijoo-suse

Thanks, @aafeijoo-suse -- I hadn't noticed the duplicate PR. My apologies.

My experience bears out the first sentence, but not the second, of what @bengal has written:

That's not correct. -s waits that NM has attempted to activate every connection. A connection attempt ends either when the connection is successful (i.e. it gets an address if it needs to) or when it fails (for DHCP, after the timeout specified by rd.net.timeout.dhcp=).

The debug logs in the gists I linked above show that NM "activates" the eno1 connection once its device is active, so reports "startup complete", whereupon nm-online -s exits. However, the connection attempt clearly hasn't started DHCP. It is without -s that will make nm-online wait for the CONNECTED_* state, namely that DHCP has finished, before exiting.

I don't have multiple connections to play with, but my quick read of nm-online's source code seems to suggest that -s, viz. "startup complete", tests whether every autoconnect connection has been activated (or at least attempted); and the absence of -s tests whether any connection has become connected, putting NM into a CONNECTED_* state. I don't know how waiting for the CONNECTED_* state impinges on other dracut modules, but as far as I can tell, nm-run.sh definitely needs it; otherwise it will skip all the online hooks and the boot will fail.

scottcwang avatar Jul 06 '22 04:07 scottcwang

The problem in the log above is that eno1 doesn't have carrier. NM has a built-in carrier-timeout of 6 seconds, after which it gives up waiting for the device. Since the default value is probably too short (the network-legacy module had 10 seconds timeout), I have a patch in the works to increase the timeout to 10 or 15 seconds: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1239

In the log I see:

Jul 01 16:17:53 fedora NetworkManager[904]: <debug> [1656692273.7863] device[479c44c90ebf5ecc] (eno1): add_pending_action (3): 'carrier-wait'
...
Jul 01 16:17:59 fedora NetworkManager[904]: <debug> [1656692279.7927] device[479c44c90ebf5ecc] (eno1): remove_pending_action (0): 'carrier-wait'
Jul 01 16:17:59 fedora NetworkManager[904]: <trace> [1656692279.7929] settings: startup-complete: ready, no more profiles to wait for
...
Jul 01 16:18:00 fedora kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx

So, eno1 gets carrier after 7 seconds.

Until the value is bumped in NM, you can supply kernel argument rd.net.timeout.carrier=<seconds> to increase it to an arbitrary value.

bengal avatar Jul 07 '22 07:07 bengal

I don't have multiple connections to play with, but my quick read of nm-online's source code seems to suggest that -s, viz. "startup complete", tests whether every autoconnect connection has been activated (or at least attempted); and the absence of -s tests whether any connection has become connected

Correct. In the initrd case we want to wait for every generated connection to finish, hence the -s. The difference is that the startup-complete event used by -s has a timeout on carrier, while waiting for the connected state doesn't.

bengal avatar Jul 07 '22 07:07 bengal

Awesome, rd.net.timeout.carrier solves my problem just as well. Thank you @bengal.

However, from reading the man page for dracut.modules initqueue/online, I'm still confused why, without rd.net.timeout.carrier, nm-run.sh didn't eventually execute the online hooks for eno1:

This hook (initqueue/online) gets executed whenever a network interface comes online (that is, once it is up and configured by the configured network module).

Is that supposed to happen with NM? The logs I shared above show that nm-run.sh only checks each connection once for triggering the online hooks, then never checks again even when eno1 did come online; so the online hooks will appear to not obey the semantics described in this man page description. Is nm-run.sh intended to behave this way?

(This discrepancy in the behaviour of nm-run.sh is obscured if nm-online -s does obtain all the connections as you described, but if the carrier-wait elapses, this behaviour will surface as seen in the debug logs.)

scottcwang avatar Jul 07 '22 22:07 scottcwang

nm-run.sh is executed when NM reaches the startup-complete phase, and is executed only once. It checks which interfaces have an active connection (by looking at the connection-uuid in /run/NetworkManager/devices/$ifindex), and invokes the online and netroot hooks on them.

If the interface doesn't have carrier before the timeout, startup-complete is reached early when no connection is active on the interface. Therefore, hooks are not called.

bengal avatar Jul 12 '22 16:07 bengal

Is nm-run.sh intended to behave this way?

Ideally, the carrier timeout should be long enough, and https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1239 will help with that.

If it's not long enough, then network-online is reached too early and potentially the hooks won't be called. Probably the implementation of the module could be improved to avoid this, but I think the best solution to this is to have a timeout that is long enough.

bengal avatar Jul 12 '22 16:07 bengal

Yep, I follow your reasoning. And your PR for the lengthened timeout in NM would work for me. But since you've said:

If it's not long enough, then network-online is reached too early and potentially the hooks won't be called. Probably the implementation of the module could be improved to avoid this,

My later suggestion is exactly this improvement you've described. If you're willing to allow nm-run.sh to run through multiple times, checking for newly active connections each time, then the hooks will always eventually be called regardless of the NM timeout, which (1) completely eliminates the race condition in nm-online -s and obviates the question of how long the timeout should be and (2) complies with the description in the initqueue/online docs.

Do you know of a reason this suggestion would be a bad idea? If not, I can fiddle around with it and come up with a POC.

scottcwang avatar Jul 14 '22 00:07 scottcwang

Do you know of a reason this suggestion would be a bad idea? If not, I can fiddle around with it and come up with a POC.

I think it's not actually a bad idea; I'm mostly concerned about the complexity that it will introduce, but maybe it can be done easily.

Currently nm-run.sh is executed in the initqueue, and nm-wait-online-initrd.service is ordered before dracut-initqueue.service so that the initqueue runs after NetworkManager reaches startup-complete (i.e. when also nm-online -s returns).

nm-run.sh writes /tmp/nm.done to ensure it's run only once. If you remove the check then the script will potentially run multiple times, until it's removed from the initqueue and as long as there are other hooks in the hooks/finished directory keeping the initqueue looping. I wonder if we need to ensure that /sbin/netroot and initqueue/online hooks are run only once for each interface, or it's okay to run them multiple times. In the latter case, it would be simpler.

bengal avatar Jul 14 '22 07:07 bengal

Hi @bengal , I've taken a shot at updating nm-run.sh so that it keeps track of which NM-managed interfaces it hasn't yet run the online hooks for. This way, each interface only has the online hook run once. Only once the online hooks are run for all the interfaces does /tmp/nm.done get written. This way, there's no dependence on interfaces coming up before the carrier timeout elapses.

Here's a log showing how it works on my machine.

Let me know of any corner cases I haven't accounted for. I'm not sure if there's a better way than nmcli to figure out if an interface is not NM-managed.

scottcwang avatar Jul 23 '22 22:07 scottcwang

Hi, thanks for the patch. The problem I see is the following: when there is an interface that doesn't have any matching connection profile generated (but the interface is still managed), ct will be set to 1, preventing the write of /tmp/nm.done. This will also prevent the initqueue to finish, due to this line in nm_generate_connections():

echo '[ -f /tmp/nm.done ]' > "$hookdir"/initqueue/finished/nm.sh

Like, suppose you have ip=enp1s0:dhcp rd.neednet=1 in the command line, and there are enp1s0, enp2s0 interfaces; will this work? Or the boot will be stuck forever?

bengal avatar Jul 25 '22 08:07 bengal

Hmm, good point.

Would enp2s0's connection be inhibited by initrd-no-auto-default.conf?

If not, we'll have to add some marker to distinguish those connection keyfiles in /run/NetworkManager/system-connections that were generated by nm-initrd-generator. Or maybe, now that this PR is going to create a /tmp/nm.done.d to keep track of the done interfaces, we can make initqueue/finished/nm.sh check each individual interface's file in /tmp/nm.done.d, on top of /tmp/nm.done. Which would you prefer?

scottcwang avatar Jul 26 '22 04:07 scottcwang

Would enp2s0's connection be inhibited by initrd-no-auto-default.conf?

initrd-no-auto-default.conf prevents the automatic creation of a "Wired Connection 1" profile when no other matching profile exists for the interface. It doesnt' help in this scenario.

If not, we'll have to add some marker to distinguish those connection keyfiles in /run/NetworkManager/system-connections that were generated by nm-initrd-generator. Or maybe, now that this PR is going to create a /tmp/nm.done.d to keep track of the done interfaces, we can make initqueue/finished/nm.sh check each individual interface's file in /tmp/nm.done.d, on top of /tmp/nm.done. Which would you prefer?

The problem is this: we need to check which devices are still waiting for a potential activation; or, in other words, which devices have a matching connection profile but are not yet activated (for example because carrier is missing).

At the same time, we must not consider devices without a matching profile (as enp2s0 in my example above). For those, it would be wrong to wait.

But deciding if the device is in the first group or the second is not trivial. The generator could create a file containing a list of devices for which it generated a profile. The file would contain:

  • the interface name, for arguments like ip=enp1s0:dhcp

  • the MAC address when the interface is specified via a MAC: ip=00-11-22-33-44-55:dhcp. The generator can't look up the matching interface name because at the time it runs, interfaces are not necessarily discovered yet.

  • in case of wildcard profiles like ip=dhcp, it would store something like: *.

Then, nm-run.sh would need some more logic to parse the file and decide if all devices specified there are activated. This seems quite fragile and requires an additional layer of code, so I would avoid this.

A possible alternative would be that instead of calling hooks from nm-run.sh (only once when NM signals network-online), hooks are called by NM via dispatcher scripts for the UP event of interfaces. But I'm not sure if that would work, because it means that the hooks are called from a different context, and maybe environment variable are missing.

bengal avatar Jul 26 '22 08:07 bengal

The problem is this: we need to check which devices are still waiting for a potential activation; or, in other words, which devices have a matching connection profile but are not yet activated (for example because carrier is missing). At the same time, we must not consider devices without a matching profile (as enp2s0 in my example above). For those, it would be wrong to wait.

I notice that nm_generate_connections, called by the nm-config.sh cmdline hook, does rm -f /run/NetworkManager/system-connections/*, then runs nm-initrd-generator to create the connection keyfiles (which contains the UUID of the connection). Since nm-initrd.service is After=dracut-cmdline.service, would it be correct to say that at the point when nm-initrd-generator completes, /run/NetworkManager/system-connections should be for sure a clean set of the devices in the first group you mentioned?

If so, what do you think of:

  • making nm-run.sh write /tmp/nm-done.d/<uuid of the completed connection>
  • making the loop towards the end of nm_generate_connections iterate over all of the connection keyfiles in /run/NetworkManager/system-connections/*, etc., and having finished/nm.sh watch for all of the /tmp/nm-done.d/<uuid of the expected connection>?

scottcwang avatar Jul 28 '22 05:07 scottcwang

Since nm-initrd.service is After=dracut-cmdline.service, would it be correct to say that at the point when nm-initrd-generator completes, /run/NetworkManager/system-connections should be for sure a clean set of the devices in the first group you mentioned?

Not exactly. /run/NetworkManager/system-connections contains a list of connections. Each connection can be activated on 0,1, or more devices. When the generator runs, it doesn't know which devices are available, and it generates connections only based on the kernel command line (because at the time the generator runs there is no guarantee that all devices were discovered by kernel).

bengal avatar Jul 29 '22 16:07 bengal

I see, that makes sense. Thanks for explaining. Is the below summary of what you've said about the desired semantics of nm-run.sh accurate?

  • Once a connection becomes active, the online hooks should be run on all its devices, if any
  • nm-run.sh should wait until all connections generated by nm_generate_connections are active, and should not wait for other connections (although the online hooks can be run for those connections' devices)

If so, how about yet another approach:

  • nm_generate_connections saves off the UUIDs of the connections generated by nm-initrd-generator. This freezes the list of connections, and should take care of distinguishing which connections we should and shouldn't wait for
  • Then nm-run.sh later comes along and iterates through all the NM connections, grabbing those connections' corresponding device list from nmcli --get-values GENERAL.DEVICES connection show <connection UUID>. If a connection has no, or more than one, device, we can handle it by looping through its device list. If it meets a connection that isn't on the frozen list of connections, then it shouldn't inhibit writing nm-done.sh

scottcwang avatar Jul 30 '22 04:07 scottcwang

@bengal where are we at with this one?

johannbg avatar Aug 15 '22 16:08 johannbg

If it helps, I updated this PR with an implementation of the approach I outlined in my previous comment.

I haven't got around to testing the multiple-connection scenario or the scenarios where there are multiple, or no, devices to one connection. Do you have any suggestions for a kernel argument string that would create the latter scenario?

scottcwang avatar Aug 16 '22 04:08 scottcwang

Then nm-run.sh later comes along and iterates through all the NM connections, grabbing those connections' corresponding device list from nmcli --get-values GENERAL.DEVICES connection show <connection UUID>. If a connection has no, or more than one, device, we can handle it by looping through its device list. If it meets a connection that isn't on the frozen list of connections, then it shouldn't inhibit writing nm-done.sh

I don't think this would work because GENERAL.DEVICES returns the device only the connection started activating on a device. If the device is waiting for carrier, then the connection is not activating and the command returns an empty string.

The problem we need to solve is: we want to call the dracut hooks also for devices that become ready (e.g. appear or get carrier) after NM reached the 'online' target.

On a second thought, it seems to me that the problem can't be solved by delaying the boot until ALL devices are ready because:

  • in case a device never gets carrier, the boot will be delayed forever (or until other longer dracut timeouts kick in), while we want to respect the rd.net.timeout.carrier= parameter

  • the script can't predict which devices will appear later

--

I think it would be better to solve the problem in a different way: instead of having nm-run.sh call source_hook initqueue/online and /sbin/netroot for all known devices at the same time, we could use the dispatcher script feature of NM to call a script every time an interface goes up in NM. This script would call those two dracut hooks for the interface that was just activated.

In this way, even if a interface goes up after NM reached the online target, the hooks will be called.

However, I'm not sure those hooks can be called directly from an external process; probably they must be called from within dracut initqueue (this is a question for @johannbg). In such case, we need to use a indirect way like: the dispatcher script doesn't call directly the hooks, but instead drops a script inside one of the initqueue directories, that get processed from the initqueue.

bengal avatar Aug 31 '22 07:08 bengal

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

stale[bot] avatar Sep 30 '22 17:09 stale[bot]

Wait, stale-bot, don't close yet! I'm back! Sorry for the delay.

On a second thought, it seems to me that the problem can't be solved by delaying the boot until ALL devices are ready because: in case a device never gets carrier, the boot will be delayed forever (or until other longer dracut timeouts kick in), while we want to respect the rd.net.timeout.carrier= parameter

@bengal, I'm afraid I'm not following what you mean here .... I get that we want to delay boot not for all devices, but only those from nm_generate_connections; this is what you pointed out previously in this thread). I do think we're on the same page about this -- my proposed logic has it such that only those connections from nm_generate_connections (what I called "frozen connections") inhibit boot progress, and any other connections don't inhibit boot progress. (Modulo the bit about GENERAL.DEVICES, which I will need to study up on.)

So now I don't think I understand fully how rd.net.timeout.carrier plays a role in nm-run.sh. I thought this parameter only regulates the duration of nm-online -s, which basically dictates how long it takes before nm-run.sh is called for the first time to run hooks on devices that are already online. What am I missing?

How about a concrete example, so you can easily point out where my logic goes off track:

  • The kernel arguments are ip=enp1s0:dhcp ip=enp2s0:dhcp ip=enp3s0:dhcp
  • The machine also has enp4s0 and enp5s0, but they aren't in the kernel arguments, so nm_generate_connections doesn't generate a connection for them
  • nm-online -s waits for enp1s0, enp2s0, and enp3s0. enp1s0 is fast and comes up before rd.net.timeout.carrier elapses; but enp2s0 and enp3s0 are not yet done by then, so nm-online -s exits
  • nm-run.sh starts. It finds enp1s0 is associated with an active NM connection, and runs the dracut hooks on this device
  • Some time later, enp2s0's connection activates too, and nm-run.sh runs the dracut hooks on this device
  • Then enp4s0's connection activates too, and nm-run.sh runs the dracut hooks on this device too (even though it wasn't in the kernel arguments; this is to comply with the man page for the online hook)
  • nm-run.sh keeps waiting for enp3s0 until the dracut initqueue timeout elapses:
    • If enp3s0's connection activates before the dracut initqueue timeout elapses, then nm-run.sh runs the dracut hooks on enp3s0, and then reports done, and then the boot continues
    • Otherwise, the boot fails (because we didn't get all the devices we specified in the kernel arguments)
  • nm-run.sh doesn't wait for enp5s0 since it wasn't in the kernel arguments

scottcwang avatar Oct 06 '22 02:10 scottcwang

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

stale[bot] avatar Nov 09 '22 05:11 stale[bot]