dbus in the initrd just like this is a bad idea

Open poettering opened this issue 2 years ago • 7 comments

So I recently noticed that dracut nowadays embeds dbus in the initrd. That's a bad idea and leads to deadlocks like this one: https://bugzilla.redhat.com/show_bug.cgi?id=1976653#c4

This then led people to submit this PR: https://github.com/bus1/dbus-broker/pull/271

Which is wrong too.

D-Bus simply isn't ready to run in the initrd, because it offers no way to schedule the point in time at which services become activatable. Activatable services are activatable during the entire runtime of the bus, and if something talks via the bus to such a service too early, it will hang if the activation request cannot be fulfilled yet because the system is not booted up far enough for this to work. The result is deadlocks.
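The hang described above can be sketched with a toy model. This is not real D-Bus or systemd code: the stage names, the `ToyBus` class and the `org.example.Late` bus name are all assumptions made up for illustration. It only shows why a service that is activatable from bus start, but startable only later in boot, leaves an early caller waiting.

```python
from dataclasses import dataclass

# Illustrative boot stages, in order. These names are assumptions for the sketch.
STAGES = ["initrd", "switch-root", "rootfs"]


@dataclass
class ActivatableService:
    name: str            # hypothetical bus name
    startable_from: str  # earliest stage at which its unit can really start


class ToyBus:
    """Toy bus: every known service is activatable from the moment the bus starts."""

    def __init__(self, services):
        self.services = {s.name: s for s in services}

    def call(self, name, current_stage):
        svc = self.services[name]
        if STAGES.index(current_stage) >= STAGES.index(svc.startable_from):
            return "reply"
        # The bus accepts the call and requests activation, but the unit
        # cannot start yet, so the caller keeps waiting. If boot progress
        # depends on this caller, nothing ever unblocks it: a deadlock.
        return "blocked"


bus = ToyBus([ActivatableService("org.example.Late", startable_from="rootfs")])
print(bus.call("org.example.Late", "initrd"))  # prints "blocked"
print(bus.call("org.example.Late", "rootfs"))  # prints "reply"
```

The model returns `"blocked"` instead of actually blocking so that it stays runnable; on a real system the caller would simply hang at that point.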

So, I am sorry, but the dbus support in dracut should never have been merged like this.

This can be solved properly, but requires some engineering in dbus. See: https://github.com/bus1/dbus-broker/issues/259

But that hasn't materialized yet (though dbus-broker upstream agrees this makes sense). Until it does, running dbus in the initrd is simply prone to issues like the one mentioned.

dbus-broker PR 271 is bogus if you ask me, because it rearranges an upstream unit file to match a downstream change that dracut made to a unit file. Specifically, dracut's dbus support patches out some deps in dbus.socket. But if so, it should really patch out the same deps in the broker unit file too, and not expect upstream dbus-broker to adjust for that...

poettering avatar Jun 07 '23 08:06 poettering

D-Bus simply isn't ready to run in the initrd, because it offers no way to schedule the point in time at which services become activatable. Activatable services are activatable during the entire runtime of the bus, and if something talks via the bus to such a service too early, it will hang if the activation request cannot be fulfilled yet because the system is not booted up far enough for this to work. The result is deadlocks.

dracut has had support for D-Bus since Nov 2020 (#900); I wonder why this problem shows up 2.5 years later. @teg @dvdhrm could you share your opinion on this?

aafeijoo-suse avatar Jun 09 '23 13:06 aafeijoo-suse

It showed up in July 2021, see the linked bug reports. People then tried to "fix" this with a change in systemd, which I think is simply the wrong place.

I don't really grok what caused this to show up. As long as only wicked pulled in dbus, the issue should have remained specific to SUSE, I guess. But apparently some other modules have started to pull in the dbus mess now, which made this more visible.

poettering avatar Jun 09 '23 14:06 poettering

systemd-networkd, network-manager (and bluetooth) pull in dbus now, all released with dracut v54 on May 14, 2021.

The referenced bug in the Red Hat bug database comes from the systemd-networkd dracut module.

Possibly related existing dracut bugs - https://github.com/dracutdevs/dracut/issues/1521 https://bugzilla.redhat.com/show_bug.cgi?id=1964879

CC @bengal and @lnykryn for visibility.

My suggestion at this point is to stop pulling in the bluetooth dracut module by default (even if there is a bluetooth keyboard in use).

LaszloGombos avatar Jun 09 '23 14:06 LaszloGombos

dracut has had support for D-Bus since Nov 2020 (#900); I wonder why this problem shows up 2.5 years later. @teg @dvdhrm could you share your opinion on this?

I think Lennart did a splendid job of explaining the race condition regarding activation of dbus services. The underlying cause is a lack of APIs to add dbus-activatable services at runtime. Hence, systemd has to treat all dbus services as activatable once dbus is running, which can easily conflict with configurations specified in the respective service units.
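What such a runtime API might look like can be sketched as a toy model. Nothing here is an existing D-Bus or dbus-broker interface: `GatedBus` and `add_activatable` are hypothetical names, and the `ServiceUnknown` exception merely stands in for the `org.freedesktop.DBus.Error.ServiceUnknown` error a bus returns for a name that has no owner and is not activatable. The point is only that a call made too early would fail fast instead of hanging.

```python
class ServiceUnknown(Exception):
    """Stands in for org.freedesktop.DBus.Error.ServiceUnknown."""


class GatedBus:
    """Toy bus where services become activatable only when explicitly added."""

    def __init__(self):
        self.activatable = set()

    def add_activatable(self, name):
        # Hypothetical runtime API: the service manager would call this
        # once boot has progressed far enough for the unit to start.
        self.activatable.add(name)

    def call(self, name):
        if name not in self.activatable:
            # Not activatable yet: report an error right away, do not wait.
            raise ServiceUnknown(name)
        return "reply"


bus = GatedBus()
try:
    bus.call("org.example.Late")  # hypothetical bus name, called too early
except ServiceUnknown as err:
    print("early call failed fast:", err)

bus.add_activatable("org.example.Late")
print(bus.call("org.example.Late"))  # prints "reply"
```

With this kind of gating, the caller gets to decide whether to retry later or carry on without the service, rather than being stuck behind an activation request that cannot complete.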

dvdhrm avatar Jun 12 '23 09:06 dvdhrm

Does systemd-networkd actually have a hard dependency on dbus? (I know network-manager and bluetooth do.)

LaszloGombos avatar Jul 21 '23 04:07 LaszloGombos

it does not.

poettering avatar Aug 07 '23 15:08 poettering

Fedora has opted to remove the network-legacy module from dracut in Fedora 40, effectively forcing NetworkManager into the initrd if you need early boot networking. That seems like an actual problem, given what's detailed here and the lack of any resolution to the issue described. I don't think anyone wants an initrd that deadlocks or hangs.

jmpolom avatar Mar 26 '24 03:03 jmpolom