systemd
systemd copied to clipboard
systemd unexpectedly(?) terminates units with `Unexpected error response from GetNameOwner(): Connection terminated` during a "mount storm"
We have an internal customer that started experiencing unexpected restarts of certain units on their OpenShift clusters. After analyzing logs the symptoms looked very similar to this issue, since the affected units reported Unexpected error response from GetNameOwner(): Connection terminated
just before getting restarted. However, we already have patches for that in RHEL.
Another candidate was https://github.com/systemd/systemd/issues/30573, but I failed to reproduce that issue on multiple machines with several reproducers I could come up with. However, after putting these reproducers together with a couple of other hints from customer's logs, I was able to conjure a slightly different (and spiced up) reproducer, that appears to reproduce the issue the customer is facing not only on RHELs, but also on Fedora Rawhide, and the latest systemd upstream (af46138f39acb55f1faefcb48ebc3316efac87b4 ATTOW, also on Rawhide). In all cases dbus-broker
was used as the dbus... broker.
Reproducer:
#/bin/bash
set -eux
set -o pipefail
NMOUNTS=800
MPOINT="$(findmnt --mountpoint / --noheading -o SOURCE)"
TS="$(date +"%s.%N")"
systemctl mask --now --runtime systemd-oomd.{socket,service}
id testuser || useradd testuser
systemctl stop "tmp-mnt*.mount" || :
systemctl log-level debug
for i in {0..100}; do
echo "=== $i ==="
for i in $(seq 0 $NMOUNTS); do
mkdir -p "/tmp/mnt$i"
systemd-mount "$MPOINT" "/tmp/mnt$i" &
done
wait
for i in $(seq 0 $NMOUNTS); do
systemd-umount "/tmp/mnt$i" &
done
wait
busctl status --user --machine=testuser@ --no-pager
systemctl stop [email protected]
(! journalctl --since="@$TS" --grep "Connection terminated")
done
With this reproducer I get following output in a couple of iterations (really depends on the machine you run this on and your luck):
# systemctl --version
systemd 256~devel (256~devel-gaf46138)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT -LIBARCHIVE
# ./repro.sh
...
++ systemctl stop [email protected]
++ journalctl [email protected] --grep 'Connection terminated'
-- No entries --
++ for i in {0..100}
++ echo '=== 1 ==='
=== 1 ===
+++ seq 0 800
...
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
++ busctl status --user --machine=testuser@ --no-pager
BusAddress=unixexec:path=systemd-run,argv1=-M.host,argv2=-PGq,argv3=--wait,argv4=-pUser%3dtestuser,argv5=-pPAMName%3dlogin,argv6=systemd-stdio-bridge,argv7=-punix:path%3d%24%7bXDG_RUNTIME_DIR%7d/bus
BusID=db15434bb5619733c1ee88f5c4c006b1
PID=8351
PIDFD=yes
EUID=0
EGID=0
SupplementaryGIDs=0
Label=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
++ systemctl stop [email protected]
++ journalctl [email protected] --grep 'Connection terminated'
Apr 20 11:25:30 xxx systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminat>
Apr 20 11:25:30 xxx systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 20 11:25:30 xxx systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
...
More details from the journal:
...
[ 747.889938] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=8 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 747.889987] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=9 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 747.890050] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=470 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 747.890101] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=471 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 747.890152] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=314 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 747.890199] systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1/unit/tmp_2dmnt199_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=315 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
[ 764.600613] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.687538] systemd[1]: systemd-homed.service: Deactivated successfully.
[ 764.687805] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-homed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 764.736119] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.736517] audit: BPF prog-id=49 op=UNLOAD
[ 764.737643] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.738216] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.738638] NetworkManager[768]: <info> [1713626730.6956] caught SIGTERM, shutting down normally.
[ 764.739340] NetworkManager[768]: <info> [1713626730.6963] dhcp4 (eth0): canceled DHCP transaction
[ 764.739428] NetworkManager[768]: <info> [1713626730.6963] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
[ 764.739488] NetworkManager[768]: <info> [1713626730.6963] dhcp4 (eth0): state changed no lease
[ 764.739568] NetworkManager[768]: <info> [1713626730.6965] manager: NetworkManager state is now CONNECTED_SITE
[ 764.741702] NetworkManager[768]: <info> [1713626730.6988] exiting (success)
[ 764.785825] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 764.785993] systemd[1]: NetworkManager.service: Deactivated successfully.
[ 764.834007] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.834605] avahi-daemon[744]: Got SIGTERM, quitting.
[ 764.835133] avahi-daemon[744]: Leaving mDNS multicast group on interface eth0.IPv6 with address 2620:52:0:bb:f816:3eff:fe67:3519.
[ 764.835201] avahi-daemon[744]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.187.174.
[ 764.835251] avahi-daemon[744]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
[ 764.835321] avahi-daemon[744]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
[ 764.835399] avahi-daemon[744]: avahi-daemon 0.8 exiting.
[ 764.881474] systemd[1]: avahi-daemon.service: Deactivated successfully.
[ 764.881747] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=avahi-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 764.968329] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.969842] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.970411] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 764.970969] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
...
The most problematic part is the unexpected NetworkManager restart (as that's how the customer noticed the issue in the first place). However, the system itself seems to be fine after this incident:
# systemctl status
● xxx
State: running
Units: 397 loaded (incl. loaded aliases)
Jobs: 0 queued
Failed: 0 units
Since: Sat 2024-04-20 11:12:47 EDT; 18min ago
systemd: 256~devel-gaf46138
CGroup: /
...
And now for the big questions - is this our issue or somebody else's (dbus-broker?)? And can we do anything about it?
/cc @msekletar @rmetrich
In all cases dbus-broker was used as the dbus... broker.
Just out of curiosity, I replaced dbus-broker
with dbus-daemon
on the Rawhide test machine and the results are the same:
...
[ 974.472229] systemd[1]: Mounted tmp-mnt404.mount.
[ 974.500563] systemd[1]: Mounted tmp-mnt405.mount.
[ 974.523083] dbus-daemon[744]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 974.527767] systemd[1]: Mounted tmp-mnt406.mount.
[ 974.555652] systemd[1]: Mounted tmp-mnt407.mount.
...
[ 979.242541] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.242914] NetworkManager[766]: <info> [1713629543.1759] caught SIGTERM, shutting down normally.
[ 979.243545] NetworkManager[766]: <info> [1713629543.1765] dhcp4 (eth0): canceled DHCP transaction
[ 979.243663] NetworkManager[766]: <info> [1713629543.1765] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
[ 979.243764] NetworkManager[766]: <info> [1713629543.1766] dhcp4 (eth0): state changed no lease
[ 979.243855] NetworkManager[766]: <info> [1713629543.1767] manager: NetworkManager state is now CONNECTED_SITE
[ 979.243957] dbus-daemon[744]: [system] Activating systemd to hand-off: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid>
[ 979.245558] NetworkManager[766]: <info> [1713629543.1787] exiting (success)
[ 979.255542] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.268549] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.275493] systemd[1]: NetworkManager.service: Deactivated successfully.
[ 979.275801] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? add>
[ 979.286649] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.300932] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.301142] avahi-daemon[741]: Got SIGTERM, quitting.
[ 979.301603] avahi-daemon[741]: Leaving mDNS multicast group on interface eth0.IPv6 with address 2620:52:0:bb:f816:3eff:fe67:3519.
[ 979.301666] avahi-daemon[741]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.187.174.
[ 979.301725] avahi-daemon[741]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
[ 979.301775] avahi-daemon[741]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
[ 979.305061] avahi-daemon[741]: avahi-daemon 0.8 exiting.
[ 979.308451] systemd[1]: avahi-daemon.service: Deactivated successfully.
[ 979.308705] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=avahi-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=>
[ 979.321993] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.337345] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.352637] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.368100] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 979.379664] systemd[1]: systemd-homed.service: Deactivated successfully.
[ 979.379912] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-homed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr>
[ 979.399167] audit: BPF prog-id=50 op=UNLOAD
[ 979.973431] systemd-journald[577]: [🡕] Suppressed 696657 messages from init.scope
Here's one interesting data point: I let the reproducer run on two machines again overnight (and for the most of Sunday, so 16+ hours on each machne) after the first "crash", and there were no more crashes during that period. But if I reboot either of those machines and start the reproducer again, it crashes in just a couple of iterations again (and then continues without any further crashes).
So it looks like the "crash" happens once (or twice in a quick succession, to be more precise), and from there it seems fine no matter how hard I hammer the machine, until the next reboot.
That is, this is the full "double crash" that happens initially (latest main on Rawhide):
[ 3207.675737] systemd[1]: Mounted tmp-mnt98.mount.
[ 3207.678658] systemd[1]: Mounted tmp-mnt99.mount.
[ 3235.286101] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.287100] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.287478] avahi-daemon[736]: Got SIGTERM, quitting.
[ 3235.287958] avahi-daemon[736]: Leaving mDNS multicast group on interface eth0.IPv6 with address 2620:52:0:bb:f816:3eff:fe40:9445.
[ 3235.288031] avahi-daemon[736]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.0.185.167.
[ 3235.288085] avahi-daemon[736]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
[ 3235.288143] avahi-daemon[736]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
[ 3235.292155] avahi-daemon[736]: avahi-daemon 0.8 exiting.
[ 3235.340082] systemd[1]: avahi-daemon.service: Deactivated successfully.
[ 3235.340576] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=avahi-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3235.438624] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.439286] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.550745] systemd[1]: systemd-homed.service: Deactivated successfully.
[ 3235.551125] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-homed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3235.615132] audit: BPF prog-id=65 op=UNLOAD
[ 3235.615330] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.615933] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.616641] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.617218] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.617991] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3235.618489] NetworkManager[760]: <info> [1713649856.4647] caught SIGTERM, shutting down normally.
[ 3235.619803] NetworkManager[760]: <info> [1713649856.4662] dhcp4 (eth0): canceled DHCP transaction
[ 3235.619989] NetworkManager[760]: <info> [1713649856.4662] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
[ 3235.620112] NetworkManager[760]: <info> [1713649856.4663] dhcp4 (eth0): state changed no lease
[ 3235.620417] NetworkManager[760]: <info> [1713649856.4668] manager: NetworkManager state is now CONNECTED_SITE
[ 3235.624064] NetworkManager[760]: <info> [1713649856.4705] exiting (success)
[ 3235.673354] systemd[1]: NetworkManager.service: Deactivated successfully.
[ 3235.673875] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
...
[ 3363.909334] systemd[1]: Unmounted tmp-mnt99.mount.
[ 3364.013839] systemd[1]: [email protected]: Deactivated successfully.
[ 3364.014048] systemd[1]: Stopped [email protected].
[ 3364.036622] audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user-runtime-dir@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3369.808828] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809489] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809588] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809672] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809770] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809856] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.809937] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.810031] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.810123] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 3369.812073] systemd[1]: Removed slice user-1001.slice.
[ 3369.830819] systemd[1]: Started run-u65.service.
And from now on there doesn't seem to be any similar "crashes" until the machine is rebooted.
Looks to me as if the broker kicks us off the bus. Maybe we hit some message limit, because of the many PropertiesChanged signals we generate or so?
if you turn of debug logging in the broker, does it show anything interesting?
This might be a slightly difficult (which is an issue in itself), but in the meantime the dbus-daemon logs show this just ~5 seconds before systems starts throwing errors:
[ 974.472229] systemd[1]: Mounted tmp-mnt404.mount.
[ 974.500563] systemd[1]: Mounted tmp-mnt405.mount.
[ 974.523083] dbus-daemon[744]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 974.527767] systemd[1]: Mounted tmp-mnt406.mount.
[ 974.555652] systemd[1]: Mounted tmp-mnt407.mount.
Yeah, there's a bunch of timeouts before the kicking happens:
# journalctl -b -1 --no-pager --no-hostname -o short-monotonic _SYSTEMD_UNIT=dbus-daemon.service + PRIORITY=3
[ 4.395420] dbus-daemon[737]: [system] SELinux support is enabled
[ 6.449341] dbus-daemon[737]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.6' (uid=0 pid=759 comm="/usr/sbin/NetworkManager --no-daemon" label="system_u:system_r:NetworkManager_t:s0")
[ 6.585767] dbus-daemon[737]: [system] Successfully activated service 'org.freedesktop.hostname1'
[ 6.617726] dbus-daemon[737]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=759 comm="/usr/sbin/NetworkManager --no-daemon" label="system_u:system_r:NetworkManager_t:s0")
[ 6.673848] dbus-daemon[737]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
[ 1462.282314] (sd-pam)[4611]: pam_selinux(login:session): Setting file context "unconfined_u:object_r:user_devpts_t:s0" failed for /dev/pts/0: Operation not permitted
[ 1462.284100] dbus-daemon[737]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.21" (uid=1001 pid=4611 comm="(sd-pam)" label="system_u:system_r:init_t:s0") interface="org.freedesktop.login1.Manager" member="ReleaseSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=741 comm="/usr/lib/systemd/systemd-logind" label="system_u:system_r:systemd_logind_t:s0")
[ 1462.284830] (sd-pam)[4611]: pam_systemd(login:session): Failed to release session: Access denied
[ 1579.480912] dbus-daemon[737]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.22" (uid=1001 pid=4602 comm="(sd-pam)" label="system_u:system_r:init_t:s0") interface="org.freedesktop.login1.Manager" member="ReleaseSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=741 comm="/usr/lib/systemd/systemd-logind" label="system_u:system_r:systemd_logind_t:s0")
[ 1579.481734] (sd-pam)[4602]: pam_systemd(systemd-user:session): Failed to release session: Access denied
[ 1583.573093] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1588.615026] dbus-daemon[737]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30002ms)
[ 1608.576630] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1633.586669] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1658.599699] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1665.337672] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.353411] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.370864] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.374225] dbus-daemon[737]: [system] Activating systemd to hand-off: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=759 comm="/usr/sbin/NetworkManager --no-daemon" label="system_u:system_r:NetworkManager_t:s0")
[ 1665.389511] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.413527] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.439672] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.457760] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.479134] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1665.505351] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1683.618633] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1690.377194] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out (service_start_timeout=25000ms)
[ 1717.011329] dbus-daemon[737]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30009ms)
[ 1818.500894] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501461] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501554] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501654] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501731] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501811] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501874] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501937] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.501995] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1818.515350] dbus-daemon[737]: [system] Successfully activated service 'org.freedesktop.systemd1'
[ 1818.622638] dbus-daemon[737]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.27' (uid=0 pid=8208 comm="(systemd)" label="system_u:system_r:init_t:s0")
[ 1818.686025] dbus-daemon[737]: [system] Successfully activated service 'org.freedesktop.home1'
[ 1819.006207] (sd-pam)[8221]: pam_selinux(login:session): Setting file context "unconfined_u:object_r:user_devpts_t:s0" failed for /dev/pts/0: Operation not permitted
[ 1819.007295] dbus-daemon[737]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.31" (uid=1001 pid=8221 comm="(sd-pam)" label="system_u:system_r:init_t:s0") interface="org.freedesktop.login1.Manager" member="ReleaseSession" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=741 comm="/usr/lib/systemd/systemd-logind" label="system_u:system_r:systemd_logind_t:s0")
[ 1819.007498] (sd-pam)[8221]: pam_systemd(login:session): Failed to release session: Access denied
Does #32398 help this issue?
@poettering this is just a theory but could it be that we cause the timeout to ourselves. What I mean is that GetOwnerChange is async method call so we queue or maybe even send the request (depending on bus write queue size) and then because mount storm causes a ton of signals (UnitNew, JobNew, PropertiesChanged) we are essentially busy looping on signal generation and we don't process method_reply from GetOwnerChnaged in time.
It seems everything else gets kicked off the bus too, including avahi and so on. Before we start wild guessing, let's have a look at what dbus broker actually does here, i.e. enable logs on that and see why it might decide to let eveywhere hang.
I don't think it kicks of anyone else than us, see the dbus-daemon log above:
[ 1583.573093] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
[ 1588.615026] dbus-daemon[737]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30002ms)
[ 1608.576630] dbus-daemon[737]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
All the units that are affected are of type dbus
, so it looks like systemd is responsible for the killing once we get kicked off the bus?
It seems everything else gets kicked off the bus too, including avahi and so on. Before we start wild guessing, let's have a look at what dbus broker actually does here, i.e. enable logs on that and see why it might decide to let eveywhere hang.
I actually did exactly that. Recompiled dbus broker with extra log statements and I can confirm that no service is actually kicked from the bus.
Instead what it seems to happen is NameOwnerChange signal subscription failure. Creating subscription involves GetNameOwner change method call and we timeout on that async method call. Because of that we decide to transition bus to closing state and we nuke all Type=dbus services together with the bus.
hmm, so you are saying the issue is entirely limited to pid1 and its communication with the bus?
hmm, so in "process_running()" in sd-dbus.c we'll keep processing the wqueue preferably, and only process the rqueue if the wqueue is empty (or the socket is full).
This indeed means that if we overload the wqueue with messages that it will drown out the rqueue.
the question is if that's a bug, or if the bug is that we create so many outbound messages?
we probably could change process_running() so that on every 2nd call it dispatches the rq first, and the wq 2nd, so that both qeuues keep running.
i.e. something like:
r = process_timeout(bus);
if (r != 0)
goto null_message;
/* Toggle which queue we process preferably in this iteration, the rq or the wq */
bus->queue_toggle = !bus->queue_toggle;
if (bus->queue_toggle) { /* wqueue preferably */
r = dispatch_wqueue(bus);
if (r != 0)
goto null_message;
}
r = dispatch_track(bus);
if (r != 0)
goto null_message;
r = dispatch_rqueue(bus, &m);
if (r < 0)
return r;
if (r == 0 && !bus->queue_toggle) {
/* Nothing happened on the rq, and we didn't check the wq in this iteration, hence do that now. */
r = dispatch_wqueue(bus);
goto null_message;
}
if (!m)
goto null_message;
r = process_message(bus, m);
if (r != 0)
goto null_message;
if (ret) {
…
}
if (m->header->type == SD_BUS_MESSAGE_METHOD_CALL) {
…
}
return 1;
…
(where queue_toggle would be a new boolean field in the sd_bus structure)
Anyone wants to play around with this, if this helps?
Anyone wants to play around with this, if this helps?
I will give it a shot.
@mrc0mmand so here is patch https://github.com/msekletar/systemd/commit/9b5670832c4486d98d809dfb03809bbeba7f59a3 It is mostly verbatim copy of proposed patch from @poettering. Feel free to give it a spin.
I am already testing it locally but it would help if you could run it overnight on a bunch of machines.
@mrc0mmand so here is patch msekletar@9b56708 It is mostly verbatim copy of proposed patch from @poettering. Feel free to give it a spin.
I gave it a spin and managed to hit the issue again a couple of iterations. But maybe I f-ed something up, so I'll try again tomorrow (as I'm waiting for another test machine anyway).
I am running the reproducer (on Rawhide) for couple hours already and it is still going on while previously it used to crash after 5 minutes or so.
I am running the reproducer (on Rawhide) for couple hours already and it is still going on while previously it used to crash after 5 minutes or so.
Did you reboot the machine before running the reproducer? As I said in https://github.com/systemd/systemd/issues/32381#issuecomment-2068135138 it seems to happen only once per reboot:
# git log -1 --oneline
78fd385e02 (HEAD -> pr) sd-bus: alternate between dispatch order of internal queues
# systemctl --version
systemd 256~devel (256~devel-g78fd385)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT -LIBARCHIVE
# ./repro.sh
...
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Failed to stop automount unit: Connection timed out
Apr 25 05:36:52 xxx.redhat.com systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:36:52 xxx.redhat.com systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 25 05:39:04 xxx.redhat.com systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
I added a log message that shows the state of the queues if either of them has any jobs, and it looks like the issue is might not be an overloaded rqueue:
# journalctl --no-hostname -o short-monotonic -b -p warning _PID=1
...
[ 33.804923] systemd[1]: systemd-oomd.service: Got notification message from PID 704, but reception is disabled
[ 33.805050] systemd[1]: systemd-oomd.service: Got notification message from PID 704, but reception is disabled
[ 384.848408] systemd[1]: process_running: wqueue size: 0; rqueue size: 2
[ 384.848936] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 384.901772] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 384.905494] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 384.909864] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 611.876364] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.015579] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.016197] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.016805] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.017438] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.019127] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.019701] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.020286] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 612.124789] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.242169] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.243199] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.243450] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.243607] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.243792] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.243958] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.244189] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.244409] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.244583] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 753.252802] systemd[1]: process_running: wqueue size: 22; rqueue size: 0
[ 753.256604] systemd[1]: process_running: wqueue size: 0; rqueue size: 5
[ 753.256974] systemd[1]: process_running: wqueue size: 0; rqueue size: 4
[ 753.257206] systemd[1]: process_running: wqueue size: 0; rqueue size: 3
[ 753.257399] systemd[1]: process_running: wqueue size: 0; rqueue size: 2
[ 753.257570] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 753.261854] systemd[1]: process_running: wqueue size: 0; rqueue size: 2
[ 753.262183] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 753.310483] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 753.317432] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 753.321147] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
[ 753.390846] systemd[1]: process_running: wqueue size: 0; rqueue size: 1
@mrc0mmand at which point you print the queue size? Right after entering process_running?
Hmm, so it seems my initial gut feeling was correct (we flood bus with writes and don't get to reading stuff in time) but actually the root cause is different than what @poettering hinted at.
Every systemd-mount
establishes a private bus connection with systemd. In manager_dispatch_dbus_queue()
we send out dbus signals (unit changed, job changed) and we also try to do "budget accounting" to limit amount of messages we send out in single loop iteration. However, we don't propagate budget variable to bus_unit_sent_changed_signal()
and in there we send out unit change events unconditionally for all private buses (there are 100s of them) but decrement budget only by 1 in manager_dispatch_dbus_queue()
. We essentially have O(U * B)
time complexity here, where U
is number of mount units which changed and B
is number of private buses. Actually, U
is capped by the budget but because there are so many private buses that doesn't matter and writes still "starve" reads.
I think that idea of budget and threshold implemented in manager_dispatch_dbus_queue()
doesn't take into consideration private buses. In every iteration they don't have anything in write queue so every write is actually send out immediately making write queue empty hence next write is also send out immediately.
Hi @mrc0mmand,
I've updated my branch where I am trying to fix the issue, https://github.com/msekletar/systemd/commits/issue-32381/. I'd appreciate if you could test it. Thanks!
Hi @mrc0mmand,
I've updated my branch where I am trying to fix the issue, https://github.com/msekletar/systemd/commits/issue-32381/. I'd appreciate if you could test it. Thanks!
Unfortunately, I can still reproduce the issue, although now with more errors:
# git log -2 --oneline
acd77dae9d (HEAD -> issue-32381, msekletar/issue-32381) bus: actually decrease the budget for every generated message
9b5670832c sd-bus: alternate between dispatch order of internal queues
# systemctl --version
systemd 256~devel (256~devel-gacd77da)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT -LIBARCHIVE
# journalctl -b --no-pager -o short-monotonic --no-hostname --grep "Connection terminated"
[ 1158.688234] systemd[1]: Failed to process message type=error sender=org.freedesktop.DBus destination=n/a path=n/a interface=n/a member=n/a cookie=4294967295 reply_cookie=1 signature=s error-name=org.freedesktop.DBus.Error.NoReply error-message=Connection terminated: Connection timed out
[ 1158.688277] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.NetworkManager', failing connection: Connection terminated
[ 1158.688322] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1158.823677] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.home1', failing connection: Connection terminated
[ 1158.823940] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.063048] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.hostname1', failing connection: Connection terminated
[ 1159.063700] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.064230] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.resolve1', failing connection: Connection terminated
[ 1159.064738] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.066915] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.network1', failing connection: Connection terminated
[ 1159.067416] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.067916] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.timesync1', failing connection: Connection terminated
[ 1159.068421] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.068920] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.UDisks2', failing connection: Connection terminated
[ 1159.069420] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.069922] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.freedesktop.login1', failing connection: Connection terminated
[ 1159.070420] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1159.070923] systemd[1]: Unable to add match type='signal',sender='org.freedesktop.DBus',path='/org/freedesktop/DBus',interface='org.freedesktop.systemd1.Activator',member='ActivationRequest', failing connection: Connection terminated
[ 1159.071434] systemd[1]: Unable to request name, failing connection: Connection terminated
[ 1159.071935] systemd[1]: Unable to add match type='method_call',interface='org.freedesktop.MemoryAllocation1',path='/org/freedesktop/MemoryAllocation1',destination='org.freedesktop.systemd1',member='GetMallocInfo', failing connection: Connection terminated
[ 1395.400492] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.400776] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.400920] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.401060] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.401161] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.401274] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.401374] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 1395.401476] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Hi @mrc0mmand, so here is yet another attempt to fix (avoid) the issue. Feel free to give it a spin.
https://github.com/msekletar/systemd/commit/8d0c8b908e75f8841821f8ede27a60b881010a12
@msekletar so we actually have this dbus method Subscribe() that clients are supposed to call before we send out any signals. It currently has an effect only for the "API" bus connection, not on the direct connections.
I guess we could change that, and strictly require this for direct connections too. This wouldn't be an API break, since direct connections are private anyway.
implementation idea: maybe maintain two sets of private busses. The ones that called "Subscribe" and the ones that didn't. make sure that new direct connections are added to the latter, and move them over once they call Subscribe.
That way, we could trivially suppress all those messages on direct connections.
Hi @mrc0mmand, so here is yet another attempt to fix (avoid) the issue. Feel free to give it a spin.
This looked promising and after running the reproducer for an hour I couldn't reproduce the issue anymore. So I decided to step up the game a bit and use 1600 mounts instead of just 800, and with that the issue seems to be, unfortunately, back:
# journalctl --since="-15min" -b -o short-monotonic --no-hostname -p info
...
4125.682261] systemd[1]: Unmounted tmp-mnt998.mount.
[ 4125.708647] systemd[1]: tmp-mnt999.mount: Deactivated successfully.
[ 4125.708981] systemd[1]: Unmounted tmp-mnt999.mount.
[ 4125.736105] systemd[1]: [email protected]: Deactivated successfully.
[ 4125.736427] systemd[1]: Stopped [email protected].
[ 4145.719502] systemd-journald[587]: [🡕] Suppressed 661931 messages from init.scope
[ 4168.229210] systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229268] systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229317] systemd[1]: systemd-homed.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229364] systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229431] systemd[1]: systemd-networkd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229483] systemd[1]: systemd-timesyncd.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229535] systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229591] systemd[1]: systemd-resolved.service: Unexpected error response from GetNameOwner(): Connection terminated
[ 4168.229638] systemd[1]: systemd-hostnamed.service: Unexpected error response from GetNameOwner(): Connection terminated