Add systemd soft reboot functionality
Fixes: #4298
I guess we should support `systemctl soft-reboot` for all possible plugins? It could help our users update all of userspace efficiently when there are no kernel changes, which means a shorter reboot time. We should not make it the default, though: "It definitely can't work to change all reboots to soft reboots as some use cases will want to change the kernel for example."
Pull Request Checklist
- [ ] implement the feature
- [ ] write the documentation
- [ ] extend the test coverage
- [ ] update the specification
- [ ] adjust plugin docstring
- [ ] modify the json schema
- [ ] mention the version
- [ ] include a release note
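As background for the implementation: systemd gained `systemctl soft-reboot` in v254, so plugin support presumably needs a guard and a fallback. A minimal sketch, assuming a version check is the right gate (helper names are illustrative, not the actual tmt code):

```shell
# Parse the systemd version from the first line of `systemctl --version`
# output, e.g. "systemd 254 (254.1-2.fc39)" -> "254".
systemd_version() {
    awk 'NR==1 {print $2}'
}

# `systemctl soft-reboot` was introduced in systemd v254; anything older
# has to fall back to a regular reboot.
soft_reboot_supported() {
    [ "$1" -ge 254 ] 2>/dev/null
}
```

On the guest this would gate the choice between `systemctl soft-reboot` and a plain `reboot`.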
Some more info on why systemd soft-reboot is needed would be useful, either in the comments or the PR description. I can't judge right now whether we should make that the default whenever systemd is present, make it a feature specific to systemd, or something else entirely.
Hi @cgwalters, I'm working on this MR. It works with the virtual plugin, but not with the bootc plugin: when I manually ssh into the bootc system created by the bootc plugin and then run "systemctl soft-reboot", I'm not able to log in to that system anymore. Do you have any idea why, what additional steps I should add, or any hints/guides? Thanks :)
One preliminary comment: "soft reboot" already has a history and its own meaning, which is by no means the same as "systemd soft-reboot". Please consider using a different name/label/variable (e.g. soft_reboot="$3") that would make it clear that this mode is not the "soft reboot" as already known and implemented, i.e. a software-induced reboot, e.g. via shutdown -r now or a similar command, paired with the "hard reboot" on the level of poweroff/poweron events. Great care must be taken to make it clear that "soft reboot" is one thing, and "systemd soft-reboot" is something else.
Yes true, but OTOH I think it should be quite unusual and rare for code executed inside a guest to "reach out" to a hypervisor or control plane and do a physical reboot. That's the case that I think needs dedicated nomenclature.
I would probably rename the hard to physical in tmt.
But yes, arguably systemd's soft reboot should probably have been called an "init reboot" to be less ambiguous.
Hi @cgwalters, I'm working on this MR. It works with the virtual plugin, but not with the bootc plugin: when I manually ssh into the bootc system created by the bootc plugin and then run "systemctl soft-reboot", I'm not able to log in to that system anymore. Do you have any idea why, what additional steps I should add, or any hints/guides? Thanks :)
Offhand it works for me (I happened to be testing in a bcvk libvirt run quay.io/almalinuxorg/almalinux-bootc:10.0 machine) but if you can show your tmt reproducer setup we could look.
My recommendation here is to be sure you have a console set up at least for debugging.
Yes true, but OTOH I think it should be quite unusual and rare for code executed inside a guest to "reach out" to a hypervisor or control plane and do a physical reboot.
Not necessarily code running inside a guest, but for tmt this is a real situation. A guest may freeze or crash, e.g. thanks to various kernel-torturing tests or tests causing a kernel oops on purpose, and tmt does have tools to invoke a "hard" reboot to restore law and order. tmt does not care that much about how it's implemented, whether it's the magic of Beaker or libvirt & qemu, but it's called "hard reboot" in the tmt codebase.
I would probably rename the "hard" to "physical" in tmt.
That would be possible.
But yes arguably too systemd's soft reboots should probably have been called an "init reboot" to be less ambiguous.
I think sticking to the "systemd soft-reboot" term should be enough, it just needs to be consistent. My point was to double-check changes to avoid variables like soft_reboot which are not about "soft reboot" but about "systemd soft-reboot" instead.
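The naming concern could be made concrete with an explicit mode value; the sketch below (mode names and helper are purely illustrative, not tmt's actual API) keeps the established tmt meanings of "soft" and "hard" and gives the systemd userspace-only reboot its own distinct label:

```shell
# Map a reboot mode to the command it implies; "soft" and "hard" keep
# their existing tmt meanings, the systemd flavour gets its own name.
reboot_command() {
    case "$1" in
        soft)                echo "shutdown -r now" ;;           # software-induced reboot
        hard)                echo "<hypervisor power-cycle>" ;;  # poweroff/poweron level
        systemd-soft-reboot) echo "systemctl soft-reboot" ;;     # userspace-only reboot
        *)                   return 1 ;;                         # unknown mode
    esac
}
```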
but if you can show your tmt reproducer set up we could look.
sure, here is the reproducer:
tmt run --skip finish --skip cleanup plan --name plans/bootc$
Content of the plan:
```yaml
summary: Basic smoke test
provision:
    how: bootc
    container-image: quay.io/fedora/fedora-bootc:43
execute:
    script: echo 'test'
```
Offhand it works for me (I happened to be testing in a bcvk libvirt run quay.io/almalinuxorg/almalinux-bootc:10.0 machine) but if you can show your tmt reproducer setup we could look.
With quay.io/almalinuxorg/almalinux-bootc:10.0 it indeed works, though there is a failed service after systemctl soft-reboot:
(dev) [lnie@ tmt]$ ssh -oForwardX11=no -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -oConnectionAttempts=5 -oConnectTimeout=60 -oServerAliveInterval=5 -oServerAliveCountMax=60 -oIdentitiesOnly=yes -p10029 -i /var/tmp/tmt/run-024/plans/bootc/provision/default-0/id_ecdsa -oPasswordAuthentication=no -S/var/tmp/tmt/run-024/ssh-sockets/127.0.0.1-10029-root.socket [email protected]
Warning: Permanently added '[127.0.0.1]:10029' (ED25519) to the list of known hosts.
[root@default-0 ~]# systemctl soft-reboot
[root@default-0 ~]# Connection to 127.0.0.1 closed by remote host.
Connection to 127.0.0.1 closed.
(dev) [lnie@ tmt]$ ssh -oForwardX11=no -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -oConnectionAttempts=5 -oConnectTimeout=60 -oServerAliveInterval=5 -oServerAliveCountMax=60 -oIdentitiesOnly=yes -p10029 -i /var/tmp/tmt/run-024/plans/bootc/provision/default-0/id_ecdsa -oPasswordAuthentication=no -S/var/tmp/tmt/run-024/ssh-sockets/127.0.0.1-10029-root.socket [email protected]
Warning: Permanently added '[127.0.0.1]:10029' (ED25519) to the list of known hosts.
Last login: Mon Nov 24 10:41:36 2025 from 10.0.2.2
[systemd]
Failed Units: 1
rpcbind.service
[root@default-0 ~]# systemctl status rpcbind.service
× rpcbind.service - RPC Bind
Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; preset: enabled)
Active: failed (Result: signal) since Mon 2025-11-24 10:41:56 UTC; 46s ago
Duration: 2min 38.510s
Invocation: b7db4da5acc04012a32f3314ef7177f4
TriggeredBy: ● rpcbind.socket
Docs: man:rpcbind(8)
Process: 852 ExecStart=/usr/bin/rpcbind $RPCBIND_ARGS -w -f (code=killed, signal=KILL)
Main PID: 852 (code=killed, signal=KILL)
Mem peak: 2.4M
CPU: 31ms
Nov 24 10:39:18 localhost systemd[1]: Starting rpcbind.service - RPC Bind...
Nov 24 10:39:18 localhost systemd[1]: Started rpcbind.service - RPC Bind.
[root@default-0 ~]#
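For what it's worth, the signal=KILL above is consistent with rpcbind being caught by systemd's final SIGKILL phase during the soft-reboot rather than exiting cleanly on SIGTERM, though that is an assumption on my part. A small sketch for pulling that detail out of a captured status dump during triage (not a tmt helper):

```shell
# Extract the terminating signal of the main process from a captured
# `systemctl status` dump, e.g. "(code=killed, signal=KILL)" -> "KILL".
main_pid_signal() {
    awk -F'signal=' '/Main PID/ {sub(/\).*/, "", $2); print $2}'
}
```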
My recommendation here is to be sure you have a console set up at least for debugging.
here is part of the console output:
[ OK ] Reached target local-fs-pre.target…Preparation for Local File Systems. Starting systemd-udevd.service - R…ager for Device Events and Files...
[ OK ] Started systemd-udevd.service - Ru…anager for Device Events and Files.
Mounting boot.mount - /boot...
Mounting var.mount - /var...
[ 130.009798] XFS (vda3): Mounting V5 Filesystem 9a6f9939-ec4a-4c1f-a023-c0e51698b41c
[FAILED] Failed to mount var.mount - /var.
See 'systemctl status var.mount' for details.
[DEPEND] Dependency failed for cloud-init-m…rvice - Cloud-init: Single Process.
[DEPEND] Dependency failed for syste[ 130.037685] XFS (vda3): Ending clean mount
md-homed.service - Home Area Manager.
[DEPEND] Dependency failed for systemd-psto…atform Persistent Storage Archival.
[DEPEND] Dependency failed for chronyd.service - NTP client/server.
[DEPEND] Dependency failed for raid-check.t…r - Weekly RAID setup health check.
[DEPEND] Dependency failed for fstrim.timer…used filesystem blocks once a week.
[DEPEND] Dependency failed for var-lib-nfs-…ipefs.mount - RPC Pipe File System.
[DEPEND] Dependency failed for rpc_pipefs.target.
[DEPEND] Dependency failed for rpc-gssd.ser… service for NFS client and server.
[DEPEND] Dependency failed for basic.target - Basic System.
[DEPEND] Dependency failed for multi-user.target - Multi-User System.
[DEPEND] Dependency failed for graphical.target - Graphical Interface.
[DEPEND] Dependency failed for systemd-logind.service - User Login Management.
[DEPEND] Dependency failed for systemd-upda…ecord System Boot/Shutdown in UTMP.
[DEPEND] Dependency failed for systemd-tpm2-setup.service - TPM SRK Setup.
[DEPEND] Dependency failed for systemd-rand…service - Load/Save OS Random Seed.
[DEPEND] Dependency failed for local-fs.target - Local File Systems.
[DEPEND] Dependency failed for selinux-auto…k the need to relabel after reboot.
[DEPEND] Dependency failed for systemd-jour…lush Journal to Persistent Storage.
[ OK ] Mounted boot.mount - /boot.
[ OK ] Stopped systemd-ask-password-conso…equests to Console Directory Watch.
[ OK ] Stopped systemd-ask-password-wall.…d Requests to Wall Directory Watch.
[ OK ] Reached target paths.target - Path Units.
[ OK ] Reached target timers.target - Timer Units.
[ OK ] Reached target ssh-access.target - SSH Access Available.
Mounting boot-efi.mount - /boot/efi...
[ OK ] Reached target cloud-init.target - Cloud-init target.
[ OK ] Reached target nfs-client.target - NFS client services.
[ OK ] Reached target remote-fs-pre.targe…reparation for Remote File Systems.
[ OK ] Reached target remote-integrityset…Remote Integrity Protected Volumes.
[ OK ] Reached target remote-veritysetup.… - Remote Verity Protected Volumes.
Starting ostree-remount.service - OSTree Remount OS/ Bind Mounts...
[ OK ] Reached target getty.target - Login Prompts.
Starting cloud-init-local.service …-init: Local Stage (pre-network)...
[ OK ] Reached target remote-cryptsetup.target - Remote Encrypted Volumes.
[ OK ] Reached target remote-fs.target - Remote File Systems.
Starting systemd-userdb-load-crede…r/group Records from Credentials...
[ OK ] Reached target sockets.target - Socket Units.
[ 130.211783] FAT-fs (vda2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ OK ] Reached target bootc-status-update… - Bootc status trigger state sync.
[ OK ] Started emergency.service - Emergency Shell.
[ OK ] Reached target emergency.target - Emergency Mode.
Starting systemd-binfmt.service - Set Up Additional Binary Formats...
[ OK ] Mounted boot-efi.mount - /boot/efi.
[ OK ] Finished ostree-remount.service - OSTree Remount OS/ Bind Mounts.
[ OK ] Finished systemd-userdb-load-crede…ser/group Records from Credentials.
[ OK ] Reached target nss-user-lookup.target - User and Group Name Lookups.
[ OK ] Stopped target ssh-access.target - SSH Access Available.
Mounting proc-sys-fs-binfmt_misc.m…cutable File Formats File System...
Starting systemd-tmpfiles-setup.se…ate System Files and Directories...
[ 130.008035] sh[1385]: nc: /run/cloud-init/share/local.sock: Connection refused
[ OK ] Mounted proc-sys-fs-binfmt_misc.mo…xecutable File Formats File System.
[ OK ] Finished cloud-init-local.service …ud-init: Local Stage (pre-network).
[ OK ] Finished systemd-binfmt.service - Set Up Additional Binary Formats.
[ OK ] Reached target cloud-config.target - Cloud-config availability.
[ OK ] Reached target network-pre.target - Preparation for Network.
[ OK ] Finished systemd-tmpfiles-setup.se…reate System Files and Directories.
Starting systemd-oomd.service - Us…space Out-Of-Memory (OOM) Killer...
Starting systemd-resolved.service - Network Name Resolution...
[ OK ] Started systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer.
[ OK ] Started systemd-resolved.service - Network Name Resolution.
[ OK ] Reached target network.target - Network.
[ OK ] Reached target network-online.target - Network is Online.
[ OK ] Reached target nss-lookup.target - Host and Network Name Lookups.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, or "exit"
to continue bootup.
Enter root password for system maintenance
(or press Control-D to continue):
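When scanning a long captured console log like the one above, the first [FAILED] line is usually the root cause and the [DEPEND] lines just its fallout; a trivial sketch for triage:

```shell
# Print only the first [FAILED] line of a console log; subsequent
# [DEPEND] failures are normally consequences of it.
first_failure() {
    grep -m1 '\[FAILED\]'
}
```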
Looks like the cause is the /var mount failure: "[FAILED] Failed to mount var.mount - /var."
bash-5.3# systemctl status var.mount
× var.mount - /var
Loaded: loaded (/run/systemd/generator/var.mount; generated)
Active: failed (Result: exit-code) since Mon 2025-11-24 07:19:15 UTC; 8min>
Invocation: bfca267cc38e489d841f59ba05765fa3
Where: /var
What: /sysroot/ostree/deploy/default/var
Docs: man:ostree(1)
Mem peak: 1M
CPU: 9ms
Nov 24 07:19:15 default-0 systemd[1]: Mounting var.mount - /var...
Nov 24 07:19:15 default-0 mount[1356]: mount: /var: special device /sysroot/ost>
Nov 24 07:19:15 default-0 mount[1356]: dmesg(1) may have more informatio>
Nov 24 07:19:15 default-0 systemd[1]: var.mount: Mount process exited, code=exi>
Nov 24 07:19:15 default-0 systemd[1]: var.mount: Failed with result 'exit-code'.
Nov 24 07:19:15 default-0 systemd[1]: Failed to mount var.mount - /var.
dmesg: https://lnie.fedorapeople.org/dmesg.txt
Any idea how to avoid the failure?
FYI, the system comes back fine after I virsh destroy and then virsh start it.
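One thing that might be worth checking in the emergency shell: the status output shows What: /sysroot/ostree/deploy/default/var, so if /sysroot is not (or not correctly) mounted after the soft-reboot, the bind-mount source is missing, which would match the truncated "special device /sysroot/ost…" mount error. This is a guess, not a confirmed diagnosis. A sketch for extracting the mount source from the generated unit so its existence can be verified:

```shell
# Pull the What= (mount source) out of a systemd mount unit dump so it
# can be checked for existence after the soft-reboot.
mount_source() {
    awk -F= '/^What=/ {print $2}'
}
```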