dracut
Resume does not work (potential #cee5853 regression?)
Referencing this bug which did not get any love at all: https://bugzilla.redhat.com/show_bug.cgi?id=1842279
Fedora 32, x64 fresh installation. Using LVM + LUKS (that is, LUKS encrypted LVM) that contains rootfs + swap partitions. This might be potentially happening since https://github.com/dracutdevs/dracut/pull/715
After a fresh Fedora install in the above indicated setup, /sys/power/resume is "0:0", which causes the dracut setup scripts to not install the required resume-from-disk machinery to recover from hibernation. This behaves like a loop, since the lack of a defined swap partition results in "dracut -f" generating ramdisks that do not contain the kernel arg "resume=foo" and so forth.
IMHO the script should gracefully handle the case where there's no swap partition defined for resume but one is available.
Please assume good intent. I'm not very familiar with Dracut nor Linux booting system so I could be wrong. However more users seem to experience this, suggesting there's indeed a bug and it is relatively easy to solve (see Bugzilla)
This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.
Perhaps @danimo or @nabijaczleweli can comment on this?
I'd argue that the post-#715 behaviour is strictly more correct logically in that host-only means (initrd has resume) <=> (host has resume), but that's worth very little if it's confusing.
Does the diff below work for you? It should re-add the old behaviour.
diff --git a/modules.d/95resume/module-setup.sh b/modules.d/95resume/module-setup.sh
index 96c2573e..9f16537b 100755
--- a/modules.d/95resume/module-setup.sh
+++ b/modules.d/95resume/module-setup.sh
@@ -13,7 +13,7 @@ check() {
# Only support resume if hibernation is currently on
# and no swap is mounted on a net device
[[ $hostonly ]] || [[ $mount_needs ]] && {
- swap_on_netdevice || [[ "$(cat /sys/power/resume)" == "0:0" ]] && return 255
+ swap_on_netdevice || [[ "$(cat /sys/power/resume)" == "0:0" ]] || { echo "${host_fs_types[@]}" | grep -qwE 'swap|swsuspend|swsupend'; } && return 255
}
return 0
Alternatively, $swap_devs looks like it could maybe be checked for emptiness instead?
I can't easily check it, but I'm assuming that should work. I guess there's not much of an alternative, is there? It's a chicken-and-egg problem: you can't enable resume because the config checks whether it is already enabled before enabling it in the cmdline. I'm curious why I'm the only one seeing this. There should be more users with the same issue, right? I can say that this happened to both my computers after installing a fresh Fedora 32 system with a LUKS-encrypted fs (that is: LUKS + LVM (ext4 + swap)). Any ideas? Thanks!!
I mean, you can either manually tell dracut to include the resume module, or write maj:min of your swap partition to /sys/power/resume from your normal system after you boot, so I wouldn't say it's a chicken-egg as much as it's just a gotcha that could be avoided.
As for the second point, no idea, I don't use LUKS or LVM
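For reference, a minimal sketch of the "write maj:min to /sys/power/resume" step, assuming GNU stat is available. The helper names hex_to_majmin and dev_majmin are made up for illustration, and the device path in the usage note is a placeholder.

```shell
# Convert the hex major/minor pair printed by `stat -c '%t %T'` into the
# decimal "MAJ:MIN" form that /sys/power/resume expects.
hex_to_majmin() {
    printf '%d:%d\n' "0x$1" "0x$2"
}

# Print MAJ:MIN for a block device node. -L follows symlinks such as
# /dev/mapper/* entries. GNU stat is assumed.
dev_majmin() {
    # Word splitting is intentional: stat prints "<hex major> <hex minor>".
    # shellcheck disable=SC2046
    hex_to_majmin $(stat -L -c '%t %T' "$1")
}
```

Usage, as root and with your own swap device in place of the placeholder: `dev_majmin /dev/mapper/your-swap > /sys/power/resume`, followed by `dracut -f` to regenerate the initrd.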
Lemme quote myself (I had to re-read my bugzilla bug to get more context since I forgot a bit why this is happening):
"[...]If I ever boot my kernel with resume disabled that means that if I rebuild my initramfs (or upgrade the kernel via yum) the support for resume will be disabled. Is this what we really want? And how does this work during the initial install? I'm assuming there's no swap so the generation of the initramfs during a fresh install has to be somehow different right?"
Does that make sense? I think it is indeed true that if one boots without resume support (let's say you wanna ignore your hibernation image and boot normally) and updates the machine (via dnf), initramfs will be rebuilt and resume support will be dropped. Not sure whether that's desirable, but I also understand it might make sense in other situations or use cases that I do not use (perhaps!).
Many thanks!!
dracut.conf(5) lists add_dracutmodules+=" <dracut modules> ", if you add resume there you will have it unconditionally. Installer can pass --add resume as well, according to dracut(8), or just enable resume properly, if that's its domain (I wouldn't know, I've never installed Fedora).
I'm, personally, confused as to what the issue is – hostonly exists specifically to avoid pulling in modules that aren't explicitly needed, and whether you support resume or not is host-specific configuration (even my patch upthread is, I'd argue, wrong from a purist's perspective, since I have at least one machine with more RAM than persistent storage, and it cannot be meaningfully hibernated (and, therefore, resumed), but has some swap).
If you boot with resume off, for one reason or another, you can turn it on later by echoing maj:min into /sys/power/resume to get your host to its normal state. Before then, your host is degraded (or, well, in maintenance, since you chose to do it), and, since you know what's different from the usual, you can take measures to work around the temporary misconfiguration.
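A minimal sketch of the dracut.conf route mentioned above, assuming the usual drop-in directory /etc/dracut.conf.d. The function name write_resume_dropin and the file name 99-force-resume.conf are hypothetical.

```shell
# Write a dracut drop-in that unconditionally adds the resume module.
# $1: configuration directory (normally /etc/dracut.conf.d).
# The file name 99-force-resume.conf is a made-up example.
write_resume_dropin() {
    printf 'add_dracutmodules+=" resume "\n' > "$1/99-force-resume.conf"
}
```

Usage, as root: `write_resume_dropin /etc/dracut.conf.d && dracut -f`. A one-off rebuild with `dracut -f --add resume` should have the same effect for a single initrd, but it would not survive the next automatic regeneration.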
Well, let's try to simplify the issue if that makes it hard to tackle: suspend to disk does not work on a freshly installed Fedora 32 system. The underlying issue is that resume is not present and therefore the swap contents are being ignored.
Also there's updates in https://bugzilla.redhat.com/show_bug.cgi?id=1842279 that indicate this might still be the case in Fedora 33.
I have reproduced this without LUKS; if at any point you boot the system with the swap device missing, AND run dracut (i.e. after a kernel update), then dracut will notice /sys/power/resume is 0:0 and won't install the resume module in the next initrd. Therefore you enter a vicious circle, since /sys/power/resume will never be set again.
openSUSE does pass resume=UUID=xxx via bootloader cmdline, but this is not resolved by the kernel and also ignored by initrd when it has no resume module. The only way to break the cycle is to set /sys/power/resume manually and regenerate initrd, or to force the resume module in dracut.conf. I guess distro policy should reflect this.
While debugging this I also noticed that I should probably also file a bug re 480aa9695f . swap_on_netdevice check seems not to be doing anything, since swap_devs array contains /dev/xxx-style names but block_is_netdevice expects MAJOR:MINOR format. :/
This is afflicting Mageia too (mga#28528).
@danimo, have you had a chance to look at solving this issue?
If you boot with resume off, for one reason or another, you can turn it on later by echoing maj:min into /sys/power/resume to get your host to its normal state. Before then, your host is degraded (or, well, in maintenance, since you chose to do it), and, since you know what's different from the usual, you can take measures to work around the temporary misconfiguration.
So, one installed a new machine, set up hibernation by writing to /sys/power/resume, rebooted to test it and... voila... resume has failed. And it's completely non-obvious why it's failed. There is no hint anywhere that initrd needs to be regenerated.
In addition, the expectation that hibernation does have to be configured explicitly is simply not true anymore. When one uses "systemctl hibernate", one doesn't have to configure anything. One doesn't even need to know about the existence of /sys/power/resume...
I also got hit by this problem. I also managed to work around it by manually writing to /sys/power/resume and re-running dracut. Luckily I didn't waste too much time on this issue because I somehow realized that the output of dracut didn't say anything about the resume= parameter. One could easily argue that this is not a bug, but it's definitely user-unfriendly.
Since commit 733c71ce9e2d161c9e04772aeb1c5fb38e3fcb3a resume is exclusively done by systemd-hibernate-resume on systemd based systems. The manpage of systemd-hibernate-resume shows:
--> systemd-hibernate-resume only supports the in-kernel hibernation implementation, known as swsusp[1]. Internally, it works by writing the major:minor of specified device node to /sys/power/resume. --<
The device node is handed over by systemd-hibernate-resume-generator which reads the resume parameter from /proc/cmdline.
So, the check for the device in /sys/power/resume is indeed wrong when systemd-hibernate-resume is used. The question is how to better check whether hibernation is enabled on the host (that is what #715 tried to implement).
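To make the resume= lookup concrete, here is a minimal sketch of pulling the resume= value out of a kernel command line. It is not the code systemd-hibernate-resume-generator uses. The function name cmdline_resume_value is hypothetical, taking the last occurrence is an assumption, and quoted arguments containing spaces are not handled.

```shell
# Print the value of the last resume= argument in a kernel command line.
# $1: command-line string (defaults to the contents of /proc/cmdline).
# Returns non-zero if no resume= argument is present.
# Simplification: arguments are split on whitespace only, and glob
# characters in the command line are not protected against expansion.
cmdline_resume_value() {
    local cmdline arg value found=1
    cmdline=${1-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        case $arg in
            resume=*) value=${arg#resume=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
}
```

Called without arguments it reads /proc/cmdline, so `cmdline_resume_value || echo "no resume= on this boot"` gives a quick check of the running system.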
I'm a Gentoo user, and I have faced several issues related to resume not working with https://github.com/bircoph/suspend . As you can see in
https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/95resume/parse-resume.sh#L10
function label_uuid_to_dev is called, but as you can see here:
https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/99base/dracut-lib.sh#L586-L603
when the resume device is already a path to a device, no condition is matched and it returns nothing, which leads to the else block at
https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/95resume/parse-resume.sh#L70-L79
So this line is never reached:
https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/95resume/parse-resume.sh#L69
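A minimal sketch of the behaviour described above, with a fallback branch so that a plain device path is passed through instead of producing empty output. This is not dracut's label_uuid_to_dev: the name resume_spec_to_dev is hypothetical, and escaping of special characters in labels is left out.

```shell
# Translate a resume specification into a device path.
# Unlike the lookup described above, anything that is not a
# LABEL=/UUID=/PARTLABEL=/PARTUUID= specification is echoed unchanged.
resume_spec_to_dev() {
    local spec=${1#block:}
    case $spec in
        LABEL=*)     printf '/dev/disk/by-label/%s\n' "${spec#LABEL=}" ;;
        PARTLABEL=*) printf '/dev/disk/by-partlabel/%s\n' "${spec#PARTLABEL=}" ;;
        UUID=*)      printf '/dev/disk/by-uuid/%s\n' "${spec#UUID=}" ;;
        PARTUUID=*)  printf '/dev/disk/by-partuuid/%s\n' "${spec#PARTUUID=}" ;;
        *)           printf '%s\n' "$spec" ;;
    esac
}
```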
Many parts in parse-resume.sh depend on having /usr/sbin/resume in place, which is detected at
https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/95resume/module-setup.sh#L53
but if your distro installs in /usr/lib64 instead of /usr/lib , the resume binary is never detected, and never included in dracut generated initramfs.
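As an illustration of a more tolerant lookup, the sketch below searches several candidate directories for the resume helper. The function name find_resume_bin and the directory list are assumptions for the example, not the list dracut actually uses.

```shell
# Print the path of the first executable resume helper found.
# $1: optional root prefix (empty for the live system).
# The candidate directories are illustrative assumptions.
find_resume_bin() {
    local root=${1-} dir
    for dir in /usr/sbin /usr/lib/suspend /usr/lib64/suspend \
               /usr/lib/uswsusp /usr/lib64/uswsusp; do
        if [ -x "$root$dir/resume" ]; then
            printf '%s\n' "$dir/resume"
            return 0
        fi
    done
    return 1
}
```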
IMHO https://github.com/dracutdevs/dracut/blob/10ed204f873f454dcd15ffcc82dc3a1c781c1514/modules.d/95resume/module-setup.sh#L67 should include an additional line like the following, to be more robust:
inst_hook pre-mount 10 "$moddir/resume.sh"
I have just created pull request https://github.com/dracutdevs/dracut/pull/1607 with the minor fixes I have added to my installation in order to have it working
Something I'm experiencing using the changes in my pull request in order to use s2disk is that sometimes the resume process from hibernation does not work. But, as the whole boot process stalls, I only have to do a hard shutdown and boot in order to try again, and it can work. It is like some device or condition is not ready yet, but I still have to gather additional information.
Is this still an issue with a dracut release that contains the patches from @jmfernandez ?
His patches were unrelated to the original issue.
So yes, it's still an issue; comments https://github.com/dracutdevs/dracut/issues/924#issuecomment-813940269 and https://github.com/dracutdevs/dracut/issues/924#issuecomment-825320257 reflect the problem we have here.
As far as I can gather, it's strictly a problem with resuming from hibernation (not with suspending or hibernating, and systemd has been hit by it as well) because the resume kernel parameter is missing.
Now, we are not responsible for resuming, and the kernel seems to be riddled with bugs related to this (the most recent I could find was this 1, which means some random hardware works while other hardware fails).
Now whatever is adding that kernel parameter ( be it the user or some application ) it needs to enable the resume module and rebuild the initrd so I'm quite frankly failing to see how we are supposed to fix this issue o_O
What expectation are people having of us? How are we supposed to somehow fix this?
As far as I can gather, it's strictly a problem with resuming from hibernation (not with suspending or hibernating, and systemd has been hit by it as well) because the resume kernel parameter is missing.
No. The problem is that the condition for including the resume module is overly strict: it assumes that the resume partition in /sys/power/resume must be configured manually. That was true 10 years ago, but it's not needed anymore with systemctl hibernate. IMHO existence of a local swap should be sufficient reason for including the module.
E.g., Anaconda (the installer of Fedora/RHEL) automatically adds an appropriate resume= to the installed kernel's command line if a swap partition of a sufficient size is created (which it is in the default layout). Then, in the installed system, one can just run systemctl hibernate and the system hibernates and is resumed again after the machine is turned on. Except that it isn't, because the initrd doesn't contain the resume module for the aforementioned reason.
Yes, hibernate generally sucks, but that doesn't mean we shouldn't strive to make it suck a bit less if it's possible.
E.g., Anaconda (the installer of Fedora/RHEL) automatically adds an appropriate resume= to the installed kernel's command line if a swap partition of a sufficient size is created
YaST does the same thing on SUSE distros.
IMHO existence of a local swap should be sufficient reason for including the module.
I agree. Proposed patch:
--- a/modules.d/95resume/module-setup.sh
+++ b/modules.d/95resume/module-setup.sh
@@ -10,10 +10,11 @@ check() {
return 1
}
- # Only support resume if hibernation is currently on
- # and no swap is mounted on a net device
+ # Only support resume if there is any suitable swap and
+ # it is not mounted on a net device
[[ $hostonly ]] || [[ $mount_needs ]] && {
- swap_on_netdevice || [[ -f /sys/power/resume && "$(cat /sys/power/resume)" == "0:0" ]] && return 255
+ ((${#swap_devs[@]})) || return 255
+ swap_on_netdevice && return 255
}
return 0
Then those installers should write to /sys/power/resume (and resume_offset, if the swap is on file) before generating an initrd, it's that simple.
By definition, hibernation is configured iff /sys/power/resume or resume= is set (the latter strictly only applies if you manage to resume from the kernel in lateinit, but the initrd also consumes resume= to emulate that, so meh).
If we take "host-only" to mean "the minimal amount of stuff needed to boot the current host", then this violates that by including resume on all hosts with swap, which is all hosts, esp. since fedora started using systemd-zram-generator by default.
For some reason, broken installers that don't understand that "having swap" is largely unrelated to "having hibernation" mean that dracut should always include hibernation handling? Fix your installer (left as an exercise to the reader). Fix your system (ls -l $SWAPDEV | awk '{gsub(",", ""); print $5 ":" $6}' > /sys/power/resume).
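To illustrate the gap between "has swap" and "can hibernate", here is a rough sketch that lists active swap partitions while skipping zram devices, which cannot hold a hibernation image. It is not dracut's logic: the function name hibernation_swap_candidates is hypothetical, and it does not check that the swap is large enough or that the device is local.

```shell
# List active swap partitions that are not zram devices.
# $1: file in /proc/swaps format (defaults to /proc/swaps).
# The first line of that format is a header and is skipped.
hibernation_swap_candidates() {
    awk 'NR > 1 && $2 == "partition" && $1 !~ /^\/dev\/zram/ { print $1 }' \
        "${1:-/proc/swaps}"
}
```

Empty output on a machine that only has zram swap would be one hint that including the resume module buys nothing there.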
https://github.com/dracutdevs/dracut/pull/2160#issuecomment-1464496727:
- the grep for "resume=" is only performed if hibernation is currently on and no swap is mounted on a net device, which is the opposite of what is wanted
- the grep always matches because it matches either '^' or '[[:space:]]resume='
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.14.21-150400.24.55-default root=/dev/mapper/cr-auto-1 security=apparmor
# grep -rq '^\|[[:space:]]resume=' /proc/cmdline
# echo $?
0
- if the grep matches, check() returns 255, which indicates failure, not success
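The always-matching grep comes from operator precedence: in '^\|[[:space:]]resume=' the alternation splits the whole pattern into "^" or "[[:space:]]resume=", and "^" matches every line. A minimal sketch of a grouped pattern that only matches a real resume= argument follows. The function name cmdline_has_resume is hypothetical, and this is not necessarily the fix that was merged.

```shell
# Return 0 only if the file contains a resume= argument at the start of
# a line or directly after whitespace.
# $1: file to inspect (defaults to /proc/cmdline).
cmdline_has_resume() {
    grep -qE '(^|[[:space:]])resume=' "${1:-/proc/cmdline}"
}
```

Whatever pattern is used, the return value of check() still has to be handled separately, since returning 255 on a match excludes the module.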
@martinwhitaker could you submit your patch as a PR?