
make Linux hibernation (suspend-to-disk) more robust

Open problame opened this issue 2 years ago • 15 comments

( This is a follow-up of https://github.com/openzfs/zfs/issues/260#issuecomment-978143047 and subsequent comments. )

Background

When resuming a hibernated system, either the kernel or the initrd loads the hibernation image back into RAM. After restoring the pre-hibernation in-core state from that image, the kernel resumes operation by unfreezing kthreads and user processes.

It is not safe to use a local zpool for swap space, and hence also not for hibernation. The reason is that there's an inherent chicken-and-egg problem between freezing kernel threads (and hence the ZIO pipeline) before creating the hibernation image, and then writing the hibernation image to stable storage.

I believe it is safe to hibernate a system with imported zpools, if the swapfile/hibernation image is stored on a block device that is safe to use by the kernel's hibernation procedure. For example, a raw block device, or a LUKS volume.

However, there are several problems with such setups:

This issue proposes to address the latter category by making ZFS more robust. Bugs in initrd scripts should not be able to cause full pool corruption as easily as they can today.

Let me quote @danielmorlock's and my findings on this issue. It was on Gentoo, but I wouldn't want to rule out that the problem is present in other distros' initrd scripts as well.

TL;DR: The problem was in genkernel (Gentoo's automatic kernel building scripts), which includes a script for the initramfs. This script does the LUKS decryption and boots from what is listed in the boot options. In the case of ZFS, I have an encrypted swap and an encrypted root including the ZFS pool. The initramfs script decrypts the root and imports the zpool BEFORE it decrypts the swap, which holds the RAM state for hibernation. So the pool is always imported, and then hibernation resumes the system into a state where the zpool is already imported and online. I guess that is probably the reason for my corrupted zpool.

Design Proposal

(Copied from https://github.com/openzfs/zfs/issues/260#issuecomment-978211802 )

Import of a pool that is part of a hibernated system should fail. Even import -f should fail. The failure should unambiguously point the user to the problem and explain how to resolve the situation.

Proposal for the hibernation workflow:

  • Somehow get notified by the kernel that we're hibernating.
    • The (unimplemented) freeze_fs and thaw_fs are not useful for this.
    • There are some power-management APIs that could be hijacked. Most of them are GPL-only though.
    • This is the big implementation problem. Suggestions are very welcome.
  • Generate a random hibernation cookie.
  • Store the hibernation cookie in the in-DRAM spa_t and somewhere on disk, e.g. in a ZAP in the MOS, as org.openzfs:hibernation_cookie=$cookie_value.
  • Wait until the txg that contains the ZAP update has synced out.
  • Let all later txgs get stuck in transitioning from quiescing -> syncing
    • Look into whether we can re-use zio_suspend / zio_resume machinery for this.
  • Allow kernel to proceed with hibernation. It must not be allowed before we reach this point.
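
The ordering of the steps above matters, so here is a minimal sketch of it as a toy model in Python. Nothing here is real ZFS code: ToyPool, sync_txg and prepare_for_hibernation are hypothetical names, a dict stands in for the ZAP in the MOS, and the kernel notification (the unsolved part) is left out entirely.

```python
import secrets

COOKIE_KEY = "org.openzfs:hibernation_cookie"


class ToyPool:
    """Stand-in for a spa_t plus its on-disk MOS; not real ZFS code."""

    def __init__(self, name):
        self.name = name
        self.on_disk_mos = {}        # models the ZAP in the MOS
        self.pending = {}            # updates waiting for the next txg
        self.in_core_cookie = None   # models a field in the in-DRAM spa_t
        self.synced_txg = 0
        self.sync_blocked = False    # True: quiescing -> syncing is stuck

    def sync_txg(self):
        """Write pending updates to disk unless syncing is blocked."""
        if self.sync_blocked:
            return False
        self.on_disk_mos.update(self.pending)
        self.pending.clear()
        self.synced_txg += 1
        return True


def prepare_for_hibernation(pool):
    """Run the proposed steps; returning means hibernation may proceed."""
    cookie = secrets.token_hex(16)       # random hibernation cookie
    pool.in_core_cookie = cookie         # copy kept in DRAM
    pool.pending[COOKIE_KEY] = cookie    # copy destined for disk
    if not pool.sync_txg():              # wait until that txg has synced
        raise RuntimeError("cookie txg did not sync")
    pool.sync_blocked = True             # later txgs stay in quiescing
    return cookie
```

In this model, the cookie reaches disk before syncing is blocked, so anything that looks at the pool from outside after this point sees the cookie, and nothing written later can reach disk behind it.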

Proposal for the resume workflow:

  • Let kernel & initrd restore DRAM state, kernel threads and userland.
  • Somehow get notified from the kernel that we're resuming.
    • Again, this is the big implementation problem.
  • For each imported spa_t:
    • Read the org.openzfs:hibernation_cookie from disk.
      • Must not read from ARC for caching, nor modify ARC state. We don't know whether it's valid at this point.
    • Compare the hibernation cookie in the in-DRAM spa_t with the value loaded from disk.
    • If they don't match, panic the kernel with the following message:
      zpool $spa_name was used in between hibernation and resume.
      Cannot resume from this hibernation image.
      
    • If they match, allow quiescing -> syncing transitions again.
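
The per-pool check on resume could look roughly like the following Python sketch. It is an illustration under assumptions, not an implementation: read_mos_uncached is a hypothetical callable standing in for "read the MOS from disk without touching the ARC", and an exception stands in for the kernel panic.

```python
COOKIE_KEY = "org.openzfs:hibernation_cookie"


class ResumeMismatch(Exception):
    """Stands in for the kernel panic in the proposal."""


def check_pool_on_resume(spa_name, in_core_cookie, read_mos_uncached):
    """Compare the in-DRAM cookie with the one freshly read from disk.

    read_mos_uncached is a hypothetical callable that returns the on-disk
    MOS entries as a dict without consulting or modifying the ARC.
    Returns True when quiescing -> syncing may be allowed again.
    """
    on_disk_cookie = read_mos_uncached().get(COOKIE_KEY)
    if on_disk_cookie != in_core_cookie:
        raise ResumeMismatch(
            f"zpool {spa_name} was used in between hibernation and resume.\n"
            "Cannot resume from this hibernation image."
        )
    return True
```

A missing cookie on disk counts as a mismatch here, which covers the case where someone imported the pool with the discard flag while the system was hibernated.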

To prevent accidental imports, we extend zpool import / spa_import such that they will fail by default if a hibernation cookie is present in the on-disk MOS. This behavior can be overridden by a new flag zpool import --discard-hibernation-state-and-fail-resume.
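
As a rough sketch of the intended import behavior, again as a toy model in Python: toy_import and HibernatedPoolError are hypothetical names, the dict stands in for the on-disk MOS, and deleting the cookie on discard is my assumption about how the override would make a later resume fail.

```python
COOKIE_KEY = "org.openzfs:hibernation_cookie"


class HibernatedPoolError(Exception):
    """Raised when an import would touch a pool of a hibernated system."""


def toy_import(on_disk_mos, force=False, discard_hibernation_state=False):
    """Model the proposed import check on a dict standing in for the MOS.

    force models `import -f`; it deliberately does not bypass the check.
    """
    if COOKIE_KEY in on_disk_mos:
        if not discard_hibernation_state:
            raise HibernatedPoolError(
                "pool belongs to a hibernated system; resume that system, "
                "or discard its hibernation state to import anyway"
            )
        # Assumption: discarding removes the cookie, so a later resume
        # sees a mismatch and refuses to continue.
        del on_disk_mos[COOKIE_KEY]
    return "imported"
```

Note that force is accepted but has no effect on the cookie check, which mirrors the requirement that even import -f should fail.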

problame avatar Dec 12 '21 13:12 problame

Root on ZFS guides for Arch Linux and NixOS, hosted at openzfs-docs, already feature instructions for suspending to LUKS2-encrypted disk partitions. It may be worth reviewing those and fixing any existing deficiencies.

Hibernation also has other requirements besides file system support. I have had all sorts of problems with Ryzen APU graphics and VirtIO graphics (black screen, frozen screen on resume, etc.)

ghost avatar Dec 23 '21 08:12 ghost

#10924: Hibernation also requires ARC to be emptied

StefanoBalocco avatar Jan 04 '22 23:01 StefanoBalocco

I'm probably one of those NixOS LUKS users.

I don't currently remember much of the details, but IIRC, my only major ZFS corruption to date has come from (orthogonally from the above) something like the following sequence:

  • hibernate
  • boot without restoring hibernation (and in the process mounting and using the pool)
  • boot back into hibernation image

I'm not familiar with ZFS's internal mechanisms, but I believe this resulted in some high level metadata corruption and most of the data remaining otherwise intact but hard to recover.
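
To make the sequence concrete, here is a deliberately oversimplified toy model in Python. It is not how ZFS tracks space; the block map and free list are invented for illustration. It only shows the general hazard of resuming an in-RAM view of a disk that has changed in the meantime.

```python
def run_sequence():
    """Return the block map after hibernate, foreign boot, and resume."""
    disk = {0: "data-A"}           # block number -> contents
    free_blocks = [1, 2]           # allocator state kept in RAM

    # 1. hibernate: the image freezes a copy of the in-RAM allocator state
    image_free_blocks = list(free_blocks)

    # 2. boot without resuming: the pool is imported and written to
    block = free_blocks.pop(0)
    disk[block] = "data-B"         # new data now lives in block 1

    # 3. resume: the stale image still believes block 1 is free
    stale_block = image_free_blocks.pop(0)
    disk[stale_block] = "data-C"   # silently overwrites data-B

    return disk
```

In the model, the write made by the intermediate boot is lost without any error, because the resumed state never learns that the block was allocated.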

Improved robustness would be greatly welcome. :tada:

deliciouslytyped avatar Feb 06 '22 06:02 deliciouslytyped

I think until this works, the manual should be updated to explicitly say that hibernation on ZFS/zvol does not work.

Right now it doesn't; in fact it even mentions that you can put swap on zvol and only mentions "may lead to deadlock" when in practice people get corruption which is much worse than deadlock!

nh2 avatar Nov 28 '22 02:11 nh2

Sorry, but I don't agree that hibernation doesn't work. All corruption issues I have followed so far (including #14118, see https://github.com/openzfs/zfs/issues/14118#issuecomment-1303563790) were caused by the boot process importing a hibernated pool before resuming from hibernation. Importing an already imported pool of course has the potential to corrupt it. Swap must be located on a separate, non-ZFS partition for hibernation to work, but I think that's expected. I've been using hibernation daily for years without any issue so far.

What could be done is documenting the caveats and ideally adding code to detect suspended pools and refusing to import them.

AttilaFueloep avatar Nov 28 '22 11:11 AttilaFueloep

@AttilaFueloep

to explicitly say that hibernation on ZFS/zvol does not work.

Swap must be located on a separate, non-ZFS partition for hibernation to work

I think you are both stating the same thing, which is that hibernation should NOT be used with swap on a ZFS volume, and @nh2 states that THAT (NOT having swap on ZFS volume when using hibernation) should be documented.

Greek64 avatar Nov 28 '22 11:11 Greek64

@AttilaFueloep there might be something to it, as I've had my ZFS pool holding the OS corrupted twice by accidentally hibernating, and I'm running NixOS. As it turns out, NixOS, for some reason unknown to me, sets boot.zfs.forceImportRoot to true by default:

Forcibly import the ZFS root pool(s) during early boot. This is enabled by default for backwards compatibility purposes, but it is highly recommended to disable this option, as it bypasses some of the safeguards ZFS uses to protect your ZFS pools. If you set this option to false and NixOS subsequently fails to boot because it cannot import the root pool, you should boot with the zfs_force=1 option as a kernel parameter (e.g. by manually editing the kernel params in grub during boot). You should only need to do this once.

https://github.com/NixOS/nixpkgs/blob/695b3515251873e0a7e2021add4bba643c56cde3/nixos/modules/tasks/filesystems/zfs.nix#L259-L275

Recently they added a boot.zfs.allowHibernation flag, set to false by default, to avoid hibernation issues:

  • https://github.com/NixOS/nixpkgs/pull/171680

Which is a very backwards way of solving the issue based on what you said, as what we should be doing is disabling forceImportRoot by default, or raising an exception if both forceImportRoot and allowHibernation are true. I might try to do a PR like that.

jakubgs avatar Nov 28 '22 11:11 jakubgs

@Greek64 Yes, that's true. I was trying to summarize what was already proposed here (no swap on zvol and adding a mechanism to detect and refuse importing hibernated pools). Sorry for not being clear.

@jakubgs Yes, force importing the root pool isn't a good idea; I can't comment on the "backwards compatibility purposes," though. What I can't follow is the need to import the root pool during resume from hibernation. I'm using Arch, and initcpio doesn't import pools during resume from hibernation; it simply skips the ZFS part of the boot process. I think the proper fix would be to refactor the NixOS boot process to do the same. Not being familiar with NixOS, I can't tell if this is viable though.

Thinking more about it, NixOS could detect if swap is on a ZVOL and disallow hibernation if so. If not, there would be no need to import pools during resume and the above would apply.
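
A minimal sketch of such a detection, in Python for illustration only. It assumes the Linux convention that zvols show up as /dev/zdN block devices (with symlinks under /dev/zvol/), and it parses text in the format of /proc/swaps. The function names are hypothetical, and swap files that merely live on a ZFS dataset are not covered.

```python
import os


def swap_devices(proc_swaps_text):
    """Return the device or file paths listed in /proc/swaps-style text."""
    lines = proc_swaps_text.strip().splitlines()[1:]   # skip the header row
    return [line.split()[0] for line in lines if line.split()]


def is_zvol(path, resolve=os.path.realpath):
    """True if path resolves to a /dev/zdN node (or a partition on one)."""
    resolved = resolve(path)
    prefix = "/dev/zd"
    return resolved.startswith(prefix) and resolved[len(prefix):][:1].isdigit()


def hibernation_allowed(proc_swaps_text, resolve=os.path.realpath):
    """Disallow hibernation as soon as any active swap device is a zvol."""
    return not any(is_zvol(p, resolve) for p in swap_devices(proc_swaps_text))
```

On a running system this would be called as hibernation_allowed(open("/proc/swaps").read()); the resolve parameter exists only so the logic can be exercised without real device nodes.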

AttilaFueloep avatar Nov 28 '22 12:11 AttilaFueloep

@AttilaFueloep yes, you are correct. An actual fix is that the boot process itself would be adjusted to not import the pool at boot when resuming, but that would require a bunch of research and testing. For now I think I just want to add an assert to avoid people having both enabled at the same time. Can look into a proper solution later.

jakubgs avatar Nov 28 '22 12:11 jakubgs

Questions:

  • Is there any kind of mechanism that prevents importing a suspended volume? Does it require --force?
  • Or is that entirely left to the OS mechanism that does hibernation to handle?
  • If so, how can one tell if a volume is suspended without importing it?

jakubgs avatar Nov 28 '22 13:11 jakubgs

@jakubgs

Is there any kind of mechanism that prevents importing a suspended volume? Does it require --force?

Do you mean as of right now?
If so, then the only preventing mechanism that I know of is if a pool is not explicitly exported on shutdown, and then another system with a different hostid tries to import the pool (You have to use -f to force the import in this situation).
But this has nothing explicitly to do with a "suspended" pool.
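
As a simplified model of that existing safeguard (a Python sketch with hypothetical names, ignoring everything else an import checks):

```python
def import_permitted(exported, pool_hostid, system_hostid, force=False):
    """Simplified model of the hostid safeguard on `zpool import`."""
    if exported or pool_hostid == system_hostid:
        return True    # cleanly exported, or last used by this same host
    return force       # foreign, non-exported pool: needs -f
```

In this model, a hibernated pool that is imported again by the same machine's initrd has a matching hostid, so import_permitted(False, h, h) is True even without -f, and the safeguard never fires.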

But the purpose of this issue is exactly that, to add a prevention mechanism on "suspended" pools inside zfs itself.

Or is that entirely left to the OS mechanism that does hibernation to handle?

Going by the previous answer, until the proposal of this issue is implemented it is up to the OS/User to avoid and prevent importing "suspended" pools.

Greek64 avatar Dec 06 '22 14:12 Greek64

@problame

Import of a pool that is part of hibernated system should fail. Even import -f should fail.

This behavior can be overridden by a new flag zpool import --discard-hibernation-state-and-fail-resume.

Is it allowed/safe to import a "suspended" pool as readonly?

Some projects like ZFSBootMenu need to import the zfs pools on boot in order to get the initramfs file (the boot partition is part of the root zfs pool), but do so importing the pool as readonly.

If the above proposed changes are implemented as is, ZFSBootMenu would cease to work with hibernation, as it would need to use the --discard-hibernation-state-and-fail-resume flag for the import.

Greek64 avatar Dec 06 '22 14:12 Greek64

I do not even use ZFS for the OS / swap or hibernation file, but I face ZFS data corruption every time I attempt to suspend

and for some reason I can never get ZFS to force unmount either so I can not export / import the pool when I want to put the PC to sleep

MasterCATZ avatar Apr 08 '23 09:04 MasterCATZ

I do not even use ZFS for the OS / swap or hibernation file …

This is the ZFS repo, so I suggest looking elsewhere for support in your context.

grahamperrin avatar Apr 08 '23 17:04 grahamperrin

This is the ZFS repo, so I suggest looking elsewhere for support in your context.

They are saying that they do use ZFS on non-OS/non-OS swap partitions, and that ZFS corrupts even in this case.

nh2 avatar Apr 13 '23 22:04 nh2

@nh2 thanks for clarification. I have hidden my previous comment (resolved).

grahamperrin avatar Jun 04 '23 10:06 grahamperrin