
Systemd umounts dataset under encrypted root because it thinks the encryption root key is not loaded

Open sotiris-bos opened this issue 4 years ago • 19 comments

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version
Linux Kernel 5.4.79-1-lts
Architecture x86_64
ZFS Version v0.8.5-1
SPL Version 0.8.5-1

Describe the problem you're observing

I was manually mounting/unmounting ZFS datasets that are under an encrypted dataset. At some point, mounting stopped working. Systemd somehow thought the top encrypted dataset was locked even though it was not, and it unmounted the datasets as soon as I mounted them.

Describe how to reproduce the problem

Not sure. I did initiate a rollback of the dataset I was mounting and unmounting, but it was not the encryption root.

I am using zfs-mount-generator; surely that has something to do with it. To solve the problem, I ran:

systemctl start zfs-load-key-io-enc.service

Include any warning/errors/backtraces from the system logs

Nov 27 17:37:56 centnas systemd[1]: var-lib-docker.mount: Unit is bound to inactive unit zfs-load-key-io-enc.service. Stopping, too.
Nov 27 17:37:56 centnas systemd[1]: Unmounting /var/lib/docker...
Nov 27 17:37:56 centnas systemd[3465]: var-lib-docker.mount: Succeeded.
Nov 27 17:37:56 centnas systemd[1]: var-lib-docker.mount: Succeeded.
Nov 27 17:37:56 centnas systemd[1]: Unmounted /var/lib/docker.
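For context, the var-lib-docker.mount unit here is generated by zfs-mount-generator, and the "bound to" wording in the log corresponds to a BindsTo= dependency on the key-load service: with BindsTo=, systemd stops the mount whenever it considers the key-load service inactive, regardless of the actual keystatus. The generated unit looks roughly like this (an illustrative sketch with a made-up dataset name, not the exact generator output; the real file is /run/systemd/generator/var-lib-docker.mount):

[Unit]
After=zfs-load-key-io-enc.service
BindsTo=zfs-load-key-io-enc.service

[Mount]
Where=/var/lib/docker
What=io/enc/docker
Type=zfs
Options=defaults,zfsutil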

sotiris-bos avatar Nov 27 '20 15:11 sotiris-bos

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 02 '21 01:12 stale[bot]

I have a very similar issue. It appeared right after an upgrade from Ubuntu 21.10 to 22.04. My rpool/USERDATA contains a number of encrypted ZFS filesystems, and trying to mount any of them except the one that is mounted as my home directory fails silently: after the mount attempt, the dataset stays unmounted and the mountpoint contains no files.

All encrypted datasets have the following setup:

CANMOUNT    ENCRYPTION   KEYSTATUS
noauto      aes-256-gcm  available
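Output like the above can be collected with something along these lines (the pool/dataset name is a placeholder):

zfs list -r -o name,canmount,encryption,keystatus rpool/USERDATA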

I then noticed that syslog shows the following lines after each mount attempt:

systemd[1]: Unmounting <MOUNTPOINT>...
systemd[1]: <MOUNTPOINT>.mount: Deactivated successfully.
systemd[1]: Unmounted <MOUNTPOINT>.

Looks like it actually gets mounted, but then systemd or something else unmounts it immediately. One more test:

ls <MOUNTPOINT>; sudo zfs mount rpool/USERDATA/<DATASET>; ls <MOUNTPOINT>

shows an empty dir followed by a populated dir! One second later the <MOUNTPOINT> is empty again and syslog shows the three lines above.
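To watch the unmount happen in real time, one can follow the journal for the corresponding mount unit (a sketch; the unit name is the systemd-escaped mountpoint path, so <MOUNTPOINT> must be substituted):

journalctl -f -u "$(systemd-escape -p --suffix=mount <MOUNTPOINT>)"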

So some logic is triggered that immediately unmounts a mounted encrypted ZFS dataset, but only if the dataset is not mounted as $HOME. I will dig deeper, but my knowledge of systemd is clearly insufficient for productive debugging.

AlexeyGusev avatar May 03 '22 08:05 AlexeyGusev

Had the same issue, and it appeared that ZFS was asking for the password at boot. I couldn't see it until I watched a reboot on the physical monitor. I hope that helps.

crazybert avatar Jul 12 '22 15:07 crazybert

@AlexeyGusev Any luck with this issue? Disabling key loading at startup didn't help, and I am also seeing it work with one mountpoint but not the other.

crazybert avatar Jul 12 '22 16:07 crazybert

@AlexeyGusev Any luck with this issue?

I have no solution yet; what I do is a dirty hack that is so silly that I am ashamed to share it :)

Once the dataset was mounted and immediately unmounted by the system (as described in my previous post above), do this:

  1. change dataset mountpoint with zfs set mountpoint
  2. remove the old mountpoint dir with rmdir; link the old mountpoint to the new one with ln -s if you want to preserve the path
  3. mount with zfs mount

The system does not unmount the dataset if the mountpoint is changed manually; don't know why, but for now this just works. Since I don't reboot my PC often, it's an ugly but sufficient hack.
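As a concrete sketch of the steps above (the dataset and paths are placeholders):

zfs set mountpoint=/mnt/workaround rpool/USERDATA/<DATASET>
rmdir <OLD_MOUNTPOINT>
ln -s /mnt/workaround <OLD_MOUNTPOINT>
zfs mount rpool/USERDATA/<DATASET>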

AlexeyGusev avatar Jul 12 '22 16:07 AlexeyGusev

@AlexeyGusev Yeah seeing exactly the same behavior. 22.04, upgraded from 20.04 last night.

crazybert avatar Jul 12 '22 16:07 crazybert

@behlendorf Sorry, not quite sure: is there a way to get some attention on this issue? Seems like this is some sort of regression, at least on Ubuntu 22.04.

Thanks!

crazybert avatar Jul 13 '22 02:07 crazybert

Let's start by reopening this issue. Do I understand correctly that you were not observing this with 20.04 and ZFS v0.8.5? @aerusso @rlaager any thoughts?

behlendorf avatar Jul 28 '22 23:07 behlendorf

@behlendorf Correct, although I cannot guarantee the versions are exactly right, since I was keeping everything up to date. The Ubuntu 22.04 I have now is pretty much an out-of-the-box installation; I followed this doc for ZFS boot from EFI:

https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html

From what I can tell, systemd immediately unmounts any encrypted ZFS volume right after zfs mount.

crazybert avatar Jul 29 '22 00:07 crazybert

@behlendorf thank you for reopening this issue.

crazybert avatar Jul 29 '22 00:07 crazybert

@behlendorf At this point I also have zfs-dkms installed to address a MySQL issue, but it made no difference.

crazybert avatar Jul 29 '22 00:07 crazybert

Do I understand correctly that you were not observing this with 20.04 and ZFS v0.8.5?

This issue was never seen before 22.04.

This issue also seems to be configuration-dependent. I only observe it on one of my two machines with (nearly) identical setups.

AlexeyGusev avatar Jul 29 '22 06:07 AlexeyGusev

@behlendorf @aerusso @rlaager I am currently observing the following behavior:

I entered the key at boot using an attached keyboard and monitor. The encrypted volumes got mounted at boot and stay mounted. Not a workaround, obviously, but I hope this helps.

crazybert avatar Jul 29 '22 16:07 crazybert

@crazybert You said your setup is pretty stock. Steps to reproduce this would help a lot.

In the original report, there was the following systemd log message:

var-lib-docker.mount: Unit is bound to inactive unit zfs-load-key-io-enc.service. Stopping, too.

Are you (@crazybert @AlexeyGusev) seeing that sort of log entry (with a different unit name) too? I think it will be zfs-load-key@DATASET.service these days, rather than zfs-load-key-DATASET.service.

Either way, what are you seeing for systemctl status zfs-load-key@DATASET.service?

What are the contents of /run/systemd/generator/DATASET.mount?
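A sketch for collecting that information (names are placeholders; note that the generated unit file is named after the systemd-escaped mountpoint, not the dataset):

systemctl list-units 'zfs-load-key*'
systemctl status "$(systemd-escape -p --suffix=mount <MOUNTPOINT>)"
cat "/run/systemd/generator/$(systemd-escape -p --suffix=mount <MOUNTPOINT>)"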

rlaager avatar Aug 01 '22 23:08 rlaager

@rlaager I pretty much followed the OpenZFS instructions to the letter:

https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html

Regarding Docker, I am seeing only these references to var-lib-docker.mount:

kernel: [ 25.711555] zfs-mount-generator: var-lib-docker.mount already exists. Skipping.

Regarding the key loader, at this point I might be in a different state: manual (un)mounts work as expected, but in order to achieve that I actually have to enter the keys at boot using an attached keyboard.

At this point it looks like the keys are loaded at boot regardless of what I do. Here is cat /etc/default/zfs (comments removed):

ZFS_LOAD_KEY='no'
ZFS_UNLOAD_KEY='yes'
ZFS_MOUNT='yes'
ZFS_UNMOUNT='yes'
ZFS_SHARE='yes'
ZFS_UNSHARE='yes'
ZPOOL_IMPORT_ALL_VISIBLE='no'
VERBOSE_MOUNT='no'
DO_OVERLAY_MOUNTS='no'
ZPOOL_IMPORT_OPTS=""
MOUNT_EXTRA_OPTIONS=""
ZFS_DKMS_ENABLE_DEBUG='no'
ZFS_DKMS_ENABLE_DEBUGINFO='no'
ZFS_DKMS_DISABLE_STRIP='no'

Regarding systemctl status zfs-load-key:

  • systemctl status zfs-load-key* shows both encrypted volumes as Active (when the keys are entered).
  • I tried systemctl disable zfs-load-key-rpool-data-set-path.service, but the units just respawn on the next reboot.

If I just hit Enter a few times instead of entering the keys:

  1. The system boots.
  2. systemctl status zfs-load-key* shows Failed.
  3. After zfs load-key rpool/data/set/path, systemctl status zfs-load-key* still shows Failed.
  4. After systemctl start zfs-load-key-rpool-data-set-path.service, it shows Active.

After that, mount works as expected, and starting the service after zfs load-key might be a working solution; I just have to figure out how to prevent the boot process from being blocked by the key loader.

Now you might ask how I got to where I am, and I honestly don't know. I did not have the problem with the blocked boot process the first time. What I might have done differently the second time is disabling/enabling zfs-zed and installing zfs-dkms, which was updated recently.

crazybert avatar Aug 02 '22 00:08 crazybert

@rlaager in 20.04, I believe I had zfs-load-keys disabled. This is probably why this wasn't an issue.

crazybert avatar Aug 02 '22 00:08 crazybert

@crazybert Am I correct in understanding that, in the real-world scenario (not just testing), it goes like this:

Steps to reproduce:

  1. Install Ubuntu 22.04 Root-on-ZFS. (That may not all be required, but at least Ubuntu 22.04 with zfs-mount-generator will be required.)
  2. Create an encrypted dataset that is NOT required for booting or logging in.
  3. Boot. Do NOT provide the key (passphrase, presumably) at boot.
  4. Login. Run zfs load-key manually.
  5. Mount a filesystem. Are you doing so via zfs mount DATASET or systemctl start DATASET.mount?

Expected result: The filesystem is mounted. Actual result: ZFS mounts the filesystem, but then systemd immediately unmounts it because the DATASET.mount unit requires (or is bound to) zfs-load-key@DATASET.service, which is in state Failed.
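A condensed sketch of the repro above (the dataset name is hypothetical):

reboot                        # do not enter the passphrase at the prompt
zfs load-key rpool/data/enc   # succeeds; keystatus becomes available
zfs mount rpool/data/enc      # ZFS mounts it, systemd unmounts it right away
systemctl status 'zfs-load-key*'   # the relevant unit still shows failed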

rlaager avatar Aug 02 '22 00:08 rlaager

@rlaager That sounds correct (I can't judge the root cause, but it sounds reasonable), although I can't explain why I wasn't seeing the blocking key prompt at boot before; I only noticed it accidentally when I attached a monitor.

crazybert avatar Aug 02 '22 00:08 crazybert

Okay, seems like this worked for me:

  1. zfs set org.openzfs.systemd:ignore=on rpool/data/set/path
  2. touch /etc/zfs/zfs-list.cache/rpool
  3. zfs set relatime=off rpool/data/set/path
  4. zfs set relatime=on rpool/data/set/path (zfs inherit realtime rpool/data/set/path said invalid property 'realtime', so instead I set it back explicitly)
  5. systemctl daemon-reload
  6. reboot

After the reboot:

  • No boot prompt.
  • systemctl status zfs-load-key* no longer shows the volume services.
  • zfs load-key && zfs mount works as expected.
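For context: per the zfs-mount-generator man page, org.openzfs.systemd:ignore=on makes the generator skip the dataset entirely (no .mount unit and no key-load service get created for it), and the relatime off/on toggle presumably just forces zed's zfs-list-cacher ZEDLET to rewrite /etc/zfs/zfs-list.cache/rpool. The property can be verified with:

zfs get org.openzfs.systemd:ignore rpool/data/set/path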

Thanks @rlaager for the clues, thanks @AlexeyGusev for starting this thread!

crazybert avatar Aug 02 '22 01:08 crazybert

@crazybert zfs set org.openzfs.systemd:ignore=on does the trick for me. Thank you for the tip!

AlexeyGusev avatar Aug 19 '22 08:08 AlexeyGusev

I have the same issue and have been using the workaround mentioned by @AlexeyGusev (setting the mountpoint to none, re-mounting, and then creating a symbolic link to the old path). I have also seen my volumes get unmounted randomly after a day or two; I am wondering if it's due to apt running in the background. I didn't have this issue on 20.04. I'd love to go back, TBH, but I did run zfs upgrade.

aceface25 avatar Dec 08 '22 19:12 aceface25

Well the following works for me:

sudo systemctl start zfs-load-key-<pool-data-set-path>

It actually prompts for the key; after that, mounting the dataset works and persists.

lkishalmi avatar Dec 08 '22 19:12 lkishalmi

I had a similar issue on my system with two "noauto" datasets. My system is set up with ZFS on a LUKS key-encrypted root (Linux Mint setup), plus two additional "noauto", separately encrypted datasets created under the same root pool. Anyway, the computer had a power outage. Afterwards, the encrypted root pool and various other datasets mounted without any issues. The two additional "noauto", separately encrypted datasets, however, would not mount, with the same symptoms others have described above, when using the following commands:

sudo zfs load-key rpool/dataset
sudo zfs mount rpool/dataset

The encryption key was successfully loaded — I can subsequently unload and then reload the key, and mount does not complain about the key not being loaded when I execute the load-key command first. Regardless, the mount command gives no error feedback at all; one only notices that the dataset is not mounted after the command has executed. The only clue as to what might have happened is that syslog has a similar "mount Deactivated" message. Trying to get the systemd status:

systemctl status dataset.mount

similarly just shows the unmount/deactivated message, with no error.
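A hedged note on unit names: the mount unit is named after the systemd-escaped mountpoint path, not the dataset, so for a hypothetical mountpoint /srv/data:

systemd-escape -p --suffix=mount /srv/data   # prints srv-data.mount
systemctl status srv-data.mount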

I cannot for the life of me figure out the sequence systemd goes through to mount the ZFS dataset. I traced the process somewhat to the zfs-mount-generator script (on my system under /lib/systemd/system-generators/zfs-mount-generator). Tinkering with this file then led me to /run/systemd/generator, which contains the mount units for my various datasets. But still no clue as to why, when the dataset mount command is executed, it only unmounts the dataset.

The only thing I can think of is that perhaps the dataset has some error (or is there a clean-unmount flag somewhere?) that prevents mounting. But zpool scrub yielded no errors and indicated a healthy pool. In any case, for whatever reason, the user is not given any clue as to what happened. Finally, after seeing rlaager's post, trying:

sudo systemctl start dataset.mount

finally mounted the ZFS dataset. No idea why sudo zfs mount rpool/dataset did not work...

minienigma avatar Jan 04 '23 16:01 minienigma

Same bug here on Arch Linux. zfs set org.openzfs.systemd:ignore=on works around the bug acceptably 🥳

josephbburg avatar Jan 11 '24 18:01 josephbburg