clevis
Use clevis for ZFS native encryption passphrase
Introduction
I would like to use clevis to decrypt my ZFS root partition on several machines.
Using 2 VMs, I tried to test if this is at all possible, and I think I've come pretty far, but I still keep getting the password prompt. I have some ideas on how to approach this further, but I could use some help figuring out where to look next.
Any help is greatly appreciated :innocent:
(Summary at the bottom)
Given how far I've come, it doesn't strike me as a lot of work to add "out-of-the-box" ZFS support to clevis. If I can get it to work, I might work on a PR for that myself :slightly_smiling_face:
Use case
I have a use case in mind with two PCs (a desktop and a Raspberry Pi; the Pi will use LUKS) that will be mutual tang servers (i.e. they can both be rebooted remotely, just not at the same time), and a laptop that uses either tang server when it's on the same network and a passphrase when it's not.
What I did so far
Setup
I created a CentOS 8.2 VM with a root partition on natively encrypted ZFS (using dracut and systemd-boot) and cloned it twice to make 2 machines:
tang: The server hosting the keys, ip: 192.168.122.18
clevis: The server asking for decryption keys, ip: 192.168.122.242
For the ZFS pool layout I used this guide by OpenZFS, which I adapted for CentOS.
For the bootloader (systemd-boot) I used this page on the ArchWiki.
And for zfs-mount-generator I used this page on the ArchWiki.
[root@clevis:~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 1.87G 13.1G 192K /
rpool/ROOT 1.56G 13.1G 192K none
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195 1.55G 13.1G 1.43G /
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/srv 368K 13.1G 192K /srv
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/tmp 784K 13.1G 400K /tmp
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/usr 1016K 13.1G 192K /usr
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/usr/local 824K 13.1G 520K /usr/local
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var 78.2M 13.1G 192K /var
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/games 288K 13.1G 192K /var/games
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib 74.2M 13.1G 29.5M /var/lib
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/AccountsService 192K 13.1G 192K /var/lib/AccountsService
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/NetworkManager 632K 13.1G 292K /var/lib/NetworkManager
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/dnf 1.56M 13.1G 1.06M /var/lib/dnf
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/flatpak 192K 13.1G 192K /var/lib/flatpak
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/rpm 40.1M 13.1G 37.2M /var/lib/rpm
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/rpm-state 392K 13.1G 232K /var/lib/rpm-state
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/log 2.33M 13.1G 1.83M /var/log
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/spool 1012K 13.1G 348K /var/spool
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/spool/mail 352K 13.1G 192K /var/spool/mail
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/www 192K 13.1G 192K /var/www
rpool/USERDATA 320M 13.1G 192K /
rpool/USERDATA/home_1b9910ca-f889-4b64-8942-a139e62b1195 319M 13.1G 200K /home
rpool/USERDATA/home_1b9910ca-f889-4b64-8942-a139e62b1195/myuser 318M 13.1G 318M /home/myuser
rpool/USERDATA/root_1b9910ca-f889-4b64-8942-a139e62b1195 796K 13.1G 480K /root
Install tang
I followed the guide by RedHat to install and setup tang.
[root@tang:~]# dnf install -y tang
[root@tang:~]# semanage port -a -t tangd_port_t -p tcp 7500
# This should probably be limited to only the local subnet, luckily all servers are currently behind NAT
[root@tang:~]# firewall-cmd --add-port=7500/tcp
[root@tang:~]# firewall-cmd --add-port=7500/tcp --permanent
[root@tang:~]# systemctl enable tangd.socket
# add the override for port 7500 (see the RedHat guide)
[root@tang:~]# systemctl edit tangd.socket
[root@tang:~]# systemctl daemon-reload
[root@tang:~]# systemctl start tangd.socket
[root@tang:~]# /usr/libexec/tangd-keygen /var/db/tang
# We need to save this for later
[root@tang:~]# tang-show-keys 7500
sN8bs7tkHqdKQii2DNmqYz6nluQ
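The override entered through systemctl edit is not shown above. A minimal sketch of the drop-in it would produce, assuming the usual systemd pattern of clearing ListenStream before setting the new port. The helper name write_tang_port_override and its directory argument are only for illustration and are not part of tang:

```shell
#!/bin/sh
# Sketch: write a tangd.socket drop-in that moves tang to another port.
# write_tang_port_override is a hypothetical helper, not part of tang.
write_tang_port_override() {
    dropin_dir="$1"   # normally /etc/systemd/system/tangd.socket.d
    port="$2"         # 7500 in this write-up
    mkdir -p "$dropin_dir"
    # The empty ListenStream= clears the default port before the new one is set.
    cat > "$dropin_dir/override.conf" <<EOF
[Socket]
ListenStream=
ListenStream=$port
EOF
}
```

Calling write_tang_port_override /etc/systemd/system/tangd.socket.d 7500 followed by systemctl daemon-reload should be equivalent to the interactive edit.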
Install clevis
[root@clevis:~]# dnf install -y clevis
Setting clevis properties on ZFS dataset
I'm not sure how it's stored when using LUKS, but I am assuming the necessary value (jwe) doesn't need to be encrypted (otherwise you would still need to enter keys manually).
Encrypting the password
Verify that we're using the correct password.
[root@clevis:~]# echo -n 'testpass' | zfs load-key -n rpool
1 / 1 key(s) successfully verified
I'm using the IP, since I'm not sure if something like /etc/hosts is available in the initramfs where clevis will be run.
# Using the thumbprint we got earlier
[root@clevis:~]# echo -n 'testpass' | clevis encrypt tang '{"url": "http://192.168.122.18:7500", "thp": "sN8bs7tkHqdKQii2DNmqYz6nluQ"}' > password.jwe
Store the JWE as a ZFS property
I'm using ZFS's User Properties[1] to save this value:
# Explicitly specify that we'd like to decrypt this, something like autodecrypt=yes or onboot=yes or when=onboot might be better.
# A property setting an order might also be useful when using multiple pools/datasets e.g. latchset.clevis:priority=0
zfs set latchset.clevis:decrypt=yes rpool
zfs set "latchset.clevis:jwe=$(cat password.jwe)" rpool
# we should not need to decrypt child datasets unless explicitly specified with `zfs set latchset.clevis:decrypt=yes rpool/some/dataset` and another `latchset.clevis:jwe`
# Therefore we skip inherited values (i.e. only check locally set ones)
[root@clevis:~]# zfs get latchset.clevis:decrypt -s local
NAME PROPERTY VALUE SOURCE
rpool latchset.clevis:decrypt yes local
[root@clevis:~]# zfs get latchset.clevis:jwe -s local
NAME PROPERTY VALUE SOURCE
rpool latchset.clevis:jwe [long JWE string] local
Check if it's correctly stored:
# This currently assumes only one dataset has decrypt=yes set, this should be made more flexible.
[root@clevis:~]# zfs get -H latchset.clevis:decrypt -s local | awk '$3=="yes"{print $1}' | xargs zfs list -H -o latchset.clevis:jwe > zfs-out.jwe
# -Z ignores newline at EOF differences
[root@clevis:~]# diff -Z password.jwe zfs-out.jwe && echo 'identical'
identical
Test clevis for decryption
[root@clevis:~]# zfs get -H latchset.clevis:decrypt -s local | awk '$3=="yes"{print $1}' | xargs -I POOLNAME sh -c "zfs list -H -o latchset.clevis:jwe POOLNAME | clevis decrypt | zfs load-key -n POOLNAME"
1 / 1 key(s) successfully verified
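The one-liner above assumes a single dataset with decrypt=yes. A sketch of a loop that tries every locally flagged dataset in turn could look like this; unlock_clevis_datasets is a hypothetical name, and the word splitting in the for loop assumes dataset names without whitespace:

```shell
#!/bin/sh
# Sketch: try clevis for every dataset that has latchset.clevis:decrypt=yes
# set locally. unlock_clevis_datasets is a hypothetical helper name.
unlock_clevis_datasets() {
    failed=0
    # -s local skips inherited values; -o name,value prints "dataset<TAB>value".
    for ds in $(zfs get -H -s local -o name,value latchset.clevis:decrypt \
                | awk '$2 == "yes" { print $1 }'); do
        # The pipeline's status is that of zfs load-key, which fails on a bad key.
        if zfs list -H -o latchset.clevis:jwe "$ds" | clevis decrypt | zfs load-key "$ds"; then
            echo "unlocked: $ds"
        else
            echo "failed: $ds" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

The function returns non-zero when at least one dataset could not be unlocked, so a caller can still fall back to a passphrase prompt.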
Updating initramfs
Add dracut config
Add extra config for dracut
[root@clevis:~]# tail /etc/dracut.conf.d/*
==> /etc/dracut.conf.d/20-network.conf <==
kernel_cmdline=" ip=192.168.122.242 netmask=255.255.255.0 gateway=192.168.122.1 nameserver=192.168.122.1 "
==> /etc/dracut.conf.d/30-clevis.conf <==
add_dracutmodules+=" clevis "
==> /etc/dracut.conf.d/50-zfs.conf <==
add_dracutmodules+=" zfs "
Add network settings to bootloader
[root@clevis:~]# cat /boot/loader/entries/centos.conf
title CentOS 8 ZFS
version zfs-4.18.0-193.14.2.el8_2.x86_64
linux /vmlinuz-4.18.0-193.14.2.el8_2.x86_64
initrd /initramfs-4.18.0-193.14.2.el8_2.x86_64.img
options rd.auto=1 ip=192.168.122.242 netmask=255.255.255.0 gateway=192.168.122.1 nameserver=192.168.122.1 root=ZFS=rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195 rw
Update systemd zfs-load-key-rpool
N.B.: I'm not sure if editing this service is the correct approach, since the boot process keeps asking me to enter the password by hand. The systemd-cat echo commands also do not appear. I think this file (the original) is auto-generated by systemd/dracut somehow, and I need a way to hook into that.
Run systemctl edit zfs-load-key-rpool.service, and enter the following:
[Service]
ExecStart=
ExecStart=/bin/sh -c 'set -eu;keystatus="$$(/sbin/zfs get -H -o value keystatus "rpool")";[ "$$keystatus" = "unavailable" ] || exit 0; systemd-cat echo '######## trying clevis #########'; /sbin/zfs list -H -o latchset.clevis:jwe rpool | /bin/clevis decrypt | /sbin/zfs load-key rpool && exit 0; systemd-cat echo '###### trying password ######'; c>
This does have the hardcoded pool name ("rpool") set, but that was already the case.
In more readable form:
set -eu;
keystatus="$$(/sbin/zfs get -H -o value keystatus "rpool")";
[ "$$keystatus" = "unavailable" ] || exit 0;
systemd-cat echo '######## trying clevis #########';
/sbin/zfs list -H -o latchset.clevis:jwe rpool | /bin/clevis decrypt | /sbin/zfs load-key rpool && exit 0;
systemd-cat echo '###### trying password ######';
count=0;
while [ $$count -lt 3 ];
do systemd-ask-password --id="zfs:rpool" "Enter passphrase for rpool:" | /sbin/zfs load-key "rpool" && exit 0
count=$$((count + 1));
done;
exit 1;
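The nested quoting and the "$$" escaping required inside ExecStart are easy to get wrong. A sketch of the same flow as a plain shell function, which needs neither, is shown here. load_key_with_fallback is a hypothetical name, and the script would still have to be made available inside the initramfs:

```shell
#!/bin/sh
# Sketch: same flow as the override above, as a plain shell function.
# load_key_with_fallback is a hypothetical name; no "$$" escaping is needed
# because this code is not parsed by systemd.
load_key_with_fallback() {
    pool="$1"
    keystatus="$(zfs get -H -o value keystatus "$pool")"
    # Nothing to do when the key is already loaded.
    [ "$keystatus" = "unavailable" ] || return 0

    # First attempt: let clevis decrypt the JWE stored on the dataset.
    if zfs list -H -o latchset.clevis:jwe "$pool" | clevis decrypt | zfs load-key "$pool"; then
        return 0
    fi

    # Fallback: ask for the passphrase, at most three times.
    count=0
    while [ "$count" -lt 3 ]; do
        if systemd-ask-password --id="zfs:$pool" "Enter passphrase for $pool:" \
            | zfs load-key "$pool"; then
            return 0
        fi
        count=$((count + 1))
    done
    return 1
}
```

With this in a script, the unit override would only need a single ExecStart line that calls the script with the pool name.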
Inspect the old value next to the new overridden value:
[root@clevis:~]# systemctl cat zfs-load-key-rpool.service
Install dracut modules
[root@clevis:~]# dnf install -y clevis-dracut zfs-dracut
Update Initramfs
# grep used for brevity
[root@clevis:~]# dracut -vf |& grep 'module:\|img\|zfs\|clevis'
dracut: zfsexpandknowledge: pool rpool has device /dev/disk/by-partlabel/rpool (which resolves to /dev/vda3)
dracut: zfsexpandknowledge: block devices backing ZFS dataset /: /dev/vda3
dracut: zfsexpandknowledge: host device /dev/vda1
dracut: zfsexpandknowledge: host device /dev/vda3
dracut: zfsexpandknowledge: device /dev/vda of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda3 of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda1 of type vfat
dracut: zfsexpandknowledge: pool rpool has device /dev/disk/by-partlabel/rpool (which resolves to /dev/vda3)
dracut: zfsexpandknowledge: block devices backing ZFS dataset /: /dev/vda3
dracut: zfsexpandknowledge: host device /dev/vda1
dracut: zfsexpandknowledge: host device /dev/vda3
dracut: zfsexpandknowledge: device /dev/vda of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda3 of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda1 of type vfat
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: nss-softokn ***
dracut: *** Including module: rngd ***
dracut: *** Including module: i18n ***
dracut: *** Including module: network-legacy ***
dracut: *** Including module: network ***
dracut: *** Including module: ifcfg ***
dracut: *** Including module: drm ***
dracut: *** Including module: plymouth ***
dracut: *** Including module: clevis ***
dracut: *** Including module: prefixdevname ***
dracut: *** Including module: crypt ***
dracut: *** Including module: dm ***
dracut: *** Including module: kernel-modules ***
dracut: *** Including module: kernel-modules-extra ***
dracut: *** Including module: kernel-network-modules ***
dracut: *** Including module: qemu ***
dracut: *** Including module: zfs ***
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: terminfo ***
dracut: *** Including module: udev-rules ***
dracut: *** Including module: biosdevname ***
dracut: *** Including module: dracut-systemd ***
dracut: *** Including module: usrmount ***
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including module: microcode_ctl-fw_dir_override ***
dracut: microcode_ctl module: mangling fw_dir
dracut: *** Including module: shutdown ***
dracut: *** Creating image file '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_64.img' ***
dracut: *** Creating initramfs image file '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_64.img' done ***
Test it
[root@clevis:~]# systemctl reboot
The password prompt still appears, so it seems something is still missing.
Summary
This is what I know after testing with 2 VMs:
What works:
Having access to JWE at boot time
By making the JWE value available at boot time in ZFS metadata: zfs list -o name,latchset.clevis:jwe rpool
Network connection at boot
This is done by adding the kernel_cmdline value (in /etc/dracut.conf.d/20-network.conf) to the bootloader's options.
It now responds to pings while waiting for the ZFS passphrase.
Manually booting
By adding rd.break=pre-mount to the bootloader's options, I am able to boot manually using clevis instead of typing the passphrase myself.
# for some reason rpool is imported without altroot set, so we reimport it to set it.
~# zpool export rpool
~# zpool import rpool -R /sysroot
# Test the key
~# zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key -n rpool
1 / 1 key(s) successfully verified
# load the key
~# zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
# mount /
~# zfs mount rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195
# mount child datasets
~# zfs mount -a
# Booting resumes here
~# systemctl switch-root /sysroot
What does not work yet:
I still need to find a way to make sure clevis is actually used within the initramfs, i.e. have it run zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
Possible solutions
Order of dracut modules being loaded
I don't think it matters, as long as clevis, zfs and the network connection are all loaded before zfs load-key rpool is run.
Missing parameters/configuration
There might be something needed to tell dracut to run:
zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
instead of:
systemd-ask-password --id="zfs:rpool" "Enter passphrase for rpool:" | zfs load-key "rpool"
This might be dracut configuration (i.e. in /etc/dracut.conf.d/) or an extra kernel parameter (i.e. options in /boot/loader/entries/centos.conf).
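One untested option is a small custom dracut module that installs a hook script. The sketch below is a module-setup.sh for such a module. The module directory, the hook file name zfs-clevis-load-key.sh and the priority 85 are assumptions for illustration, and it assumes the existing clevis module already brings clevis decrypt into the initramfs (which the manual boot from the rd.break shell suggests). Whether a pre-mount hook runs early enough relative to the generated zfs-load-key unit is not verified:

```shell
#!/bin/bash
# Sketch of a hypothetical dracut module, e.g. <dracut modules.d>/95zfs-clevis/module-setup.sh.

check() {
    # Only include the module when both tools exist on the host.
    for tool in zfs clevis; do
        command -v "$tool" >/dev/null 2>&1 || return 1
    done
    return 0
}

depends() {
    # Module names as they appear in the dracut output above.
    echo zfs clevis network
    return 0
}

install() {
    # The hook script (hypothetical name) would contain the
    # "zfs list ... | clevis decrypt | zfs load-key" pipeline.
    # The priority 85 is a guess and may need adjusting.
    inst_hook pre-mount 85 "$moddir/zfs-clevis-load-key.sh"
}
```

The module would then be enabled with another add_dracutmodules line in /etc/dracut.conf.d/, like the clevis and zfs entries shown earlier.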
zfs-mount-generator
It might be possible that zfs-mount-generator is making this harder than it needs to be. Maybe using zfs-mount.service will help?
References
[1]: See man zfs | less +"/^ User Properties", or the Oracle Documentation on User Properties, which is the same as on Linux.
How far did you go into adapting this for the initramfs environment?
I got quite far, I think. Even so far as to spread the clevis data over multiple zfs user properties in case it is too large (there's an 8k limit per property value). And manually unlocking worked just fine IIRC. I still had some trouble with the boot hook though, but I guess I could give it a go again.
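A sketch of how spreading the JWE over several user properties might look. The numbered property names (latchset.clevis:jwe.0, latchset.clevis:jwe.1, ...) and the function names are assumptions for illustration, not necessarily what the actual implementation uses:

```shell
#!/bin/sh
# Sketch: split a long JWE into fixed-size chunks, store each chunk in its
# own user property, and join chunks again. All names are hypothetical.
split_jwe() {
    # Prints stdin in pieces of at most $1 characters, one piece per line.
    # A compact JWE contains no whitespace, so line breaks are safe separators.
    fold -w "$1"
}

join_jwe() {
    # Removes the line breaks added by split_jwe.
    tr -d '\n'
}

store_jwe_chunks() {
    dataset="$1"
    size="$2"   # must stay below the per-property value limit
    i=0
    # The "|| [ -n ... ]" keeps a last chunk that has no trailing newline.
    split_jwe "$size" | while IFS= read -r chunk || [ -n "$chunk" ]; do
        zfs set "latchset.clevis:jwe.$i=$chunk" "$dataset"
        i=$((i + 1))
    done
}
```

For example, store_jwe_chunks rpool 8000 < password.jwe would write the chunks; reading them back in index order and piping them through join_jwe should reproduce the original JWE.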
@vogelfreiheit I started to work on this again on #373, help and/or feedback is welcome :slightly_smiling_face:
Has anybody already been in contact with github.com/shatteredsilicon/zfs-clevis/?