
(WiP) zfs pool draft


This PR aims to introduce two new qvm-pool pool modules with this interface:

[user@dom0 ~]$ qvm-pool  --help-drivers
DRIVER         OPTIONS
zfs_encrypted  zpool_name, ask_password_domain, unload_timeout
zfs_zvol       block_devices
  • zfs_zvol, which implements VM storage as ZFS zvols. In the current implementation it also handles zpool creation on each device listed in the block_devices parameter, since that was the easiest way to get started from a clean slate when testing. If there are valid use-case arguments for splitting that off into a third module, that could be done (i.e. if anyone thinks it would be useful to implement a zvol pool on top of an existing dataset).
  • zfs_encrypted, which largely inherits from zfs_zvol but builds on encrypted datasets inside zpool_name that act as "encryption key groups." The keys are optionally unloaded after a pool inactivity timeout (unload_timeout). They are loaded by passing a raw file descriptor to an invocation of zfs load-key {dataset}, fed by the QRexec service qubes.AskPassword in a target domain (a rough sketch of this step follows the list).
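
Here is a rough sketch of that key-loading step (hypothetical dataset name and variable; the actual driver does this from Python over qrexec, so treat it as an illustration rather than the PR's code). The point is that the passphrase returned by qubes.AskPassword goes straight to the stdin of zfs load-key and never touches a file in dom0:

# illustration only: with keylocation=prompt, `zfs load-key` reads the
# passphrase from stdin, so the qrexec-obtained secret can simply be piped in
printf '%s' "$PASSPHRASE" | zfs load-key qbs/encryption/qbsdev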

When used, this results in datasets and zvols in this form:

$ zfs list -o name,type,encryption
qbs/encryption/qbsdev                          filesystem  aes-256-gcm
qbs/encryption/qbsdev/import                   filesystem  aes-256-gcm
qbs/encryption/qbsdev/import/zfsdev            filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm                       filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev                filesystem  aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/private        volume      aes-256-gcm
qbs/encryption/qbsdev/vm/zfsdev/volatile       volume      aes-256-gcm
qbs/vm                                         filesystem  none
qbs/vm/plainbrowser                            filesystem  none
qbs/vm/plainbrowser/private                    volume      none
qbs/vm/plainbrowser/volatile                   volume      none

Adding those was a matter of:

$ qvm-pool -a qbs zfs_zvol -o 'block_devices=/dev/loopMYFILE'
$ qvm-pool -a qbsdev zfs_encrypted -o 'zpool_name=qbs'
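
As a hypothetical follow-up (not part of this PR), a VM could then be placed in the encrypted pool with qvm-create's -P option, which selects the pool for all of the new VM's volumes:

# assumes the qbsdev pool added above; VM name and label are arbitrary
$ qvm-create --class AppVM --label purple -P qbsdev zfsdev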

For key entry I am currently using the following implementation of /etc/qubes-rpc/qubes.AskPassword in the ask_password_domain, which should probably be reworked (remember to chmod +x):

#!/bin/sh
read -r -N 100 -s -t 5 subject

# TODO why does this not work, does it need to strip newline?
# might be fixed now, according to @marmarek's suggestion in comment below:
# [ "${subject}" = "${subject//[^-/_a-z:A-Z0-9]}" ] || exit 1

export DISPLAY="${DISPLAY:-:0}"
zenity --forms \
  --text "${QREXEC_REMOTE_DOMAIN}:qubes.AskPassword" \
  --add-password "Password for ${subject//[^a-zA-Z]}" 2>>/tmp/askpw.log

This requires having ZFS installed from zfs-on-linux with the pyzfs3 module built. To get that going (in addition to their instructions) I needed:

sudo qubes-dom0-update kernel-devel git sysstat bc python3-devel
./autogen.sh && \
./configure --with-config=srpm --enable-pyzfs && \
make -j3 pkg-utils rpm-dkms && \
sudo dnf install ./libuutil1-0.8.0-1.qbs4.0.x86_64.rpm     \
                 ./libzfs2-0.8.0-1.qbs4.0.x86_64.rpm       \
                 ./libnvpair1-0.8.0-1.qbs4.0.x86_64.rpm    \
                 ./python3-pyzfs-0.8.0-1.qbs4.0.noarch.rpm \
                 ./zfs-0.8.0-1.qbs4.0.x86_64.rpm

# not sure if we need these:
sudo dnf install  ./libzpool2-0.8.0-1.qbs4.0.x86_64.rpm \
                  ./libzfs2-devel-0.8.0-1.qbs4.0.x86_64.rpm

# install DKMS kernel module, this will take a while since it needs
# to compile a module for each dom0 kernel currently installed.
# may want to bring down the number of kernels:
#    rpm -qa kernel
#    sudo dnf remove kernel-4.14.74-1.pvops.qubes.x86_64
# pay attention that we want the "noarch" and not the "src" rpm:
sudo dnf install ./zfs-dkms-0.8.0-1.qbs4.0.noarch.rpm
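
A quick sanity check after the install (my hypothetical commands, not from the ZoL instructions): confirm the kernel module loads and that the pyzfs bindings are importable, since the pool drivers depend on both.

sudo modprobe zfs
zfs version
python3 -c 'import libzfs_core'   # pyzfs installs the libzfs_core module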

This is merely a draft, as evidenced by the many TODO comments, but it does "work," and I thought it might be useful to post here for review, and in case someone wants to collaborate on this with me. NOTABLY:

  • all the volume _import functions are probably horribly broken
  • the size reporting does not seem 100% accurate / implemented for all cases
  • currently no snapshotting takes place, even though there is some scaffolding in place for that
  • the "default" options for zpools/zvols/datasets are not uniformly applied
  • the suggestions for updates to the documentation are probably not 100% correct either

ping:

  • @Rudd-O who also seems to dabble in Qubes + ZFS
  • @tykling who provided invaluable help deciphering ZFS options, and might have further suggestions
  • @marmarek who provided a ton of help and suggestions re: the inner workings of Qubes and qubes-core-admin

cfcs avatar Oct 28 '19 10:10 cfcs

# TODO why does this not work, does it need to strip newline?
# [ "${subject}" = "${subject}//[^-/_a-z:A-Z0-9]}" ] || exit 1

Extra } in the middle.

marmarek avatar Oct 28 '19 11:10 marmarek

Thank you for the review, I will try to get around to addressing this at some point and keep this updated.

This PR is relevant to the discussion at https://github.com/QubesOS/qubes-issues/issues/1293 (I forgot to mention that in the brief).

cfcs avatar Oct 31 '19 17:10 cfcs

I'm extremely interested in this. Do you need funding to get this to pass tests and work? How much would it take to pique your interest?

Rudd-O avatar Jan 05 '20 04:01 Rudd-O

@Rudd-O I am working on this on my own time (and am using this patch set as-is on my laptop), but if you are able to eventually allocate funds to the Qubes project towards @marmarek's time spent reviewing and answering questions about the Qubes APIs, that would make me feel less guilty for taking up his time.

We also hit some sore spots re: the qvm-pool command-line interface that it would be great to have someone clean up.

cfcs avatar Jan 06 '20 02:01 cfcs

@marmarek would you like to discuss funding for this specific bug?

Rudd-O avatar Jan 06 '20 21:01 Rudd-O

(Lack of this is blocking me getting Qubes Network Server ported to Qubes 4.1 for a variety of reasons that would be super hard to explain shortly.)

Rudd-O avatar Jan 06 '20 21:01 Rudd-O

I'm fine with reviewing it whenever you let me know it's ready for the next round of review.

If you'd like to help financially, contribute to our OpenCollective, so we can fund more people to offload other tasks from me.

marmarek avatar Jan 06 '20 23:01 marmarek

Review comment here. The structure of the datasets is not quite what it should be. According to the inter-distro working group on getting ZFS on root file systems, the structure should be more or less like this:

rpool/ROOT/qubes <- the OS goes here, following the pattern
              root pool name, ROOT, OS name (ubuntu, fedora)
rpool/BOOT/qubes <- if you'll use /boot in ZFS, not sure its status
rpool/USERDATA/VMs <- this can be named whatever you want, doesn't even
              have to be on rpool, but a good default would
              obviously be called VMs or vms or vm
rpool/USERDATA/VMs/plainbrowser
rpool/USERDATA/VMs/plainbrowser/private
rpool/USERDATA/VMs/plainbrowser/volatile

The spec appears to be what will be followed by Fedora as well.

Ref: https://docs.google.com/document/d/1m9VWOjfmdbujV4AQWzWruiXYgb0W4Hu0D8uQC_BWKjk/edit
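
If that layout were adopted, creating it on an existing pool would look roughly like this (a sketch using the names from the hierarchy above, assuming rpool already exists and VM volumes stay zvols as in this PR):

# container datasets that only group things and are never mounted
zfs create -o canmount=off rpool/USERDATA
zfs create -o canmount=off rpool/USERDATA/VMs
# one dataset per VM, with its volumes as zvols underneath
zfs create rpool/USERDATA/VMs/plainbrowser
zfs create -V 2G rpool/USERDATA/VMs/plainbrowser/private
zfs create -s -V 10G rpool/USERDATA/VMs/plainbrowser/volatile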

Rudd-O avatar Jan 10 '20 06:01 Rudd-O

@marmarek 1) how do I contribute to the OpenCollective? 2) how do I earmark contributions for this project?

Rudd-O avatar Jan 10 '20 06:01 Rudd-O

@Rudd-O to clarify, which datasets do you propose prefixing with /USERDATA/ ? The encrypted ones, when adding to an existing pool?

EDIT: Did you link to the right document? This one seems to be an unpublished draft detailing zfs-on-root for Ubuntu systems (neither of which applies here), and there's no mention of an inter-distro working group?

EDIT 2: I didn't have a use-case for it, but adding an unencrypted wrapper in addition to the encrypted one, and allowing both to take arguments like prefix=/USERDATA,zpool=rpool should be easy enough if that would work for you?

cfcs avatar Jan 10 '20 14:01 cfcs

Not sure I understand. Why do we need unencrypted anything?

Rudd-O avatar Jan 11 '20 11:01 Rudd-O

@marmarek 1) how do I contribute to the OpenCollective?

https://opencollective.com/qubes-os

  2. how do I earmark contributions for this project?

You can add a comment when you donate, but read also this: https://www.qubes-os.org/donate/#do-donors-get-personalized-support-or-special-feature-requests

marmarek avatar Jan 11 '20 12:01 marmarek

@Rudd-O I'm assuming that people are using full-disk encryption, so additional encryption keys to enter for e.g. your VM templates or your jazz music collection might be overkill UX-wise.

cfcs avatar Jan 11 '20 17:01 cfcs

That was my assumption too — FDE or no game.  I know people want to have filesystem-level encryption, but that discloses too much — even the names of the datasets are not okay for adversaries to know.

Rudd-O avatar Jan 13 '20 10:01 Rudd-O

https://www.qubes-os.org/donate/#do-donors-get-personalized-support-or-special-feature-requests

Huge bummer.  I would have loved to earmark the donation to this specific feature.  Lemme know when the policy changes so I can negotiate what the developers need to get this through.

Rudd-O avatar Jan 13 '20 10:01 Rudd-O

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

cfcs avatar Mar 03 '21 13:03 cfcs

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

DemiMarie avatar Mar 03 '21 20:03 DemiMarie

@DemiMarie Thank you for your review. I've since split the code out into its own package as advised by @marmarek, so this particular PR is abandoned. I've used this module as the primary pool driver on my personal laptop and have been quite happy with it.

You’re welcome! Even if you don’t intend to get this merged, it might be useful to have an open PR for code review purposes.

You can find the revised code at: https://github.com/cfcs/qubes-storage-zfs

Would you be interested in submitting it as a contrib package? I understand that it isn’t ready for prime time yet, but if/when that changes, this would allow users to obtain it via qubes-dom0-update. This would most likely require adding the DKMS package first, if I understand correctly.

That is the hope, that it can end up as a contrib package.

Like I said I've been using it on my personal Qubes installations since I posted this, and it's worked pretty well for me, but there are still some gotchas that need to be hashed out, and I'd like to have it better documented as well.

I'm kind of worried about releasing it as a contrib package and then having people unable to boot their VMs due to some kind of DKMS issue. Right now, if something goes wrong, all of qubesd stops working, which means file copy to/from dom0 and clipboard copy-paste also stop working, so it can be a grueling experience to recover from that situation, even if there's an upstream fix. I'd like to find a solution to that particular problem before encouraging other people to use it for anything but test systems.
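
For what it's worth, when the failure is only a driver exception rather than a broken module, the first recovery steps are reading qubesd's traceback and restarting it from a dom0 terminal. A minimal sketch:

# see which exception took qubesd down, then restart it after fixing
# or removing the offending pool driver code
sudo journalctl -u qubesd -b --no-pager | tail -n 50
sudo systemctl restart qubesd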

Twice I had to boot from a recovery disk because upstream API changes broke qubesd to the point where nothing worked. Copying changed code from a development vm to dom0 without mounting the file system turns out to not be easy when qubesd is broken. I've also had to manually edit the kernel header files because they either didn't compile (???) or were incompatible with the ZFS DKMS modules (that's to be expected I guess). So not quite ready for the primetime yet.

I have actually had to manually edit stuff in dom0 before (mostly qubes.xml). So I understand! I wonder if the compilation errors are due to dom0 having an old version of GCC.

Not sure, but the compiler and header files being out of sync does seem like a probable reason. It's been a while since I had anything as drastic as that happen, though.

One other question: what are the advantages of this over the BTRFS pool? RAID-Z is the only one that comes to my mind, but there could easily be others I am missing.

The main feature over BTRFS is encryption: your incremental backups are encrypted at rest, and their integrity can be validated on the backup system without loading the key. The main drawback compared to BTRFS is that ZFS lacks a COW file-cloning system call on Linux (BTRFS has one). That's not of much consequence for this particular codebase, since most of the magic happens with zvols, which do support COW cloning, but it's my impression of the overall state -- otherwise they're fairly similar. The ability to import ZFS data on FreeBSD systems is also an appealing factor.
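
A sketch of that backup flow (hypothetical dataset, snapshot, and host names; zfs send --raw ships the ciphertext as-is, so the receiving side can store and scrub it without ever holding the key):

zfs snapshot qbs/encryption/qbsdev/vm/zfsdev/private@backup1
zfs send --raw qbs/encryption/qbsdev/vm/zfsdev/private@backup1 | \
    ssh backuphost zfs receive -u backuppool/zfsdev-private
# on the backup host, a scrub verifies checksums without loading the key
ssh backuphost zpool scrub backuppool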

cfcs avatar Mar 24 '21 09:03 cfcs

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

zpool live deduplication is a big advantage over btrfs from what I've been reading. Please feel free to shed some light in the ongoing discussion of pool deduplication, here: https://forum.qubes-os.org/t/pool-level-deduplication/12654

tlaurion avatar Jul 21 '22 19:07 tlaurion

https://github.com/cfcs/qubes-storage-zfs

If this were released as a contrib package with the to-do items checked off such that I can trust my own systems with it, I would honor my promise of the bounty I offered before in #7009.

Rudd-O avatar Jul 21 '22 23:07 Rudd-O

@tlaurion (EDIT)

@cfcs : was the effort abandoned? Have you seen #7009 bounty offer for exactly this work/from @Rudd-O ?

Ah, no, I had not seen that.

In the sense that I use it every day on my own machines, the project is not abandoned, but it hasn't seen meaningful progress for a while now. This PR is effectively abandoned, since I deemed the approach @DemiMarie suggested of a plugin/contrib package better than merging it into qubes-core-admin (see @Rudd-O 's link to my package); but the contrib package does work and I update it when I run into things that annoy me.

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers. There is also a fork by @ayakael here https://github.com/ayakael/qubes-storage-zfs that seems to do something with qubes-builder. I don't think they ever sent any pull requests, but I've picked up some of the patches and it's probably worth having a look at their changes if you're going to dig into this project.

It feels to me like there's some way to go before this would be something I'd recommend to people who are not comfortable fixing DKMS build errors; I've personally had a number of issues with the kernel headers/kernel being out of sync with the fc32 DKMS packages for ZFS. Basically you need to plan 1-2 days of downtime whenever you type qubes-dom0-update, and I don't have a good solution to that problem.

To sum up there are some things to deal with before I think it makes sense to offer this to non-expert users:

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).
  • Fixing TODOs documented in the README (probably not so hard).
  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.
  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

I'd be happy to review and merge patches to my repo, and I'd also be fine with someone forking it into their own contrib package, copying what they like and deleting the rest. As I've explained before, packaging is not really my strong suit, so I doubt I will be doing any of that on my own in the near future. I basically did a lot of the easy lifting on this project to achieve what I wanted, and now I have a system that works for me, but I left the heavy lifting of polishing it up and making it mainstream-deployable as an exercise to the reader. I would be delighted to find such a reader here. :-)

cfcs avatar Aug 11 '22 19:08 cfcs

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL, so I am not surprised you are having problems with its DKMS package.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

DemiMarie avatar Aug 11 '22 22:08 DemiMarie

@tlaurion I'm in about the same place as @cfcs as I use this driver every day, but haven't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

I would like to collaborate with y'all to get this in a good state, so don't hesitate to push tasks my way.

By the way, I push my built packages to the following repo: https://repo.gpg.nz/qubes/r4.1. I can't make any guarantees, though.

ayakael avatar Aug 12 '22 17:08 ayakael

@DemiMarie

Judging by the helpful issue posts on that repo, @DemiMarie has also taken it for a spin, but I don't know of any other users / testers.

I actually haven’t. All of my suggestions have come from manual review of the code.

Ah, okay! Well, thanks for doing that!

  • Figuring out a stable update story that doesn't cause massive pain on a daily basis (hard).

The best approach I can think of is to ship a DKMS package that accompanies this package. Fedora 32 is EOL, so I am not surprised you are having problems with its DKMS package.

Yes, iirc I'm using fc34 or fc35, but the real issue is that the kernel, kernel-sources, and the DKMS modules are not updated in lock-step, and it would probably go a long way to have binary ZFS module packages instead of relying on compiling C code in each user's dom0.

  • Reading up on the Qubes 4.0 -> 4.x changelogs and figuring out how the APIs are intended to be used. At the moment there's a number of things that can go wrong, and when they do go wrong, you need to manually stop&start qubesd. Potentially this can be solved by restructuring the code to do the "dangerous" stuff in places where qubesd presently expects exceptions; potentially this will require adding some try..except to qubes-core-admin. I felt figuring this out on my own without pestering the upstream Qubes devs was a hard task.

This is something that definitely needs to be worked on upstream. Writing a third-party storage driver should not be so difficult. I myself had a LOT of problems adding ephemeral volume encryption and ensuring that storage is cleaned up at system boot, so better documentation/cleaner APIs/etc would be amazing.

Feel free to pester me with any questions you have, so that I can improve the documentation.

Thank you! I will take you up on that offer next time I nerdsnipe myself into working on this project.

  • Finding a solution to some of the other problems, like the one where the COW updates to the template's storage (that are thrown away on next reboot anyway) are written to the template's storage instead of the ZFS storage. This is probably also hard without spending time with upstream devs, or a lot of time reading through the code and documentation.

This is also a problem with all of the other storage drivers, sadly.

That is great, though; it hopefully means a solution can be found outside the circle of people interested in ZFS integration?

@ayakael

I'm in about the same place as @cfcs as I use this driver every day, but haven't yet taken the time to clean it up. I did make a spec file, which you'll find in my fork. I did enough work to integrate building the zfs drivers within my qubes-builder setup, and I also build an updated version of zfs and zfs-dkms that hooks into my build pipeline. While I'm a package maintainer on other platforms, my knowledge of RPM isn't quite where it needs to be for me to be comfortable pushing anything upstream. I would like to find time to clean this up, though.

Ah, that is cool. I think the best solution in terms of absolutely avoiding DKMS build errors would be to ship a custom dom0 kernel package that includes a ZFS module (potentially as a loadable module so it's not running if not needed). Is that what you do with your qubes-builder setup?

Thanks for chiming in, I was happy to learn that I have a user! :-)

cfcs avatar Aug 14 '22 17:08 cfcs

@cfcs Indeed, the most foolproof solution is including ZFS as a kernel module. That was my initial approach, but it was a lot of work maintaining a custom kernel, so I moved to DKMS a while ago. My current approach is actually hooking backported versions of zfs and zfs-dkms into qubes-builder, which has avoided any DKMS issues. Thus, I do not rely on the out-of-date packages from Fedora's fc32 repo, but rather my own repo, where the built zfs / zfs-dkms spec files (which were taken from up-to-date Fedora repos) are hosted.

ayakael avatar Aug 14 '22 19:08 ayakael

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

Rudd-O avatar Aug 21 '22 00:08 Rudd-O

@cfcs: I would actually like to see this merged into Qubes core at some point, assuming @marmarek is okay with it. Even if we have problems shipping ZFS, I suspect that it should be possible to perform testing in CI.

DemiMarie avatar Aug 21 '22 14:08 DemiMarie

@DemiMarie seconded. If CI needs to run integration tests, the zfs and zpool commands can totally be mocked with actual output from a system that had ZFS installed.

Rudd-O avatar Aug 22 '22 03:08 Rudd-O

Honestly I don't mind if this project doesn't deliver ZFS as part of the deliverables. I compile ZFS on my dom0s every time there is an update to either the kernel or ZFS (I run ZFS directly from master).

It wouldn't be that much work for Qubes to have a backported zfs and zfs-dkms package available for the current dom0 repo. I suspect most of the errors @cfcs encounters are due to having to adapt the old zfs-dkms package from Fedora's EOL repo to current kernel versions shipped by Qubes.

ayakael avatar Aug 22 '22 03:08 ayakael

@ayakael what they're saying here is that they don't want to use the zfs-dkms package due to compilation in dom0 being necessary. I use the zfs-dkms package I roll from pristine upstream sources and that works very well for me.

Rudd-O avatar Aug 22 '22 16:08 Rudd-O