Due to deprecation of file-based pools, some of us need a ZFS qvm-pool backend

Rudd-O opened this issue 4 years ago • 15 comments

The problem you're addressing (if any)

Some of us use a file system that is not a stock Fedora file system, and are not willing to trust our data to other file systems. In this particular case, we are talking about ZFS.

So far -- albeit complicated to use -- the file-based pool system has worked fine for us. It is being deprecated in Qubes OS 4.1 and will be removed in the next release.

This leaves the aforementioned class of Qubes OS users without a working solution for backing our qubes with disk storage.

The solution you'd like

I would like to see a ZFS-based qvm-pool backend, preferably one that uses a tree of ZFS volumes arranged similarly to the LVM layout. Cloning of VMs could be extremely fast using ZFS snapshots and clones, and ephemeral file systems, such as the root file system in standard AppVMs, can be supported the same way. Because of the way ZFS works, it is entirely likely that no loop devices or device-mapper mappings would need to be manipulated at all, which also means a gain in simplicity in how the backend is implemented.
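
As a rough illustration of what such an arrangement might look like -- pool and dataset names here are invented, not a proposed naming scheme:

    # One zvol per qvm-pool volume, arranged in a tree (names illustrative):
    zfs create -o mountpoint=none qubespool/qubes
    zfs create -V 10G qubespool/qubes/vm-work-private
    # Cloning a VM reduces to a snapshot plus a clone, both nearly instantaneous:
    zfs snapshot qubespool/qubes/vm-work-private@clone-src
    zfs clone qubespool/qubes/vm-work-private@clone-src qubespool/qubes/vm-work2-private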

(Nota bene: this effort would be greatly helped by documentation on how to implement new storage drivers.)

The value to a user, and who that user might be

The class of users is well-defined, although we understand that we are in a minority.

  • Superior data reliability and trustworthiness.
  • Higher performance when combining rotational and solid-state storage devices (ARC / ZIL).
  • Ability to perform snapshot sends and receives.
  • Optionally, automatic snapshots for user-configurable revert of VMs to previous states.
  • Given careful design, certain VMs may require an encryption key (particular to a specific ZFS volume) to boot.
  • TRIM works by default and autotrim can be turned on -- all that is required from the VM itself is that discard is used as a mount option. Even when this cannot be arranged, filling the VM's disks with zeroes still yields an enormous disk-space gain thanks to ZLE compression (see the sketch after this list).
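
A minimal sketch of that last point, assuming a pool named qubespool (the name is made up):

    # dom0 side: let ZFS trim freed blocks automatically.
    zpool set autotrim=on qubespool
    # VM side: mount the private volume with discard so freed blocks reach the zvol.
    mount -o discard /dev/xvdb /rw
    # Fallback when discard can't be arranged: fill free space with zeroes, then
    # delete the filler; ZFS stores all-zero blocks as holes when compression is on.
    dd if=/dev/zero of=/rw/zerofill bs=1M || true; rm -f /rw/zerofill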

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver -- installable as a package -- that I can test and verify myself, hopefully with the end goal of shepherding it into the main Qubes OS distribution at a later release.

Thanks in advance.

Rudd-O avatar Oct 26 '21 10:10 Rudd-O

documentation on how to implement new storage drivers

Please make a separate issue for this unless one already exists.

DemiMarie avatar Oct 26 '21 11:10 DemiMarie

Will do later.

Rudd-O avatar Oct 26 '21 11:10 Rudd-O

Linking this as this work may be useful:

https://github.com/cfcs/qubes-storage-zfs/issues/2

Rudd-O avatar Oct 26 '21 11:10 Rudd-O

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver

Have you considered funding https://github.com/openzfs/zfs/issues/405? It would make the file-reflink Qubes storage driver compatible with ZFS.

rustybird avatar Oct 26 '21 12:10 rustybird

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver

Have you considered funding openzfs/zfs#405? It would make the file-reflink Qubes storage driver compatible with ZFS.

It would also be incredibly useful in other areas, not just Qubes OS.

DemiMarie avatar Oct 26 '21 13:10 DemiMarie

I considered it and rejected the idea at least for now:

  • it's more complex to get right, which is why it's been sitting in review for a while;
  • I would prefer a more native solution that directly uses one ZFS volume per qvm-pool volume, to make reverting a single VM much faster and simpler than the current reversion process (clone snapshot, mount clone, delete existing volume files from the VM dir, copy old volume files into the VM dir, unmount clone, destroy clone).

ZFS volumes are the right abstraction level for this -- what the VM wants is block devices, and what ZFS volumes provide are block devices. The fact that the file driver currently has to perform this absolutely horrible device-mapper / loopback dance to get things to work is awful. This complexity would go away with a ZFS-specific, zvol-based driver.
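
To make the contrast concrete (paths illustrative): a zvol is a block device from the moment it is created, whereas the file driver has to manufacture one:

    # ZFS: the device node can be handed straight to the VM.
    ls -l /dev/zvol/qubespool/qubes/vm-work-private
    # file driver: the disk image must first be wrapped in a loop device.
    losetup -f --show /var/lib/qubes/appvms/work/private.img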

I agree with the premise that enabling reflink to work in ZFS would be a more fundamental and more generally beneficial change. I don't want the availability of that feature to block progress on this proposal.

Rudd-O avatar Oct 26 '21 21:10 Rudd-O

But those limitations of the old 'file' driver don't apply to the 'file-reflink' driver. The latter doesn't use the device mapper (only loop devices - implicitly), and volumes can be instantly reverted to any previous revision with qvm-revert.
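
(In recent releases the spelling is qvm-volume revert, if memory serves; VM and volume names below are examples.)

    # List a volume's available revisions, then revert to the latest one:
    qvm-volume info work:private
    qvm-volume revert work:private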

I also wouldn't underestimate the complexity of a ZFS-specific storage driver: the WIP zfs.py, which currently implements only a subset of the Qubes storage API, is already the largest driver (sloccount 768), compared to lvm.py (sloccount 667) and reflink.py (sloccount 365), both of which implement the full API.

I don't want the availability of that feature to block progress on this proposal.

Fair enough, they're your funds :)

rustybird avatar Oct 27 '21 15:10 rustybird

I also wouldn't underestimate the complexity of a ZFS-specific storage driver: the WIP zfs.py, which currently implements only a subset of the Qubes storage API, is already the largest driver (sloccount 768), compared to lvm.py (sloccount 667) and reflink.py (sloccount 365), both of which implement the full API.

There appear to be several reasons for this:

  • zfs.py tries to avoid running much of its code as root, but qubesd already runs with root privileges, which defeats the intended protection. So all of the sudo zfs allow calls can be removed.
  • zfs.py has a rather verbose coding style (lots of multi-line array literals)
  • Some leftovers from the LVM2 driver

DemiMarie avatar Oct 27 '21 19:10 DemiMarie

The latter doesn't use the device mapper (only loop devices - implicitly),

Correct, this I don't want in a ZFS driver.

and volumes can be instantly reverted to any previous revision with qvm-revert.

Cool to know. A ZFS driver would be able to do this by simply issuing zfs rollback path/to/volume@revision as well.
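
For example (dataset and snapshot names invented):

    # Show the volume's snapshots, then roll back to the most recent one:
    zfs list -t snapshot -r qubespool/qubes/vm-work-private
    zfs rollback qubespool/qubes/vm-work-private@2021-10-27
    # Rolling back past newer snapshots needs -r, which destroys them.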

Rudd-O avatar Oct 28 '21 01:10 Rudd-O

So all of the sudo zfs allow calls can be removed.

Seconded.

Rudd-O avatar Oct 28 '21 01:10 Rudd-O

@andrewdavidwong: a bounty label should be added here, since @Rudd-O offered a 0.25 BTC bounty.

tlaurion avatar Jul 21 '22 19:07 tlaurion

Also, it seems (from personal, non-exhaustive research) that ZFS might be the only candidate we currently have for pool-wide volume deduplication -- letting users follow Qubes' advice to clone templates for specialized usage -- without storage costs growing exponentially with the amount of deployed software and the in-place upgrades forced by Fedora's fast-paced (annoying) release cycles.
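
For anyone experimenting, enabling it is a one-liner (pool name invented; be aware that ZFS keeps its dedup table in RAM, so benchmark before committing):

    # Enable deduplication on the datasets backing the pool:
    zfs set dedup=on qubespool/qubes
    # Check how much space it is actually saving:
    zpool list -o name,dedupratio qubespool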

I try to gather input on pool deduplication here: https://forum.qubes-os.org/t/pool-level-deduplication/12654

Please shed some light if you have any advice.

tlaurion avatar Jul 21 '22 19:07 tlaurion

@Rudd-O you saw https://github.com/QubesOS/qubes-core-admin/pull/289 ?

tlaurion avatar Jul 21 '22 19:07 tlaurion

@andrewdavidwong: a bounty label should be added here, since @Rudd-O offered a 0.25 BTC bounty.

I will happily honor my offer if the funds are used to finance the developer time to finish the half-written ZFS pool driver.

Rudd-O avatar Jul 21 '22 23:07 Rudd-O

FYI, reflink support for ZFS also seems to be progressing nicely in PR #13392.

rustybird avatar Jul 22 '22 13:07 rustybird

Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.
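
A sketch of that combination, once OpenZFS 2.2 is out (dataset, mountpoint, and pool names invented; dir_path is the option used by directory-backed pools):

    # Give the reflink pool a ZFS dataset to live on:
    zfs create -o mountpoint=/var/lib/qubes-zfs qubespool/reflink
    # Point a file-reflink qvm-pool at it:
    qvm-pool add zfs-reflink file-reflink -o dir_path=/var/lib/qubes-zfs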

ayakael avatar Aug 30 '23 16:08 ayakael

This pool driver has already been implemented; closing as fixed.

DemiMarie avatar Aug 30 '23 17:08 DemiMarie

Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.

That's good news.

Do note that the ZFS driver shipping with Qubes 4.2 lets you use ZFS snapshots to manage your VM storage natively, supports send + receive, and will take advantage of ZFS encryption (if your system is set up to use it). ZFS stability is, of course, legendary.
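
For anyone trying the 4.2 driver, creation looks roughly like this -- the container option name is my assumption based on how other pool drivers take their parameters, so verify against the driver's documentation; host and dataset names are invented:

    # Create a qvm-pool backed by a ZFS dataset (option name assumed):
    qvm-pool add zfspool zfs -o container=qubespool/qubes
    # Send/receive works at the plain ZFS level regardless of the driver:
    zfs snapshot -r qubespool/qubes@backup
    zfs send -R qubespool/qubes@backup | ssh backuphost zfs receive -d tank/qubes-backup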

I have a backport of the driver for 4.1 here: https://repo.rudd-o.com/q4.1/packages/qubes-core-dom0-4.1.33.1-40.qbs4.1.noarch.rpm . Do note that upgrading your 4.1 system will erase this package, leaving you without a storage driver.

Rudd-O avatar Sep 04 '23 23:09 Rudd-O