Due to deprecation of file-based pools, some of us need a ZFS qvm-pool backend
The problem you're addressing (if any)
Some of us use a file system that is not a stock Fedora file system, and are not willing to trust our data to other file systems. In this particular case, we are talking about ZFS.
So far -- albeit complicated to use -- the file-based pool system has worked fine for us. This is going to get deprecated in Qubes OS 4.1 and will be removed in the next release.
This leaves the above-mentioned class of Qubes OS users without a working way to back our qubes with disk storage.
The solution you'd like
I would like to see a ZFS-based qvm-pool backend, that preferably uses a tree of ZFS volumes arranged similarly to the LVM arrangement. Cloning of VMs could be done extremely fast using ZFS snapshots and clones, and ephemeral file systems such as the root file system in standard AppVMs can also be supported in the same way. Due to the way ZFS works, it is entirely likely that no loop devices or device-mapper mappings will need to be manipulated at all, which also means a gain in simplicity w.r.t. how the backend is implemented.
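For illustration only, here is a rough sketch of how such a zvol tree might look; the pool and dataset names (rpool/qubes, per-VM children) are purely hypothetical and not something an eventual driver would have to use:

```
# Hypothetical layout: one parent dataset per qvm-pool, one zvol per volume.
zfs create rpool/qubes
zfs create -p -V 10G rpool/qubes/fedora-36/root      # template root volume
zfs create -p -V 2G  rpool/qubes/work/private        # AppVM private volume

# Cloning a VM becomes a constant-time snapshot + clone, with no bulk copying:
zfs snapshot rpool/qubes/work/private@clone-source
zfs clone -p rpool/qubes/work/private@clone-source rpool/qubes/work-copy/private
```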
(Nota bene: this effort would be greatly helped by documentation on how to implement new storage drivers.)
The value to a user, and who that user might be
The class of users is well-defined, although we understand that we are in a minority.
- Superior data reliability and trustworthiness.
- Higher performance when combining rotational and solid-state storage devices (L2ARC cache and SLOG/ZIL devices).
- Ability to perform snapshot sends and receives.
- Optionally, automatic snapshots for user-configurable revert of VMs to previous states.
- Given careful design, certain VMs may require an encryption key (particular to a specific ZFS volume) to boot.
- TRIM works by default and autotrim can be turned on -- all that is required from the VM itself is that discard is used as a mount option. Even when that cannot be arranged, filling the VM's disks with zeroes still yields an enormous disk-space gain thanks to ZLE compression. (A short sketch of the ZFS commands behind several of these points follows this list.)
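To make the above concrete, this is roughly what those points translate to in ZFS terms. All pool, dataset, and snapshot names are placeholders; the eventual driver may arrange things differently:

```
# Per-qube encryption: the volume is unusable until its key is loaded.
zfs create -p -o encryption=on -o keyformat=passphrase -V 2G rpool/qubes/vault/private
zfs load-key rpool/qubes/vault/private    # passphrase must be supplied before the qube can start

# Snapshot send/receive, e.g. for off-machine backups:
zfs snapshot rpool/qubes/work/private@backup-1
zfs send rpool/qubes/work/private@backup-1 | ssh backuphost zfs receive backup/work-private

# TRIM and cheap reclamation of zeroed space:
zpool set autotrim=on rpool
zfs set compression=zle rpool/qubes       # inherited by the zvols; runs of zeroes compress away
```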
I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver -- installable as a package -- that I can test and verify myself, hopefully with the end goal of shepherding it into the main Qubes OS distribution at a later release.
Thanks in advance.
documentation on how to implement new storage drivers
Please make a separate issue for this unless one already exists.
Will do later.
Linking this, as the work there may be useful:
https://github.com/cfcs/qubes-storage-zfs/issues/2
I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver
Have you considered funding openzfs/zfs#405 - it would make the file-reflink Qubes storage driver compatible with ZFS.
It would also be incredibly useful in other areas, not just Qubes OS.
I considered it and rejected the idea at least for now:
- it's more complex to get that right, which is why it's been sitting in review for a while,
- I would prefer a more native solution that directly uses one ZFS volume per qvm-pool volume, to make reverting a single VM much faster and simpler than the current reversion process (clone snapshot, mount clone, delete existing volume files from the VM dir, copy old volume files into the VM dir, unmount clone, destroy clone).
ZFS volumes are the right abstraction level for this -- what the VM wants are block devices, and what ZFS volumes provide us are block devices. The fact that the file driver currently has to do this absolutely horrible device-mapper / loopback dance to get things to work is awful. That complexity would go away with a ZFS-specific, zvol-using driver (sketched below).
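A minimal sketch of what that could look like, assuming a hypothetical rpool/qubes/work/private zvol and a driver-created snapshot named @qubes-clean:

```
# The zvol is exposed directly as a block device that can be handed to the VM:
ls -l /dev/zvol/rpool/qubes/work/private    # symlink to a plain zd* block device

# Reverting the volume is then a single operation instead of the clone/mount/copy dance:
zfs rollback rpool/qubes/work/private@qubes-clean
```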
I agree with the premise that enabling reflink to work in ZFS would be a more fundamental and more generally beneficial change. I don't want the availability of that feature to block progress on this proposal.
But those limitations of the old 'file' driver don't apply to the 'file-reflink' driver. The latter doesn't use the device mapper (only loop devices - implicitly), and volumes can be instantly reverted to any previous revision with qvm-revert.
I also wouldn't underestimate the complexity of a ZFS-specific storage driver: The WIP zfs.py currently implementing a subset of the Qubes storage API is already the largest driver (sloccount 768), compared to lvm.py (sloccount 667) and reflink.py (sloccount 365) both implementing the full API.
I don't want the availability of that feature to block progress on this proposal.
Fair enough, they're your funds :)
I also wouldn't underestimate the complexity of a ZFS-specific storage driver: The WIP zfs.py currently implementing a subset of the Qubes storage API is already the largest driver (sloccount 768), compared to lvm.py (sloccount 667) and reflink.py (sloccount 365) both implementing the full API.
There appear to be several reasons for this:
- zfs.py tries to avoid running much of the code as root, but qubesd already runs with root privileges, so this doesn't actually provide the intended protection. So all of the sudo zfs allow calls can be removed (illustrated below).
- zfs.py has a rather verbose coding style (lots of multi-line array literals)
- Some leftovers from the LVM2 driver
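For reference, this is roughly what such delegation looks like at the CLI level; the user and dataset names are placeholders, and with qubesd running as root the step simply is not needed:

```
# Delegate a subset of ZFS operations on the pool's parent dataset to an unprivileged user:
zfs allow qubesd-user create,destroy,snapshot,clone,rollback,mount rpool/qubes
# qubesd already runs as root, so it can invoke zfs directly and skip this entirely.
```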
The latter doesn't use the device mapper (only loop devices - implicitly),
Correct, this I don't want in a ZFS driver.
and volumes can be instantly reverted to any previous revision with qvm-revert.
Cool to know. A ZFS driver would be able to do this by simply issuing zfs rollback path/to/volume as well.
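For completeness, a sketch of what that could look like for a hypothetical volume; note that a plain zfs rollback only targets the most recent snapshot, so reaching an older revision needs -r, which destroys the snapshots made after it:

```
# List the volume's snapshots ("revisions"):
zfs list -t snapshot -o name,creation rpool/qubes/work/private

# Roll back to the most recent snapshot:
zfs rollback rpool/qubes/work/private@latest

# Roll back to an older snapshot, destroying the ones made after it:
zfs rollback -r rpool/qubes/work/private@older
```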
So all of the sudo zfs allow calls can be removed.
Seconded.
@andrewdavidwong: a bounty label should be added, since @Rudd-O offered a 0.25 BTC bounty here.
Also, it seems (from personal, non-exhaustive research) that ZFS might be the only candidate we currently have for pool-wide volume deduplication (backing up Qubes' advice to clone templates to specialize usage) without the storage cost exploding with each deployed piece of software and each in-place upgrade across Fedora's fast-paced (annoying) release cycle.
I'm trying to gather input on pool deduplication here: https://forum.qubes-os.org/t/pool-level-deduplication/12654
Please shed some light if you have any advice.
@Rudd-O, have you seen https://github.com/QubesOS/qubes-core-admin/pull/289 ?
@andrewdavidwong: a bounty label should be added, since @Rudd-O offered a 0.25 BTC bounty here.
I will happily honor my offer if the funds are used to finance the developer time to finish the half-written ZFS pool driver.
FYI, reflink support for ZFS also seems to be progressing nicely in PR #13392.
Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.
This pool driver has already been implemented; closing as fixed.
Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.
That's good news.
Do note that the ZFS driver being released with Qubes 4.2 lets you use ZFS snapshots to natively take care of your VM storage, supports send + receive, and will take advantage of ZFS encryption (if your system is set up to use it). ZFS stability is, of course, legendary.
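If you want to check whether an existing setup is in a position to benefit from the driver's encryption support, the standard ZFS properties will tell you; the dataset name here is a placeholder:

```
# Show whether the datasets backing the qubes pool are encrypted and whether their keys are loaded:
zfs get -r encryption,keystatus rpool/qubes
```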
I have a backport of the driver for 4.1 here: https://repo.rudd-o.com/q4.1/packages/qubes-core-dom0-4.1.33.1-40.qbs4.1.noarch.rpm . Do note that upgrading your 4.1 system will result in this package being erased, and you won't have a storage driver anymore.