Place ZIL blocks on special device when available
Describe the feature you would like to see added to OpenZFS
Per zpoolconcepts(7), in the absence of a separate log (SLOG) device, the intent log is allocated from blocks within the main pool.
The request is for ZIL blocks to be allocated from the special vdevs first, spilling over to the primary vdevs if needed, since the special vdevs are usually faster than the primary vdevs.
Ideally, this should be done completely automatically and without user configuration once this feature is enabled on the pool.
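For comparison, the existing way to get this behavior is to attach a dedicated SLOG vdev, which is exactly the extra hardware this feature would make unnecessary in many setups (pool and device names below are placeholders):

```shell
# Current workaround: a dedicated SLOG vdev (ideally mirrored).
# 'tank' and the NVMe device names are examples only.
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# With this feature enabled, no extra devices would be needed:
# ZIL blocks would simply be allocated from the special vdev first.
```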
How will this feature improve OpenZFS?
Users often use special devices in hybrid pool setups, with primary vdevs being composed of large, slower HDDs and the special vdevs composed of fast flash SSDs. In such cases, putting the ZIL on the special device can give many of the same benefits as using an SLOG device, without having to actually use a SLOG device.
The special vdev is already being used for multiple purposes: holding metadata, the de-duplication table, and small data blocks. It is also frequently underutilized (on my system, only ~20G of 500G is used on the special vdev). It makes more sense to add another function to this vdev than to get extra hardware for an SLOG.
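As a concrete sketch of the hybrid setup described above (all device names are placeholders; the special_small_blocks step is optional):

```shell
# Hybrid pool: slow HDD raidz2 primary vdev + fast mirrored SSD special vdev.
zpool create tank raidz2 sda sdb sdc sdd sde sdf \
    special mirror nvme0n1 nvme1n1

# Optionally route small data blocks to the special vdev as well.
zfs set special_small_blocks=16K tank

# Show per-vdev capacity and usage; the special mirror is often mostly empty.
zpool list -v tank
```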
Additional context
Following from discussion in https://github.com/openzfs/zfs/discussions/15880
https://github.com/openzfs/zfs/issues/14085 is a similar feature request; however, I feel this request is sufficiently narrower in scope to warrant a separate issue. In this request, ZIL block allocation happens automatically on the special device, the same way it happens right now on the primary vdevs. This request is also not concerned with L2ARC.
You don't need much capacity for slog devices. Usually, they don't grow beyond the maximum dirty data allowance. You can use partitions for the slog instead of whole drives.
@jxdking Sure, but wouldn't this give even more justification for using the special device for the ZIL?
Searching online, I've seen SLOG devices typically need at most 16 GB. I'm not sure you can even find modern SSDs that small, so it hardly feels justifiable to add a whole new SSD (or, ideally, a mirrored pair of SSDs) that costs money and eats up PCIe lanes, just for it to sit mostly underutilized.
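That ~16 GB figure lines up with the dirty-data limit mentioned above. A rough sketch of the default limit, assuming the zfs.4 defaults as I understand them (zfs_dirty_data_max = 10% of RAM, capped by zfs_dirty_data_max_max = min(25% of RAM, 4 GiB)):

```shell
# Estimate the default zfs_dirty_data_max for a 64 GiB machine
# (assumption: 10% of RAM, capped at min(RAM/4, 4 GiB) per zfs.4 defaults).
ram=$((64 << 30))                           # physical RAM in bytes
cap=$((ram / 4))
[ "$cap" -gt $((4 << 30)) ] && cap=$((4 << 30))
dirty_max=$((ram / 10))
[ "$dirty_max" -gt "$cap" ] && dirty_max=$cap
echo "$dirty_max"                           # prints 4294967296 for 64 GiB RAM
```

The live value on a running system can be read from /sys/module/zfs/parameters/zfs_dirty_data_max.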
I've thought about breaking up my special device SSDs into partitions but:
- If you already have a special device on your pool, you would have to remove it, repartition the drives, add the special device partitions back to the pool, then do a lengthy zfs send | zfs recv process. Plus, there seem to be some subtleties with this process that I don't quite understand.
- You lose out on the simplicity of having ZFS manage the partition table on your device.
- The partitioning will be fixed, instead of ZFS dynamically allocating blocks for the ZIL.
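For completeness, here is roughly what that dance looks like. This is a hedged sketch: pool, vdev, and device names are examples, and zpool remove of a special vdev only works when the pool has no top-level raidz vdevs — otherwise the pool must be destroyed, recreated, and refilled with zfs send | zfs recv.

```shell
# 1. Evacuate and remove the existing special mirror (mirror-only pools).
zpool remove tank mirror-1

# 2. Repartition each SSD: a small SLOG slice plus the rest for special.
sgdisk -n1:0:+16G -n2:0:0 /dev/nvme0n1
sgdisk -n1:0:+16G -n2:0:0 /dev/nvme1n1

# 3. Add the partitions back as separate log and special vdevs.
zpool add tank log     mirror /dev/nvme0n1p1 /dev/nvme1n1p1
zpool add tank special mirror /dev/nvme0n1p2 /dev/nvme1n1p2
```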
While this request has its merits, I worry that it blurs the concept of the special vdev even further. As you have mentioned, the special vdev already mixes small-block storage (for use with dRAID, etc.) with low-latency reads for metadata and uncached DDT, and high write bandwidth for cached DDT. Adding the ZIL there would also mean low-latency and possibly high-bandwidth writes. There are definitely devices now that could perfectly well store metadata but are entirely unacceptable for the ZIL. It would be good if the user could specify which roles a special vdev should play. I remember this was discussed back when the feature was planned, but I don't remember where it ended up.
I'd much rather see
zpool create pool log,special mirror dev1 dev2 data mirror dev3 dev4
special would then become an alias for metadata,smallfiles, and data the default.
That is, allow the user to specify what they want on each set of disks.
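Expressed with today's syntax, the closest equivalent to that one-liner needs manual partitioning so the same SSD pair can serve both roles (device names are examples):

```shell
# Today: two vdev roles on the same SSD pair require pre-made partitions.
zpool create pool mirror dev3 dev4 \
    log     mirror dev1-part1 dev2-part1 \
    special mirror dev1-part2 dev2-part2
```

The proposed log,special spelling would let ZFS manage that split itself.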