Place ZIL blocks on special device when available

Open sgsunder opened this issue 1 year ago • 4 comments

Describe the feature you would like to see added to OpenZFS

Per zpoolconcepts.7, the intent log is allocated from blocks within the main pool, in the absence of a separate log (SLOG) device.

The request is for ZIL blocks to be allocated on the special vdevs first, spilling over to the primary vdevs only if needed, since the special vdevs are usually faster than the primary vdevs.

Ideally, this should be done completely automatically and without user configuration once this feature is enabled on the pool.
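
The requested allocation order can be sketched as a toy model. This is not OpenZFS code: the `AllocClass` type, the `alloc_zil_block` function, and the byte counts are all hypothetical names chosen for illustration, and the sketch only assumes that each allocation class either satisfies a request or refuses it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AllocClass:
    """Hypothetical stand-in for a pool allocation class (log, special, normal)."""
    name: str
    free_bytes: int

    def try_alloc(self, size: int) -> bool:
        # Succeed only if the class still has room for the whole block.
        if size <= self.free_bytes:
            self.free_bytes -= size
            return True
        return False


def alloc_zil_block(size: int, classes: Sequence[Optional[AllocClass]]) -> str:
    """Try each class in priority order; return the name of the one that served it.

    Assumed order for this proposal: dedicated log (if any), then special
    (if any), then the primary vdevs. Absent classes are passed as None.
    """
    for cls in classes:
        if cls is not None and cls.try_alloc(size):
            return cls.name
    raise RuntimeError("no allocation class has room for the ZIL block")
```

With no log device, a special class holding 8 KiB of free space, and a large normal class, two 4 KiB requests would land on the special class and a third would spill over to the normal class, with no user configuration involved.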

How will this feature improve OpenZFS?

Users often use special devices in hybrid pool setups, with primary vdevs being composed of large, slower HDDs and the special vdevs composed of fast flash SSDs. In such cases, putting the ZIL on the special device can give many of the same benefits as using an SLOG device, without having to actually use a SLOG device.

The special vdev is already being used for multiple purposes: holding metadata, the de-duplication table, and small data blocks. It is also frequently underutilized (on my system, only ~20G of 500G is used on the special vdev). It makes more sense to add another function to this vdev than to get extra hardware for an SLOG.

Additional context

Following from discussion in https://github.com/openzfs/zfs/discussions/15880

https://github.com/openzfs/zfs/issues/14085 is a similar feature request; however, I feel this request is sufficiently smaller in scope to warrant a separate issue. In this request, ZIL block allocation is done automatically on the special device, in the same way it is done right now on the primary vdevs. This request is also not concerned with L2ARC.

sgsunder avatar Feb 11 '24 15:02 sgsunder

You don't need a large size for slog devices. Usually, they don't grow beyond the maximum dirty data allowance. You can use partitions for slog instead of using whole drives.
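
As a rough back-of-the-envelope sketch of that sizing rule: on Linux, the dirty data limit is exposed as the `zfs_dirty_data_max` module parameter. The helper names and the `headroom` factor of 2 below are assumptions for illustration, not an official sizing formula.

```python
def read_dirty_data_max(
    path: str = "/sys/module/zfs/parameters/zfs_dirty_data_max",
) -> int:
    """Read the dirty data limit in bytes (Linux, ZFS module loaded)."""
    with open(path) as f:
        return int(f.read().strip())


def slog_upper_bound_gib(dirty_data_max_bytes: int, headroom: float = 2.0) -> float:
    """Estimate a comfortable SLOG size in GiB.

    `headroom` is an assumed safety multiplier, not a ZFS tunable.
    """
    return dirty_data_max_bytes * headroom / 2**30
```

For a 4 GiB dirty data limit this gives 8 GiB, a small fraction of a typical SSD.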

jxdking avatar Feb 12 '24 13:02 jxdking

@jxdking Sure, but wouldn't this give even more justification for using the special device for the ZIL?

Searching online, I've seen SLOG devices typically needing at most 16 GB. I'm not sure you can even find modern SSDs that small, so it hardly feels justifiable to add a whole new SSD (or ideally a mirrored pair of SSDs) which costs money and eats up PCIe lanes, just for it to be mostly underutilized.

I've thought about breaking up my special device SSDs into partitions but:

  1. If you already have a special device on your pool, you would have to remove it, repartition the drives, add the special device partitions back to the pool, and then do a lengthy zfs send | zfs recv process. Plus, there seem to be some subtleties with this process that I don't quite understand.
  2. You lose out on the simplicity of having ZFS manage the partition table on your device.
  3. The partitioning will be fixed, instead of ZFS dynamically allocating blocks for the ZIL.

sgsunder avatar Feb 12 '24 15:02 sgsunder

While this request has its merits, I worry that it blurs the concept of the special vdev even further. As you have mentioned, the special vdev already mixes several roles: small-block storage for use with dRAID, etc.; low-latency reads for metadata and uncached DDT; and high write bandwidth for cached DDT. Adding the ZIL there would also mean low-latency and possibly high-bandwidth writes. There are definitely devices now that could store metadata perfectly well but are unacceptable for the ZIL in any way. It would be good if the user could specify what roles the special vdev should play. I remember this was discussed back when the feature was planned, but I don't remember where it ended up.

amotin avatar Feb 12 '24 16:02 amotin

I'd much rather see

zpool create pool log,special mirror dev1 dev2 data mirror dev3 dev4

"special" would then become an alias for "metadata,smallfiles", and "data" would be the default.

That is, allow the user to specify what they want on each set of disks.
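
To make the proposed role list concrete, here is a minimal sketch of how such a specification might be expanded. This syntax does not exist in zpool today; the role names, the `ALIASES` table, and `expand_roles` are hypothetical and only follow the comment above.

```python
# Hypothetical role names taken from the proposal above.
KNOWN_ROLES = {"data", "log", "metadata", "smallfiles"}

# "special" is treated as shorthand for the roles it covers in the proposal.
ALIASES = {"special": {"metadata", "smallfiles"}}


def expand_roles(spec: str = "") -> set:
    """Expand a comma-separated role list; an empty list defaults to 'data'."""
    if not spec.strip():
        return {"data"}
    roles = set()
    for token in spec.split(","):
        token = token.strip()
        expanded = ALIASES.get(token, {token})
        unknown = expanded - KNOWN_ROLES
        if unknown:
            raise ValueError(f"unknown vdev role(s): {sorted(unknown)}")
        roles |= expanded
    return roles
```

Under these assumptions, `expand_roles("log,special")` yields the roles log, metadata, and smallfiles, while a vdev group given no role list holds ordinary data.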

clhedrick avatar Feb 19 '24 18:02 clhedrick