zfs special devices dedicated spares
Describe the feature you would like to see added to OpenZFS
I have looked for quite some time in the documentation but I can't find a way to dedicate a spare to a special device.
My case is the following: I'm using HDDs in 3 vdevs for main storage and 4 SSDs in two mirrored (2x2) special vdevs (for small files and metadata). I have set spares for the zpool with the same capacity as the HDDs so they can replace them. But I can't manage to create a spare dedicated to the special vdevs, i.e. an SSD, to be sure they are replaced with a proper SSD and not an HDD. Is this actually possible to do, is it planned, or should it become a feature request?
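For reference, the setup looks roughly like the sketch below (hypothetical device names; the data vdev type is only shown as raidz2 as an example):

```sh
# Rough sketch of the layout described above (hypothetical device names).
zpool create tank \
    raidz2 hdd1 hdd2 hdd3 hdd4 \
    raidz2 hdd5 hdd6 hdd7 hdd8 \
    raidz2 hdd9 hdd10 hdd11 hdd12

# Two 2-way mirrored SSD special vdevs for metadata and small blocks.
zpool add tank special mirror ssd1 ssd2
zpool add tank special mirror ssd3 ssd4
zfs set special_small_blocks=32K tank

# Pool-wide hot spares, sized like the HDDs. There is no way to say
# "this spare is only for the special class".
zpool add tank spare hdd13 hdd14
```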
How will this feature improve OpenZFS?
Making sure that a special device backed by SSDs won't be replaced with an HDD.
Additional context
N/A
That's a good idea for a feature and something we should look into, since this functionality doesn't exist today. The HDD spares will currently be used as spares for the dedicated special devices.
OK, that's what I thought. For now, I'm not declaring the HDDs and SSDs as spares. They are plugged in and running (but not used in the zpool), and I'll just issue the manual command to replace a failing device with the device I specifically choose. The only downside is that failing drives are not automatically replaced.
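For anyone in the same situation, the manual step is just a plain replace with the device you pick yourself (hypothetical device names):

```sh
# A data HDD fails: replace it with the standby HDD.
zpool replace tank hdd7 hdd13

# An SSD in a special mirror fails: replace it with the standby SSD,
# so an HDD never ends up in the special class.
zpool replace tank ssd2 ssd5
```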
Thanks for the information! Hoping to see this feature in the future!
I may be wrong, but IIRC previously, at least on FreeBSD, spare disks never replaced L2ARCs or SLOGs, only normal vdevs. I'd expect the same logic to be extended to special vdevs. So I'd guess it should be safe now, except there is probably no such thing as a spare for special vdevs now.
It would be great to be able to designate spare(s) to a zpool, to a vdev (regular or special), or directly to a single drive. That way you could have fine-grained control over what will replace your failing drives.
- You could simply not bother, if all the drives in your zpool are the same size, and just configure the spare(s) to be allocated to that given zpool. In a multi-zpool setup this can be perfect.
- But you could also have a zpool with multiple vdevs composed of drives of different sizes, and want the spare(s) to go to drives of the same size by designating a spare for a given vdev.
- And finally, you could just allocate a spare to a given drive.
This should work exactly the same for special vdevs.
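To make the idea concrete, the user interface could look something like the sketch below. None of this syntax exists today; it is purely hypothetical.

```sh
# Purely hypothetical syntax -- none of these forms exist today.

# Spare eligible for the whole pool (current behavior):
zpool add tank spare hdd13

# Spare restricted to the special allocation class:
zpool add tank special spare ssd5

# Spare restricted to a single vdev or drive, e.g. via a (hypothetical)
# vdev property set on the spare:
zpool set spare-target=mirror-3 tank ssd5
```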
Bumping this issue to get some attention on this feature.
There doesn't seem to be agreement on the desired behavior: should there be one spare pool for all types of vdevs, or per-type spare pools? The deluxe approach would be tagging each spare with the types of vdevs it's eligible for, but is it worth adding that complexity when the resulting matches are inevitably compromises?
Is there a way to add flexibility with hooks and scripts? That way the administrator could implement their own replacement policies to match their reality and preferences.
Cheers -- perry
> There doesn't seem to be agreement on the desired behavior: should there be one spare pool for all types of vdevs, or per-type spare pools? The deluxe approach would be tagging each spare with the types of vdevs it's eligible for, but is it worth adding that complexity when the resulting matches are inevitably compromises?
I believe the first step to answering "What should be done" is to get detailed information on what's currently being done for auto-replacement. The smarter the current process is, the less adjustment it would need to be able to make good decisions on what spare to use. If we can make the process smart enough, we probably won't need to create a way to allocate spares to specific VDEVs.
Could someone please supply details on the auto-replacement algorithm?
I also would like to have this feature added.
From what I've found on Reddit, it seems that hot spares are only used for data vdevs. So there is no risk of a failed metadata or ZIL vdev drive being replaced by a hot spare.
While not having a hot spare for the ZIL isn't a big issue, losing metadata would be catastrophic. People tend to recommend a 3-disk mirror for the metadata vdev if the data vdev is raidz2, and a 4-disk mirror for the metadata vdev if the data vdev is raidz3.
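If that recommendation is followed, the pool layout would be something like this (a sketch with hypothetical device names):

```sh
# Pairing per the advice above: raidz2 data vdev with a 3-way mirrored
# special vdev, so both classes survive two disk failures.
zpool create tank \
    raidz2 hdd1 hdd2 hdd3 hdd4 hdd5 hdd6 \
    special mirror ssd1 ssd2 ssd3
```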
Of course, this is from social media; we need to separate facts from assumptions.
"Assumption is the mother of all fuck up."
SSDs and HDDs must have very different reliability characteristics. I am not sure it makes sense to exactly replicate the level of redundancy between normal and special vdev(s), but sure, some care should be taken. Also, if you have two special vdevs, ZFS will try to store different copies of metadata on different vdevs, which should make the failure of one special vdev painful, but not absolutely catastrophic. ZFS now allows importing a pool with some top-level vdev(s) missing to try to evacuate the data.
I.e. instead of a 4-wide mirror, two 2-wide mirrors for the special class may be not much less reliable, but twice as big and faster.
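In concrete terms, the two layouts being compared for the special class are (hypothetical device names):

```sh
# Option A: one 4-wide special mirror. Survives the loss of any 3 SSDs,
# but offers only a single SSD's worth of capacity.
zpool add tank special mirror ssd1 ssd2 ssd3 ssd4

# Option B: two 2-wide special mirrors. Each mirror survives 1 failure,
# but the class is twice as big, and metadata copies are spread across
# both vdevs, so losing one whole mirror is painful rather than fatal.
zpool add tank special mirror ssd1 ssd2
zpool add tank special mirror ssd3 ssd4
```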
Clearly individual requirements vary in this space, since it's (obviously) a money/resilience tradeoff. The original issue was "why won't the system use my perfectly good, big-enough spare device when a special vdev disk fails?" Special vdevs are like data disks in their reliability requirements, and I (for one) would be happy to take the temporary performance hit in exchange for not being out of redundancy for my pool.
I was commenting earlier that there doesn't seem to be general agreement on what the "obviously right" behavior is, though. What is the minimum amount of flexibility (and minimum amount of change to ZFS) that can satisfy most of this space? Is there already a callout (script invocation) informing me of a disk failure, so I can script the requisite "zpool replace" command automatically? If there is (and the interface is rich enough), perhaps that's preferable to arguing over the one best approach to using spares?
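For instance, ZED (the ZFS Event Daemon) already runs zedlet scripts on pool events; something along the lines of the rough, untested sketch below would be enough for my purposes. The ZEVENT_* variable names are from memory, so check zed(8) before relying on them, and the device mapping is obviously site-specific:

```sh
#!/bin/sh
# Rough sketch of a ZED hook, e.g. /etc/zfs/zed.d/statechange-custom-spare.sh
# Untested; the ZEVENT_* names should be verified against zed(8).

# Only act when a vdev has faulted.
[ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] || exit 0

# Site policy: pick the standby device based on which device failed
# (hypothetical device names; a real script might key off ZEVENT_VDEV_GUID).
case "${ZEVENT_VDEV_PATH}" in
    /dev/ssd[1-4])
        zpool replace "${ZEVENT_POOL}" "${ZEVENT_VDEV_PATH}" /dev/ssd5
        ;;
    *)
        zpool replace "${ZEVENT_POOL}" "${ZEVENT_VDEV_PATH}" /dev/hdd13
        ;;
esac
```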
Cheers -- perry
It's worth mentioning that an alternative to "special spares" could be to implement write-through for special devices (https://github.com/openzfs/zfs/issues/15118).
For my raidz2 array, I have a 3-way mirror SSD special vdev and one "hot spare" SSD connected in the box, but just not configured in ZFS. Sure, I will get an email once any of the 3 drives fails, but then I have to replace the drive immediately from the web interface, which might not always be possible.
It's not something we cannot live without, but it would be really nice if we could add a hot spare to the special vdev and let it replace a failed drive just like a data drive.