Fix "Detach spare vdev in case if resilvering does not happen"
A spare vdev should detach from the pool once the original disk is reinserted. However, spare detachment depends on resilvering completing, and if no resilver is scheduled, the spare stays attached to the pool until the next resilver runs. When a pool contains many top-level vdevs (e.g. 25+ mirrors), a resilver is not always scheduled when a disk is reinserted. With this patch, the spare vdev is explicitly detached from the pool when no resilver is needed. The change has been tested on both Linux and FreeBSD.
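For context, here is a minimal sketch of the idea (not the actual diff): after the kernel onlines the reinserted leaf vdev, it can check whether DTL_MISSING is empty and, if so, request an asynchronous spare detach rather than waiting for a resilver that will never be scheduled. The async-task name `SPA_ASYNC_DETACH_SPARE` and the helper below are illustrative assumptions.

```c
/*
 * Illustrative sketch only, not the actual patch. Assumes an async task
 * (here called SPA_ASYNC_DETACH_SPARE) that performs the detach from
 * sync context, since vdev_online() runs with configuration locks held.
 */
static void
vdev_online_maybe_detach_spare(spa_t *spa, vdev_t *vd)
{
	/*
	 * If nothing was written while the disk was gone, DTL_MISSING is
	 * empty, no resilver will be scheduled, and the spare would
	 * otherwise stay attached until some future resilver completes.
	 */
	if (vd->vdev_parent != NULL &&
	    vd->vdev_parent->vdev_ops == &vdev_spare_ops &&
	    vdev_dtl_empty(vd, DTL_MISSING))
		spa_async_request(spa, SPA_ASYNC_DETACH_SPARE);
}
```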
How Has This Been Tested?
Created a pool with 25 two-way data mirrors and two spare vdevs, then detached and reattached several data disks to verify that the spare detaches when the disk comes back online.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain Signed-off-by.
@amotin, @tonyhutter, @asomers - This patch works on both Linux and FreeBSD, since zed and zfsd both use the same ioctl to bring the disk online.
Did you test with a 25-disk mirror, or a pool with 25 top-level vdevs, each of which was a two-way mirror? And why doesn't resilvering happen when the original disk gets reinserted? That sounds to me like a bug.
Thanks, @asomers, for your response. To clarify, this is easily reproducible with 25 top-level vdevs, each a two-way mirror. I am not very deep into that part of the code, but here is why the resilver is skipped: range_tree_is_empty(vd->vdev_dtl[DTL_MISSING]) returns true for the reinserted vdev (https://github.com/openzfs/zfs/blob/master/module/zfs/vdev.c#L3479); this check is reached from vdev_open(), which in turn is called from the vdev_online() context. With fewer disks, a resilver always happens, i.e., range_tree_space(vd->vdev_dtl[DTL_MISSING]) is always nonzero.
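For reference, the leaf-vdev decision roughly boils down to the following (paraphrased from vdev_resilver_needed(); the exact line and surrounding code may differ between versions):

```c
/*
 * Paraphrased sketch of the leaf-vdev check in vdev_resilver_needed().
 * DTL_MISSING records the txg ranges the vdev missed while absent; if
 * nothing was written in that window the tree stays empty, no resilver
 * is requested from vdev_open(), and (before this patch) the spare is
 * never detached.
 */
static boolean_t
leaf_resilver_needed(vdev_t *vd)
{
	boolean_t needed = B_FALSE;

	mutex_enter(&vd->vdev_dtl_lock);
	if (!range_tree_is_empty(vd->vdev_dtl[DTL_MISSING]) &&
	    vdev_writeable(vd))
		needed = B_TRUE;
	mutex_exit(&vd->vdev_dtl_lock);

	return (needed);
}
```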
So there's no resilvering because, during the time the disk was missing, no data was written to that top-level vdev? I guess that makes sense. And this patch causes the ZFS kernel module to automatically detach the spare if it determines that resilvering is not necessary? That sounds good.
Thanks, that's the intention of this patch. When zpool_vdev_online() is called from zed/zfsd, we now explicitly detach the spare vdev if resilvering is not necessary.
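From userland the flow is unchanged; the sketch below (illustrative, with a hypothetical helper and simplified flags, not taken from zed/zfsd) shows the libzfs call both daemons already make, after which the kernel side of the online ioctl now handles the spare detach when no resilver is needed:

```c
/*
 * Illustrative userland view. Both zed and zfsd ultimately call
 * zpool_vdev_online(); with this patch the kernel side of that ioctl
 * detaches the spare when no resilver is necessary, so no extra
 * userland call is required.
 */
#include <libzfs.h>

static int
online_reinserted_disk(libzfs_handle_t *hdl, const char *pool,
    const char *path)
{
	zpool_handle_t *zhp;
	vdev_state_t newstate;
	int err;

	if ((zhp = zpool_open(hdl, pool)) == NULL)
		return (-1);

	/* Flags simplified; zed/zfsd pass their own ZFS_ONLINE_* flags. */
	err = zpool_vdev_online(zhp, path, 0, &newstate);

	zpool_close(zhp);
	return (err);
}
```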