vdev_open: clear async fault flag after reopen
Motivation and Context
After #15839, vdev_fault_wanted is set on a vdev after a probe fails. An end-of-txg async task is charged with actually faulting the vdev.
In a single-disk pool, the probe failure will degrade the last disk and then suspend the pool. However, vdev_fault_wanted is not cleared. After the pool returns, the transaction finishes, the async task runs and faults the vdev, and the pool suspends again.
Description
The fix is simple: when reopening a vdev, clear the async fault flag. If the vdev is still failed, the startup probe will quickly notice and degrade/suspend it again. If not, all is well!
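To make the sequence concrete, here is a minimal standalone model of the behaviour described above. This is not ZFS code: apart from the vdev_fault_wanted flag named in this PR (modeled as `fault_wanted`), every type and function below is invented for illustration. It walks through a probe failure on a single-disk pool, the end-of-txg fault task, and a reopen with and without the stale flag being cleared.

```c
/* Toy model of the re-fault loop; all names are illustrative only. */
#include <stdbool.h>
#include <stdio.h>

struct toy_vdev {
	bool fault_wanted;	/* models vdev_fault_wanted */
	bool faulted;
};

struct toy_pool {
	struct toy_vdev disk;
	bool suspended;
};

/* A failed probe only requests the fault; it is applied later. */
static void
probe_fails(struct toy_pool *p)
{
	p->disk.fault_wanted = true;
}

/*
 * End-of-txg async task: apply any requested fault. In a single-disk
 * pool, faulting the only disk suspends the pool.
 */
static void
async_fault_task(struct toy_pool *p)
{
	if (p->disk.fault_wanted) {
		p->disk.faulted = true;
		p->suspended = true;
	}
}

/*
 * Reopen after the pool returns. With the fix, the stale fault request
 * is dropped; a disk that is still broken would simply fail its probe
 * again and be handled from scratch.
 */
static void
reopen(struct toy_pool *p, bool clear_async_fault)
{
	p->disk.faulted = false;
	p->suspended = false;
	if (clear_async_fault)
		p->disk.fault_wanted = false;
}

static void
run(bool clear_async_fault)
{
	struct toy_pool p = { { false, false }, false };

	probe_fails(&p);		/* disk probe fails, fault requested */
	async_fault_task(&p);		/* end of txg: disk faulted, pool suspends */
	reopen(&p, clear_async_fault);	/* pool comes back, disk is healthy again */
	async_fault_task(&p);		/* the pending txg finishes */

	printf("clear flag on reopen: %s -> pool %s\n",
	    clear_async_fault ? "yes" : "no ",
	    p.suspended ? "suspended again" : "stays online");
}

int
main(void)
{
	run(false);	/* old behaviour: stale flag re-faults the disk */
	run(true);	/* with the fix: flag cleared, pool stays up */
	return (0);
}
```

The key point the model shows is that the fault request and the fault itself are decoupled across a txg boundary, so the request must be invalidated when the vdev is reopened, not merely survived.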
How Has This Been Tested?
A test case is included. It fails without this change and passes with it.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [x] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain `Signed-off-by`.
If this can handle the transient USB faults on my USB 3.1 Gen 2 drive cages causing pools to go offline until reboot...
Further testing shows the bug's impact is a little wider: if multiple disks are lost in the same txg, causing the pool to suspend, they will all re-fault at the end of that txg once the pool returns, and the pool will fail again. This happens when a disk array or backplane fails, taking out multiple disks at the same moment. Not a huge deal, and the fix here takes care of it in the same way.
Merged as 393b7ad6952217a7c0823f705f5b4a41d6b4f3f5 and 5de3ac223623d5348e491cc89c70a803ddcd7184.