Allow lockless zpool status
Motivation and Context
Allow zpool status to work even when the pool is locked up. This is a reworked version of https://github.com/openzfs/zfs/pull/17193 that uses an environment variable instead of command line flags.
Description
Add a new `ZPOOL_LOCK_BEHAVIOR` environment variable to control `zpool status` lock behavior. `ZPOOL_LOCK_BEHAVIOR` can have one of these values:

- `lockless`: Try for a short amount of time to get the `spa_namespace` lock. If that doesn't work, then do the `zpool status` locklessly. This is dangerous and can crash your system if pool configs are being modified while `zpool status` is running. This setting requires `zpool status` to be run as root.
- `trylock`: Try for a short amount of time to get the `spa_namespace` lock. If that doesn't work, then simply abort `zpool status`.
- `wait`: Wait forever for the lock. This is the default.

These options allow users to view `zpool status` output when a pool gets stuck while holding the `spa_namespace` lock.
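For illustration only, the userland side amounts to mapping the environment variable onto one of the three behaviors, defaulting to `wait`, and refusing `lockless` for non-root users. The enum and helper names below are placeholders for this sketch, not the actual patch:

```c
/* Sketch only: invented names, not the code in this PR. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef enum {
	LOCK_BEHAVIOR_WAIT,	/* wait forever for the lock (default) */
	LOCK_BEHAVIOR_TRYLOCK,	/* try briefly, then abort zpool status */
	LOCK_BEHAVIOR_LOCKLESS	/* try briefly, then read without the lock */
} lock_behavior_t;

static lock_behavior_t
get_lock_behavior(void)
{
	const char *val = getenv("ZPOOL_LOCK_BEHAVIOR");

	if (val == NULL || strcmp(val, "wait") == 0)
		return (LOCK_BEHAVIOR_WAIT);
	if (strcmp(val, "trylock") == 0)
		return (LOCK_BEHAVIOR_TRYLOCK);
	if (strcmp(val, "lockless") == 0) {
		if (geteuid() != 0) {
			(void) fprintf(stderr, "lockless mode requires root\n");
			exit(1);
		}
		return (LOCK_BEHAVIOR_LOCKLESS);
	}
	(void) fprintf(stderr, "invalid ZPOOL_LOCK_BEHAVIOR: %s\n", val);
	exit(1);
}

int
main(void)
{
	(void) printf("behavior = %d\n", get_lock_behavior());
	return (0);
}
```

Usage is just a matter of prefixing the command, e.g. `ZPOOL_LOCK_BEHAVIOR=trylock zpool status`.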
How Has This Been Tested?
Added test case
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
Checklist:
- [ ] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [ ] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain `Signed-off-by`.
@tonyhutter btw did you consider pre-rendering pool + vdevs status into a data structure that would be tolerant to lockless reading?
We could do simple A/B swapping of a pointer between two buffers, for example, and update the inactive one whenever a value presented in that buffer changes. Perhaps it makes sense to do this from a per-pool thread that is woken on such occasions by the code that actually changes the pool status.
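Something along these lines, as a rough sketch with made-up names (just to show the shape of the idea, not an existing ZFS mechanism; a real version would also have to make sure a buffer isn't reused while a reader still holds a pointer to it, e.g. an RCU-style grace period or a refcount):

```c
#include <stdatomic.h>
#include <stdio.h>

typedef struct pool_status_snapshot {
	char	state[16];	/* e.g. "ONLINE", "DEGRADED" */
	int	vdev_errors;	/* pre-rendered summary fields, ... */
} pool_status_snapshot_t;

static pool_status_snapshot_t buf[2];	/* the A and B buffers */
static _Atomic(pool_status_snapshot_t *) published = &buf[0];

/* Writer side: called from wherever the pool status actually changes. */
static void
publish_status(const char *state, int vdev_errors)
{
	pool_status_snapshot_t *cur = atomic_load(&published);
	pool_status_snapshot_t *next = (cur == &buf[0]) ? &buf[1] : &buf[0];

	(void) snprintf(next->state, sizeof (next->state), "%s", state);
	next->vdev_errors = vdev_errors;
	atomic_store(&published, next);	/* flip A <-> B */
}

/* Reader side: lockless, always sees a complete snapshot. */
static const pool_status_snapshot_t *
read_status(void)
{
	return (atomic_load(&published));
}

int
main(void)
{
	publish_status("DEGRADED", 3);
	(void) printf("%s (%d vdev errors)\n",
	    read_status()->state, read_status()->vdev_errors);
	return (0);
}
```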
Given that there's no proper testing of the codebase under load currently...
This feature, somewhat hidden for now, opens up a possible slippery slope, since there's obviously a lot of demand for lockless status-info gathering. I fear that once this lands, it'll eventually be made easier to use after being recommended to a few users here and there, and before we know it, it gets integrated into performance monitoring/management stacks. That will turn this whole thing into an emergency, which doesn't guarantee the best design will be picked, only that the emergency gets resolved in a more acceptable way. Since there's pretty intense demand, I think it's worth doing this properly once and being done with it forever.
> did you consider pre-rendering pool + vdevs status into a data structure that would be tolerant to lockless reading?
I'm not sure how that would actually be implemented. `zpool status` does three different ioctls to gather its information, so it would be messy. It might also return old, misleading data, like reporting the pool as ONLINE when it's not.
> I fear that once this lands, it'll eventually be made easier to use after being recommended to a few users here and there, and before we know it, it gets integrated into performance monitoring/management stacks...
Monitoring scripts should be using `ZPOOL_LOCK_BEHAVIOR=trylock`, which is 100% safe and will not hang the monitoring script if the pool gets hosed. We trust the user to know that `ZPOOL_LOCK_BEHAVIOR=lockless` could be a footgun.
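As a sketch, a monitoring caller ends up doing something like this (illustrative only, with placeholder error handling):

```c
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	char line[1024];
	FILE *fp;

	/* trylock: give up quickly if the spa_namespace lock is stuck. */
	(void) setenv("ZPOOL_LOCK_BEHAVIOR", "trylock", 1);

	fp = popen("zpool status", "r");
	if (fp == NULL)
		return (1);

	while (fgets(line, sizeof (line), fp) != NULL)
		(void) fputs(line, stdout);	/* forward to the monitor's log */

	/* If the lock couldn't be taken, zpool status exits instead of hanging. */
	return (pclose(fp) == 0 ? 0 : 1);
}
```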
I do have ideas in the back of my head for how you could do away with the spa_namespace lock altogether. All the ideas are very heavy lifts though, and may not pan out, so this PR is more of a stop-gap.
Reviewers - I just rebased this 2 days ago. Please take another look when you get a chance.