zdb: cleanup and improve cachefile handling
### Motivation and Context
- Add missing cleanup for early return.
- Automatically detect cache file; otherwise, force import.
### Description
- When zdb encounters an error or returns early, `kernel_fini()` is not called.
- If a pool is created with the cache file located at a non-default path such as `/etc/default/zpool.cache`, the cache file is removed, or the `cachefile` property is set to `none`, zdb fails to show the pool unless we specify the cache file or use the `-e` option. This PR automates this process.
### How Has This Been Tested?
CI tests, plus the following five scenarios. Without this patch, only scenario 1 works without specifying additional arguments:
1. Default cache file path `/etc/zfs/zpool.cache`, and the file exists.
2. Default cache file path `/etc/zfs/zpool.cache`, and the file does not exist.
3. Custom cache file path `/data/zfs/zpool.cache`, and the file exists.
4. Custom cache file path `/data/zfs/zpool.cache`, and the file does not exist.
5. `cachefile` property set to `none`.
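The custom-path and `cachefile=none` scenarios can be reproduced with something like the following (requires root on a ZFS-capable system; the pool name and vdev are illustrative):

```shell
# Scenarios 3/4: pool created with a custom cache file location.
zpool create -o cachefile=/data/zfs/zpool.cache tank /dev/sdb
zdb -d tank   # without this patch: zdb -U /data/zfs/zpool.cache -d tank

# Scenario 5: no cache file is maintained at all.
zpool set cachefile=none tank
zdb -d tank   # without this patch: zdb -e -d tank
```

With the patch, zdb resolves the cache file from the pool's `cachefile` property, or falls back to a forced import, so the plain `zdb -d tank` invocations succeed in every scenario.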
### Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [x] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain `Signed-off-by`.
When the `cachefile` property is set to `none`, do the `-d` or `-O` options require the `-e` option, or is that automated as well?
@akashb-22 - Considering libzfs is able to read the cachefile property correctly, it should be automated.
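For reference, reading the property through libzfs looks roughly like this. A sketch only: it assumes a system with the libzfs development headers (compile with `-lzfs`), and error handling is trimmed.

```c
#include <stdio.h>
#include <libzfs.h>

int
main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <pool>\n", argv[0]);
		return (1);
	}

	libzfs_handle_t *g_zfs = libzfs_init();
	zpool_handle_t *zhp = zpool_open(g_zfs, argv[1]);
	if (zhp == NULL)
		return (1);

	/* ZPOOL_PROP_CACHEFILE yields "none", "-" (default), or a path */
	char buf[ZPOOL_MAXPROPLEN];
	if (zpool_get_prop(zhp, ZPOOL_PROP_CACHEFILE, buf, sizeof (buf),
	    NULL, B_FALSE) == 0)
		printf("cachefile = %s\n", buf);

	zpool_close(zhp);
	libzfs_fini(g_zfs);
	return (0);
}
```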
I see some disparity when I use the pool name versus the dataset name as the target.
```
# zpool get cachefile
NAME       PROPERTY   VALUE   SOURCE
pool-oss5  cachefile  none    local

# zdb -d pool-oss5
Dataset mos [META], ID 0, cr_txg 4, 5.62M, 171 objects
Dataset pool-oss5/ost5 [ZPL], ID 387, cr_txg 67, 1.20G, 7 objects
Dataset pool-oss5 [ZPL], ID 54, cr_txg 1, 1024K, 6 objects
Verified large_blocks feature refcount of 1 is correct
Verified large_dnode feature refcount of 1 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified edonr feature refcount of 0 is correct
Verified userobj_accounting feature refcount of 1 is correct
Verified encryption feature refcount of 0 is correct
Verified project_quota feature refcount of 1 is correct
Verified redaction_bookmarks feature refcount of 0 is correct
Verified redacted_datasets feature refcount of 0 is correct
Verified bookmark_written feature refcount of 0 is correct
Verified livelist feature refcount of 0 is correct
Verified zstd_compress feature refcount of 0 is correct
Verified zilsaxattr feature refcount of 1 is correct
Verified blake3 feature refcount of 0 is correct
Verified redaction_list_spill feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct

# zdb -d pool-oss5/ost5
failed to hold dataset 'pool-oss5/ost5': No such file or directory
zdb: can't open 'pool-oss5/ost5': No such file or directory

# zdb -d pool-oss5/ost5 -e
Dataset pool-oss5/ost5 [ZPL], ID 387, cr_txg 67, 1.20G, 7 objects
```
I think using the dataset as the target doesn't work here, though I'm not sure; the same command with `-e` seems to work fine. @ixhamza is this expected?
@akashb-22 - I have made some adjustments to automate the scenario you mentioned. Can you give it a try now?
@behlendorf - I have updated the patch according to your suggestions and rebased it onto the master branch. The alloc_class tests, which were previously failing, are now passing. However, the remaining failing test cases seem to be unrelated.