zdb: cleanup and improve cachefile handling
### Motivation and Context
- Add missing cleanup for early return.
- Automatically detect cache file; otherwise, force import.
### Description
- When zdb encounters an error or returns early, `kernel_fini()` is not called.
- If a pool is created with the cache file located at a non-default path such as `/etc/default/zpool.cache`, the cache file is removed, or the `cachefile` property is set to `none`, zdb fails to show the pool unless we specify the cache file or use the `-e` option. This PR automates this process.
### How Has This Been Tested?
CI tests, plus the following five scenarios. Without this patch, only scenario 1 works without specifying additional arguments:
1. Default cache file path `/etc/zfs/zpool.cache`, and the file exists.
2. Default cache file path `/etc/zfs/zpool.cache`, and the file does not exist.
3. Custom cache file path `/data/zfs/zpool.cache`, and the file exists.
4. Custom cache file path `/data/zfs/zpool.cache`, and the file does not exist.
5. `cachefile` property set to `none`.
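The custom-path and `cachefile=none` scenarios can be reproduced with something like the following (requires root on a ZFS-capable system; the pool name and vdev are illustrative):

```shell
# Scenarios 3/4: pool created with a custom cache file location.
zpool create -o cachefile=/data/zfs/zpool.cache tank /dev/sdb
zdb -d tank   # without this patch: zdb -U /data/zfs/zpool.cache -d tank

# Scenario 5: no cache file is maintained at all.
zpool set cachefile=none tank
zdb -d tank   # without this patch: zdb -e -d tank
```

With the patch, zdb resolves the cache file from the pool's `cachefile` property, or falls back to a forced import, so the plain `zdb -d tank` invocations succeed in every scenario.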
### Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [x] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
- [x] My code follows the OpenZFS code style requirements.
- [ ] I have updated the documentation accordingly.
- [x] I have read the contributing document.
- [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [x] All commit messages are properly formatted and contain `Signed-off-by`.
When the `cachefile` property is set to `none`, do the `-d` or `-O` options require the `-e` option, or is that automated as well?
@akashb-22 - Considering libzfs is able to read the cachefile property correctly, it should be automated.
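For reference, reading the property through libzfs looks roughly like this. A sketch only: it assumes a system with the libzfs development headers (compile with `-lzfs`), and error handling is trimmed.

```c
#include <stdio.h>
#include <libzfs.h>

int
main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <pool>\n", argv[0]);
		return (1);
	}

	libzfs_handle_t *g_zfs = libzfs_init();
	zpool_handle_t *zhp = zpool_open(g_zfs, argv[1]);
	if (zhp == NULL)
		return (1);

	/* ZPOOL_PROP_CACHEFILE yields "none", "-" (default), or a path */
	char buf[ZPOOL_MAXPROPLEN];
	if (zpool_get_prop(zhp, ZPOOL_PROP_CACHEFILE, buf, sizeof (buf),
	    NULL, B_FALSE) == 0)
		printf("cachefile = %s\n", buf);

	zpool_close(zhp);
	libzfs_fini(g_zfs);
	return (0);
}
```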
I see some disparity when I use the pool name versus the dataset name as the target.
```
# zpool get cachefile
NAME       PROPERTY   VALUE   SOURCE
pool-oss5  cachefile  none    local

# zdb -d pool-oss5
Dataset mos [META], ID 0, cr_txg 4, 5.62M, 171 objects
Dataset pool-oss5/ost5 [ZPL], ID 387, cr_txg 67, 1.20G, 7 objects
Dataset pool-oss5 [ZPL], ID 54, cr_txg 1, 1024K, 6 objects
Verified large_blocks feature refcount of 1 is correct
Verified large_dnode feature refcount of 1 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified edonr feature refcount of 0 is correct
Verified userobj_accounting feature refcount of 1 is correct
Verified encryption feature refcount of 0 is correct
Verified project_quota feature refcount of 1 is correct
Verified redaction_bookmarks feature refcount of 0 is correct
Verified redacted_datasets feature refcount of 0 is correct
Verified bookmark_written feature refcount of 0 is correct
Verified livelist feature refcount of 0 is correct
Verified zstd_compress feature refcount of 0 is correct
Verified zilsaxattr feature refcount of 1 is correct
Verified blake3 feature refcount of 0 is correct
Verified redaction_list_spill feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct

# zdb -d pool-oss5/ost5
failed to hold dataset 'pool-oss5/ost5': No such file or directory
zdb: can't open 'pool-oss5/ost5': No such file or directory

# zdb -d pool-oss5/ost5 -e
Dataset pool-oss5/ost5 [ZPL], ID 387, cr_txg 67, 1.20G, 7 objects
```
I think using the dataset as the target doesn't work here, though I'm not sure; the same command with `-e` seems to work fine. @ixhamza is this expected?
@akashb-22 - I have made some adjustments to automate the scenario you mentioned. Can you give it a try now?
@behlendorf - I have updated the patch according to your suggestions and rebased it onto the master branch. The alloc_class tests, which were previously failing, are now passing. However, the remaining failing test cases seem to be unrelated.