zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Fix hangs with parallel pool import

Open asomers opened this issue 1 year ago • 3 comments

Motivation and Context

The parallel pool import feature (#16093) can greatly increase the number of threads used by /sbin/zpool . If it reaches the OS-imposed per-process limit and then tries to import an additional pool, the command will hang. This PR fixes such hangs.

Description

The PR consists of four distinct commits:

  • Ensure that tpool_dispatch returns an error if it cannot start at least 1 worker. Previously it would silently fail.
  • If tpool_dispatch fails while mounting file systems, make libzfs fall back to serial execution. This fixes the bug, though with inconsistent performance.
  • Allow /sbin/zpool to control the threadpool size used by libzfs when mounting file systems. This improves the performance of the solution in the previous commit. However, it changes the libzfs ABI.
  • Add an ATF test for tpool_dispatch to ensure correct behavior if it cannot start even one worker. Due to lack of test infrastructure within the OpenZFS repository, this commit is not actually included in this PR. Instead, I plan to commit it directly to FreeBSD.

How Has This Been Tested?

In addition to the aforementioned libtpool test, I have tested this change by importing 6 pools in parallel, each with 1501 file systems. Before my changes, that would fail on 7 out of 8 attempts. Now it always succeeds, with consistent performance.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [x] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

Fixes #16172

asomers avatar May 08 '24 19:05 asomers

@behlendorf if it's ok to change the libzfs abi, could you please tell me how to update the libzfs.abi file? It looks like it's automatically generated somehow.

asomers avatar May 09 '24 15:05 asomers

@asomers you can download an updated abi file from the artifacts for this PR (at the bottom). You'll just need to replace the current abi file in this PR with the new once after inspecting the differences, then force update the PR.

https://github.com/openzfs/zfs/actions/runs/9007873783?pr=16178

behlendorf avatar May 09 '24 15:05 behlendorf

@asomers you can download an updated abi file from the artifacts for this PR (at the bottom). You'll just need to replace the current abi file in this PR with the new once after inspecting the differences, then force update the PR.

Ahh, that's easy. Done!

asomers avatar May 09 '24 15:05 asomers

Now that this PR has been merged downstream into FreeBSD, here is a review for the libtpool ATF test. https://reviews.freebsd.org/D45587

asomers avatar Jun 13 '24 22:06 asomers