scikit-learn-intelex

`n_jobs` support details in docs

Open Alexsandruss opened this issue 8 months ago • 3 comments

Description

Adds a documentation page describing how sklearnex handles n_jobs.


Checklist to complete before moving the PR out of draft:

PR completeness and readability

  • [x] I have reviewed my changes thoroughly before submitting this pull request.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • [x] Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • [x] I have added the relevant label(s) to the PR, if I have permission to do so.
  • [x] I have resolved any merge conflicts that might occur with the base branch.

Testing

  • [x] I have run it locally and tested the changes extensively.
  • [x] All CI jobs are green or I have provided justification why they aren't.
  • [x] I have extended the testing suite if new functionality was introduced in this PR.

Performance

N/A

Alexsandruss, Apr 25 '25 16:04

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure ?
github 71.96% <ø> (ø)

Flags with carried-forward coverage won't be shown. 41 files have indirect coverage changes.

codecov[bot], Apr 25 '25 17:04

Thanks for adding these explanations, but the page is still missing important information and leaves several questions unanswered:

  • It's missing how MKL threading works, the static-linkage aspect, and how these interact with environment variables, the n_jobs parameter, the inner_max_num_threads parameter, and threadpoolctl configurations.
  • It doesn't mention how the threading works when put under a threadpoolctl context.
  • It's unclear what ends up happening to the number of threads when environment variables are set in addition to passing n_jobs as a parameter.
  • It could mention what happens with n_jobs in GPU mode.
  • There's a difference in the threading configuration logic between daal4py and sklearnex, which this doc could also mention.
  • It doesn't cover the fact that some configurations are global, which is quite relevant when using Python-based multi-threading.
  • It could mention that TBB threading doesn't automatically avoid nested parallelism when used in conjunction with OpenMP (which sklearn uses) and/or with joblib or Python threads.
  • Some estimators perform better when not using all threads; for example, linear regression is faster on LNL laptops when it doesn't use the low-power E-cores. These sorts of things could be mentioned here, as they are relevant.
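The nested-parallelism point can be made concrete with a small, library-agnostic sketch. It assumes the simplest budgeting rule (split the cores evenly across outer workers so that outer × inner stays at or below the core count) and uses only the Python standard library. `inner_thread_budget`, `oversubscription_factor` and `run_nested` are hypothetical helpers, not sklearnex, daal4py or joblib APIs. In real code the per-task budget would be forwarded to something like an estimator's n_jobs, but whether sklearnex honors it that way under TBB, OpenMP or a threadpoolctl context is exactly what the requested doc page should spell out.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inner_thread_budget(outer_workers: int, total_cores: int) -> int:
    """Hypothetical helper: threads each inner task may use, aiming for
    outer_workers * budget <= total_cores. Falls back to 1 thread per task
    when there are more outer workers than cores."""
    if outer_workers < 1 or total_cores < 1:
        raise ValueError("outer_workers and total_cores must be positive")
    return max(1, total_cores // outer_workers)


def oversubscription_factor(outer_workers: int, inner_threads: int, total_cores: int) -> float:
    """Hypothetical helper: requested threads divided by available cores.
    A value above 1.0 means the machine is oversubscribed."""
    return (outer_workers * inner_threads) / total_cores


def run_nested(
    outer_workers: int,
    total_cores: int,
    task: Callable[[int, int], T],
) -> List[T]:
    """Run `outer_workers` tasks on Python threads, passing each one its inner
    thread budget. `task(task_id, n_threads)` is a stand-in for a fit call
    that would receive n_threads as its n_jobs (assumption, not a real API)."""
    budget = inner_thread_budget(outer_workers, total_cores)
    with ThreadPoolExecutor(max_workers=outer_workers) as pool:
        futures = [pool.submit(task, i, budget) for i in range(outer_workers)]
        # Collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    outer = 4
    # Naive nesting: every outer worker lets its inner library use all cores.
    print("naive factor:", oversubscription_factor(outer, cores, cores))
    budget = inner_thread_budget(outer, cores)
    print("budgeted threads per task:", budget)
    print("budgeted factor:", oversubscription_factor(outer, budget, cores))
    print(run_nested(outer, cores, lambda task_id, n: (task_id, n)))
```

With 4 outer workers on a 16-core machine, letting every inner task use all 16 cores requests 64 threads (an oversubscription factor of 4.0), while the even split gives each task 4 threads (factor 1.0). Integer division can leave some cores idle when the core count isn't a multiple of the worker count; the sketch accepts that to keep the total at or below the core count.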

david-cortes-intel, May 05 '25 08:05

@Alexsandruss, make sure to merge main to pick up the latest CI checks on docs.

icfaust, May 26 '25 14:05

Closing in favor of https://github.com/uxlfoundation/scikit-learn-intelex/pull/2768

david-cortes-intel, Nov 18 '25 15:11