scikit-learn-intelex icon indicating copy to clipboard operation
scikit-learn-intelex copied to clipboard

[enhancement] remove sklearn_check_version dependence from `onedal/svm`

Open icfaust opened this issue 9 months ago • 2 comments

Description

Add a design rule which forces all sklearn-related interfacing to occur in its proper package, namely sklearnex. This prevents bad estimator architectures which place too much of the sklearn-related aspects in the wrong area. Ideally this test would have no exceptions going forward.

This is a subsegment of #1829 which handles just the ensemble algorithm so that sklearn_check_version is not in onedal/svm

This also solves the PR #1727 by removing sklearn inheritance from onedal/svm classes

This also solves issues with gpu offloading with fit_proba that had been removed errantly in #1361

This also fixes an issue where sample_weights are always generated, which may impact performance.

Tasks -

  • [ ] Initial compile
  • [ ] correct public CI failures
  • [ ] Split into separate PRs as necessary (and merge them)
  • [ ] pass private CI

icfaust avatar May 14 '24 10:05 icfaust

/intelci: run

icfaust avatar May 14 '24 21:05 icfaust

/intelci: run

icfaust avatar May 15 '24 20:05 icfaust

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

samir-nasibli avatar May 17 '24 02:05 samir-nasibli

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py

Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes).

If we want to remove sklearn entirely, we need to fix issues in (not counting testing): onedal/cluster/kmeans.py onedal/cluster/dbscan.py (solved in #1837) onedal/ensemble/forest.py onedal/common/_mixin.py onedal/utils/validation.py (This one would be difficult)

icfaust avatar May 17 '24 05:05 icfaust

/intelci: run

icfaust avatar May 17 '24 06:05 icfaust

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py

Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes).

If we want to remove sklearn entirely, we need to fix issues in (not counting testing): onedal/cluster/kmeans.py onedal/cluster/dbscan.py (solved in #1837) onedal/ensemble/forest.py onedal/common/_mixin.py onedal/utils/validation.py (This one would be difficult)

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

onedal4py already depends on daal4py, and it is actually part of current daal4py package. sklearn is a runtime dependency, we are versioning it on daal4py, so why we can not do it on onedal4py? Probably make sense design onedal4py as a independent, but currently it is not sklearn-independent. And since for some primitives it is used in onedal4py and sklearn provide different ifaces/deprecate smth based on version, so make sense also having versioning there in my opinion for a while.

samir-nasibli avatar May 17 '24 09:05 samir-nasibli

/intelci: run

icfaust avatar May 17 '24 10:05 icfaust

/intelci: run

icfaust avatar May 22 '24 07:05 icfaust

@icfaust onedal4py depends on sklearn. Could you please clarify more why we can not use sklearn versioning here?

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise. Secondly, sklearn_check_version is used only for input verification and conformance on raising errors in sklearn (in practice in onedal/). By design, sklearn conformance should be done in the sklearnex directory, and not in the onedal directory. Why would a user care about sklearn if they were using onedal classes directly? For example, why else have these functions defined at all? https://github.com/intel/scikit-learn-intelex/blob/main/onedal/utils/validation.py Thirdly, operational maintenance of the onedal folder should be minimized (i.e. . By using sklearn_check_version, it is increasing our burden in observation and knowledge into the various onedal classes). If we want to remove sklearn entirely, we need to fix issues in (not counting testing): onedal/cluster/kmeans.py onedal/cluster/dbscan.py (solved in #1837) onedal/ensemble/forest.py onedal/common/_mixin.py onedal/utils/validation.py (This one would be difficult)

The onedal folder installs via setup.py, requirements for the setup.py do not include a sklearn version (as per INSTALL.md). Therefore, if the user were install onedal without having run setup_sklearnex.py, they are in for a nasty surprise.

onedal4py already depends on daal4py, and it is actually part of current daal4py package. sklearn is a runtime dependency, we are versioning it on daal4py, so why we can not do it on onedal4py? Probably make sense design onedal4py as a independent, but currently it is not sklearn-independent. And since for some primitives it is used in onedal4py and sklearn provide different ifaces/deprecate smth based on version, so make sense also having versioning there in my opinion for a while.

There is a key distinction here: if daal4py depends on sklearn versions, then problems with sklearn versioning will be solved in daal4py. If we add sklearn versioning to onedal4py, we increase our maintenance burden. If you list the primitives that you mention I'd like to review, from my perspective sklearn has a stable API for what we use in onedal4py. If this becomes a problem in a future sklearn version, then this test will specifically trigger and highlight a discussion about the architecture which we should have.

icfaust avatar May 22 '24 07:05 icfaust