ENH: Array API dispatching
Description
Based on #2079
Adding dpctl into conformance testing.
Some development from https://github.com/intel/scikit-learn-intelex/pull/2014
TBD
Checklist to comply with before moving PR from draft:
PR completeness and readability
- [x] I have reviewed my changes thoroughly before submitting this pull request.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
- [x] Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
- [x] I have added a respective label(s) to PR if I have a permission for that.
- [x] I have resolved any merge conflicts that might occur with the base branch.
Testing
- [x] I have run it locally and tested the changes extensively.
- [x] All CI jobs are green or I have provided justification why they aren't.
- [x] I have extended testing suite if new functionality was introduced in this PR.
Performance
- [x] I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
- [x] I have provided justification why performance has changed or why changes are not expected.
- [x] I have provided justification why quality metrics have changed or why changes are not expected.
- [x] I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.
⚠️ The sha of the head commit of this PR conflicts with #2079. Mergify cannot evaluate rules on this PR. ⚠️
/intelci: run
/intelci: run
Convert_or_pass is going to cause some problems. This won't show up in any of the CIs as currently configured, but low precision hardware is going to have troubles. Secondly, non dpnp, dpctl inputs are going to have lots of troubles on to_table, as the dlpack interface is not in place. The best way to solve these issues is to 1) fix the dlpack problem, and 2) add the ability to pass a queue into to_table, which then the data conversion should be done in pybind11 than in python (i.e. check the dtypes, see if they don't match the queue's device, convert via copy and create the table).
Codecov Report
Attention: Patch coverage is 79.25532% with 39 lines in your changes missing coverage. Please review.
| Flag | Coverage Δ | |
|---|---|---|
| azure | 79.84% <74.46%> (+0.13%) |
:arrow_up: |
| github | 72.13% <70.96%> (+0.13%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| onedal/_device_offload.py | 81.03% <100.00%> (+2.58%) |
:arrow_up: |
| onedal/utils/_array_api.py | 84.78% <100.00%> (+22.28%) |
:arrow_up: |
| sklearnex/_utils.py | 83.56% <100.00%> (-2.01%) |
:arrow_down: |
| sklearnex/basic_statistics/basic_statistics.py | 89.85% <100.00%> (+0.14%) |
:arrow_up: |
| ...x/basic_statistics/incremental_basic_statistics.py | 96.03% <100.00%> (+0.03%) |
:arrow_up: |
| sklearnex/cluster/dbscan.py | 92.20% <100.00%> (+0.10%) |
:arrow_up: |
| sklearnex/cluster/k_means.py | 89.65% <100.00%> (-0.15%) |
:arrow_down: |
| sklearnex/covariance/incremental_covariance.py | 91.42% <100.00%> (+0.98%) |
:arrow_up: |
| sklearnex/decomposition/pca.py | 90.47% <100.00%> (ø) |
|
| sklearnex/ensemble/_forest.py | 80.40% <100.00%> (+0.03%) |
:arrow_up: |
| ... and 16 more |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
/intelci: run
@icfaust I get an error when trying to import classes:
NameError: name 'BaseEstimator' is not defined
/intelci: run
/intelci: run
@icfaust I notice the generated links for help buttons now give out invalid URLs. For example, the help link for logistic regression now has this as URL:
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.logistic_regression.LogisticRegression.html
whereas it should have this:
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html
@icfaust I notice the generated links for help buttons now give out invalid URLs. For example, the help link for logistic regression now has this as URL:
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.logistic_regression.LogisticRegression.htmlwhereas it should have this:
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html
Good catch! Suffering from our codebase not following sklearn standards in module naming. I've changed it to use _doc_link_url_param_generator instead.
/intelci: run
/intelci: run
@icfaust I notice the generated links for help buttons now give out invalid URLs. For example, the help link for logistic regression now has this as URL:
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.logistic_regression.LogisticRegression.htmlwhereas it should have this:https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.htmlGood catch! Suffering from our codebase not following sklearn standards in module naming. I've changed it to use
_doc_link_url_param_generatorinstead.
Now it throws:
TypeError: oneDALEstimator._doc_link_url_param_generator() missing 1 required positional argument: '_'
@icfaust It would be quite helpful in all of these changes to have type hints in the function signatures. I realize the current codebase hardly uses them, but this could be a good opportunity to start adding them.
@icfaust By the way, since now we have versioned docs of sklearnex with all the available versions since 2024.7, and we have a __version__ attribute for sklearnex, the extension estimators could now offer a versioned doc link instead of always pointing to 'latest'.
@icfaust By the way, since now we have versioned docs of sklearnex with all the available versions since 2024.7, and we have a
__version__attribute for sklearnex, the extension estimators could now offer a versioned doc link instead of always pointing to 'latest'.
It would be straightforward to add, except for getting the logic to know when to use latest as that may be dependent on changes coming down the line in #2413. What do you think about making the change there since we are fundamentally changing how we label dev builds?
@icfaust By the way, since now we have versioned docs of sklearnex with all the available versions since 2024.7, and we have a
__version__attribute for sklearnex, the extension estimators could now offer a versioned doc link instead of always pointing to 'latest'.It would be straightforward to add, except for getting the logic to know when to use
latestas that may be dependent on changes coming down the line in #2413. What do you think about making the change there since we are fundamentally changing how we label dev builds?
Yes, that'd be fine too.
@icfaust It would be quite helpful in all of these changes to have type hints in the function signatures. I realize the current codebase hardly uses them, but this could be a good opportunity to start adding them.
I've done the type additions for all of sklearnex._device_offload and sklearnex.base (the core of the changes in the PR). I will do the other files associated with sklearnex.utils.array_api and onedal._device_offload, etc. as follow-ups as the changes their are minimal. I will make sure that we start type enforcing with the rollout of the zero copy estimators, which will be an easy sign of their zero copy support vs how the codebase is now, and it will make sense to do the typing for the estimators with the estimator changes.
@icfaust A perhaps better workaround for the type hints in old python versions is to have them as strings - e.g. "bool | None" instead of bool | None would be accepted by python3.9.
/intelci: run
@icfaust This throws errors when executed in a jupyter-compatible environment:
from sklearnex.linear_model import IncrementalLinearRegression
from sklearn.base import ClassifierMixin
class CustomLinearRegression(ClassifierMixin, IncrementalLinearRegression):
pass
import numpy as np
rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(10,3))
y = rng.standard_normal(size=X.shape[0])
CustomLinearRegression().fit(X, y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/mambaforge/envs/pipsklearn/lib/python3.11/site-packages/IPython/core/formatters.py:347, in BaseFormatter.__call__(self, obj)
345 method = get_real_method(obj, self.print_method)
346 if method is not None:
--> 347 return method()
348 return None
349 else:
File ~/mambaforge/envs/pipsklearn/lib/python3.11/site-packages/sklearn/base.py:693, in BaseEstimator._repr_html_inner(self)
688 def _repr_html_inner(self):
689 """This function is returned by the @property `_repr_html_` to make
690 `hasattr(estimator, "_repr_html_") return `True` or `False` depending
691 on `get_config()["display"]`.
692 """
--> 693 return estimator_html_repr(self)
File ~/mambaforge/envs/pipsklearn/lib/python3.11/site-packages/sklearn/utils/_estimator_html_repr.py:391, in estimator_html_repr(estimator)
380 html_template = (
381 f"<style>{style_with_id}</style>"
382 f'<div id="{container_id}" class="sk-top-container">'
(...)
386 '<div class="sk-container" hidden>'
387 )
389 out.write(html_template)
--> 391 _write_estimator_html(
392 out,
393 estimator,
394 estimator.__class__.__name__,
395 estimator_str,
396 first_call=True,
397 is_fitted_css_class=is_fitted_css_class,
398 is_fitted_icon=is_fitted_icon,
399 )
400 out.write("</div></div>")
402 html_output = out.getvalue()
File ~/mambaforge/envs/pipsklearn/lib/python3.11/site-packages/sklearn/utils/_estimator_html_repr.py:259, in _write_estimator_html(out, estimator, estimator_label, estimator_label_details, is_fitted_css_class, is_fitted_icon, first_call)
257 # `estimator` can also be an instance of `_VisualBlock`
258 if hasattr(estimator, "_get_doc_link"):
--> 259 doc_link = estimator._get_doc_link()
260 else:
261 doc_link = ""
File ~/repos/scikit-learn-intelex/sklearnex/base.py:86, in oneDALEstimator._get_doc_link(self)
79 # The next object in the Estimators MRO after oneDALEstimator should be
80 # the equivalent sklearn estimator, if it is BaseEstimator, it is a
81 # sklearnex-only estimator.
82 if (
83 oneDALEstimator in mro
84 and mro[mro.index(oneDALEstimator) + 1] is BaseEstimator
85 ):
---> 86 module_path, _ = self.__class__.__module__.rsplit(".", 1)
87 class_name = self.__class__.__name__
88 url = f"https://uxlfoundation.github.io/scikit-learn-intelex/latest/non-scikit-algorithms.html#{module_path}.{class_name}"
ValueError: not enough values to unpack (expected 2, got 1)
What's more, the error is thrown twice.
@icfaust The last commit fixed the issue for the doclinks, but some things are still not working correctly when subclassing. For example, this doesn't work:
from sklearnex.svm import SVR
from sklearn.base import ClassifierMixin
class CustomEstimator(ClassifierMixin, SVR):
pass
CustomEstimator().fit(X, y)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[7], line 6
4 class CustomLinearRegression(ClassifierMixin, SVR):
5 pass
----> 6 CustomLinearRegression().fit(X, y)
File ~/repos/scikit-learn-intelex/daal4py/sklearn/_n_jobs_support.py:111, in _run_with_n_jobs.<locals>.n_jobs_wrapper(self, *args, **kwargs)
108 if n_jobs is None or n_jobs == 0:
109 if n_threads is None:
110 # default branch with no setting for n_jobs
--> 111 return method(self, *args, **kwargs)
112 else:
113 n_jobs = n_threads
File ~/repos/scikit-learn-intelex/sklearnex/svm/svr.py:81, in SVR.fit(self, X, y, sample_weight)
70 elif self.C <= 0:
71 # else if added to correct issues with
72 # sklearn tests:
(...)
78 # Windows fatal exception: access violation
79 # occurs
80 raise ValueError("C <= 0")
---> 81 dispatch(
82 self,
83 "fit",
84 {
85 "onedal": self.__class__._onedal_fit,
86 "sklearn": _sklearn_SVR.fit,
87 },
88 X,
89 y,
90 sample_weight=sample_weight,
91 )
93 return self
File ~/repos/scikit-learn-intelex/sklearnex/_device_offload.py:164, in dispatch(obj, method_name, branches, *args, **kwargs)
162 queue = QM.get_global_queue()
163 patching_status.write_log(queue=queue, transferred_to_host=False)
--> 164 return branches["onedal"](obj, *hostargs, **hostkwargs, queue=queue)
165 else:
166 if sklearn_array_api and not has_usm_data:
167 # dpnp fallback is not handled properly yet.
File ~/repos/scikit-learn-intelex/sklearnex/svm/svr.py:124, in SVR._onedal_fit(self, X, y, sample_weight, queue)
123 def _onedal_fit(self, X, y, sample_weight=None, queue=None):
--> 124 X, _, sample_weight = self._onedal_fit_checks(X, y, sample_weight)
125 onedal_params = {
126 "C": self.C,
127 "epsilon": self.epsilon,
(...)
135 "max_iter": self.max_iter,
136 }
138 self._onedal_estimator = onedal_SVR(**onedal_params)
File ~/repos/scikit-learn-intelex/sklearnex/svm/_common.py:172, in BaseSVM._onedal_fit_checks(self, X, y, sample_weight)
163 X, y = validate_data(
164 self,
165 X,
(...)
169 accept_sparse="csr",
170 )
171 y = self._validate_targets(y)
--> 172 sample_weight = self._get_sample_weight(X, y, sample_weight)
173 return X, y, sample_weight
File ~/repos/scikit-learn-intelex/sklearnex/svm/_common.py:196, in BaseSVM._get_sample_weight(self, X, y, sample_weight)
187 raise ValueError(
188 "sample_weight and X have incompatible shapes: "
189 "%r vs %r\n"
(...)
192 % (len(sample_weight), X.shape)
193 )
195 if sample_weight_count == 0:
--> 196 if not isinstance(self, ClassifierMixin) or self.class_weight_ is None:
197 return None
198 sample_weight = np.ones(n_samples, dtype=dtype)
AttributeError: 'CustomLinearRegression' object has no attribute 'class_weight_'
Not sure if it broke here or if was broken before though.
@icfaust The last commit fixed the issue for the doclinks, but some things are still not working correctly when subclassing. For example, this doesn't work:
from sklearnex.svm import SVR from sklearn.base import ClassifierMixin class CustomEstimator(ClassifierMixin, SVR): pass CustomEstimator().fit(X, y)--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[7], line 6 4 class CustomLinearRegression(ClassifierMixin, SVR): 5 pass ----> 6 CustomLinearRegression().fit(X, y) File ~/repos/scikit-learn-intelex/daal4py/sklearn/_n_jobs_support.py:111, in _run_with_n_jobs.<locals>.n_jobs_wrapper(self, *args, **kwargs) 108 if n_jobs is None or n_jobs == 0: 109 if n_threads is None: 110 # default branch with no setting for n_jobs --> 111 return method(self, *args, **kwargs) 112 else: 113 n_jobs = n_threads File ~/repos/scikit-learn-intelex/sklearnex/svm/svr.py:81, in SVR.fit(self, X, y, sample_weight) 70 elif self.C <= 0: 71 # else if added to correct issues with 72 # sklearn tests: (...) 78 # Windows fatal exception: access violation 79 # occurs 80 raise ValueError("C <= 0") ---> 81 dispatch( 82 self, 83 "fit", 84 { 85 "onedal": self.__class__._onedal_fit, 86 "sklearn": _sklearn_SVR.fit, 87 }, 88 X, 89 y, 90 sample_weight=sample_weight, 91 ) 93 return self File ~/repos/scikit-learn-intelex/sklearnex/_device_offload.py:164, in dispatch(obj, method_name, branches, *args, **kwargs) 162 queue = QM.get_global_queue() 163 patching_status.write_log(queue=queue, transferred_to_host=False) --> 164 return branches["onedal"](obj, *hostargs, **hostkwargs, queue=queue) 165 else: 166 if sklearn_array_api and not has_usm_data: 167 # dpnp fallback is not handled properly yet. File ~/repos/scikit-learn-intelex/sklearnex/svm/svr.py:124, in SVR._onedal_fit(self, X, y, sample_weight, queue) 123 def _onedal_fit(self, X, y, sample_weight=None, queue=None): --> 124 X, _, sample_weight = self._onedal_fit_checks(X, y, sample_weight) 125 onedal_params = { 126 "C": self.C, 127 "epsilon": self.epsilon, (...) 135 "max_iter": self.max_iter, 136 } 138 self._onedal_estimator = onedal_SVR(**onedal_params) File ~/repos/scikit-learn-intelex/sklearnex/svm/_common.py:172, in BaseSVM._onedal_fit_checks(self, X, y, sample_weight) 163 X, y = validate_data( 164 self, 165 X, (...) 169 accept_sparse="csr", 170 ) 171 y = self._validate_targets(y) --> 172 sample_weight = self._get_sample_weight(X, y, sample_weight) 173 return X, y, sample_weight File ~/repos/scikit-learn-intelex/sklearnex/svm/_common.py:196, in BaseSVM._get_sample_weight(self, X, y, sample_weight) 187 raise ValueError( 188 "sample_weight and X have incompatible shapes: " 189 "%r vs %r\n" (...) 192 % (len(sample_weight), X.shape) 193 ) 195 if sample_weight_count == 0: --> 196 if not isinstance(self, ClassifierMixin) or self.class_weight_ is None: 197 return None 198 sample_weight = np.ones(n_samples, dtype=dtype) AttributeError: 'CustomLinearRegression' object has no attribute 'class_weight_'Not sure if it broke here or if was broken before though.
This is a SVM-specific check that is breaking and is unrelated to the changes in the PR. Not that the common.py design is good, but the check for ClassifierMixin is to allow for SVC and NuSVC-specific code in BaseSVM to be run while still in BaseSVM.
The CI error doesn't appear to be related to the changes here:
ImportError: libsvml.so: cannot open shared object file: No such file or directory
/intelci: run
/intelci: run
/intelci: run
http://intel-ci.intel.com/f01b91e9-a110-f123-b04f-a4bf010d0e2d
/intelci: run