scikit-learn-intelex
scikit-learn-intelex copied to clipboard
[testing] Expand sklearnex testing to all methods taking "X" or "y" as input
Description
All testing under the sklearnex/tests
folder use centralized estimator and method lists which were previously limited to methods based off of the various inherited sklearn Mixins. This tested all 'core' functionality like 'fit', 'predict', 'transform', etc. which are directly patched out by oneDAL functions. Some next level methods (those which use these core functionality with a small number of additional calculations) like score
were included. However, this list was incomplete, as some estimators (like PCA and KNeighbors) implement special additional methods like inverse_transform
and kneighbors_graph
, which require special inputs and/or outputs. Up until now, these methods have not been tested using the centralized testing suite (and may have not been tested at all). This PR dynamically checks and collects all methods to all patched estimators which contain "X" or "y" in the input into testing, increasing flexibility and reducing maintenance.
The additional code comes from having to fix issues related to taking in dpnp and dptcl inputs since they were previously untested.
Rather than using _device_offload.dispatch
which is code heavy, cumbersome and with significant overhead, a limited use of get_namespace
was added to these failing methods which provide the necessary pre- or post-processing of 'core' oneDAL methods.
With these additions, dpnp and dpctl inputs are almost fully covered in the codebase. One deselection was made in radius_neighbors
and radius_neighbors_graph
in the NearestNeighbors algorithm, as these would require significant work (i.e. the implementation of a new estimator) in order to properly support dpnp/dpctl inputs (unless we knowingly only allow CPU computation).
The listed changes are as follows:
- rename ensemble algorithms
check_sample_weight
method to_check_sample_weight
to make it a private method - Change IncrementalEmpiricalCovariance
mahalanobis
andscore
methods to accept dpnp/dpctl inputs (score
was previously missed as this estimator has no Mixins) - Provide error to NearestNeighbors algorithm for use of
radius_neighbors
andradius_neighbors_graph
for non-numpy/pandas inputs, which are unsupported. - Remove radius_neighbors from KNeighborsRegressor and KNeighorsClassifier, as they do not support it from sklearn, and it was purely using sklearn functionality
- Fix
kneighbors_graph
method ofNearestNeighbors
,KNeighborsRegressor
, andKNeighborsClassifier
to take in dpnp/dpctl inputs, and return a scipy.csr_matrix (removed wrap output data, etc.) - Disable
score_samples
andscore
methods in stability testing for PCA (will add tickets to address these issues). - Introduce PCA.inverse_transform using get_namespace to insure components_ and mean_ are on the same device as X
NOTE: current private CI fails for PCA in the inverse_transform
method in dpnp testing, this is due to an out-of-date version of dpnp, and is currently getting addressed. Private CI will be re-run as soon as it is updated.