AttributeError: 'PCA' object has no attribute 'n_oversamples'

Open Gabriel-p opened this issue 1 year ago • 4 comments

Describe the bug

Running the code below raises an error when sklearnex is combined with PCA. This command produces the error shown below:

python -m sklearnex PCA_test.py

This command does not:

python PCA_test.py

To Reproduce

Save this code as PCA_test.py and run it with the commands above.

import numpy as np
from sklearn.decomposition import PCA

data = np.random.uniform(-10, 10, (1000, 3))
pca = PCA(n_components=3)
data_pca = pca.fit(data).transform(data)

Expected behavior

No error

Output/Screenshots


Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Traceback (most recent call last):
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/site-packages/sklearnex/__main__.py", line 55, in <module>
    sys.exit(_main())
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/site-packages/sklearnex/__main__.py", line 52, in _main
    runf(args.name, run_name='__main__')
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "PCA_test.py", line 9, in <module>
    data_pca = pca.fit(data).transform(data)
  File "/home/gperren/miniconda3/envs/pyupmask/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 402, in fit
    self.n_oversamples,
AttributeError: 'PCA' object has no attribute 'n_oversamples'

Environment:

System:
    python: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0]
executable: /home/gperren/miniconda3/envs/pyupmask/bin/python
   machine: Linux-5.0.16-100.fc28.x86_64-x86_64-with-glibc2.17

Python dependencies:
      sklearn: 1.1.1
          pip: 21.2.4
   setuptools: 61.2.0
        numpy: 1.22.3
        scipy: 1.7.3
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: /home/gperren/miniconda3/envs/pyupmask/lib/python3.8/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
         prefix: libgomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 48

       filepath: /home/gperren/miniconda3/envs/pyupmask/lib/libmkl_rt.so.1
         prefix: libmkl_rt
       user_api: blas
   internal_api: mkl
        version: 2021.4-Product
    num_threads: 24
threading_layer: intel

       filepath: /home/gperren/miniconda3/envs/pyupmask/lib/libiomp5.so
         prefix: libiomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 48

Gabriel-p avatar Jul 08 '22 12:07 Gabriel-p

This seems to reproduce only on macOS.

CI - Test

------------------------------- Captured stdout --------------------------------
Command '['/usr/local/miniconda/envs/CB/bin/python', '/Users/runner/work/1/s/daal4py/sklearn/monkeypatch/tests/utils/_launch_algorithms.py']' returned non-zero exit status 1.
------------------------------- Captured stderr --------------------------------
dispatcher.py:151: FutureWarning: 
Scikit-learn patching with daal4py is deprecated and will be removed in the future.
Use Intel(R) Extension for Scikit-learn* module instead (pip install scikit-learn-intelex).
To enable patching, please use one of the following options:
1) From the command line:
    python -m sklearnex <your_script>
2) From your script:
    from sklearnex import patch_sklearn
    patch_sklearn()
Intel(R) oneAPI Data Analytics Library solvers for sklearn enabled: https://intelpython.github.io/daal4py/sklearn.html
Traceback (most recent call last):
  File "/Users/runner/work/1/s/daal4py/sklearn/monkeypatch/tests/utils/_launch_algorithms.py", line 117, in <module>
    run_algotithms()
  File "/Users/runner/work/1/s/daal4py/sklearn/monkeypatch/tests/utils/_launch_algorithms.py", line 93, in run_algotithms
    run_patch(info, t)
  File "/Users/runner/work/1/s/daal4py/sklearn/monkeypatch/tests/utils/_launch_algorithms.py", line 61, in run_patch
    model.fit(X, y)
  File "/usr/local/miniconda/envs/CB/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 402, in fit
    self.n_oversamples,
AttributeError: 'PCA' object has no attribute 'n_oversamples'
=========================== short test summary info ============================
ERROR s/daal4py/sklearn/monkeypatch/tests/test_patching.py - SystemExit: 1
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!

FavorMylikes avatar Jul 23 '22 06:07 FavorMylikes

Hi @Gabriel-p @FavorMylikes, thank you for your detailed reports. This seems to be caused by the new sklearn version 1.1.1, which adds new PCA parameters such as n_oversamples. So it is a sklearnex bug.
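
The failure mode described above can be illustrated with a minimal sketch. UpstreamPCA and PatchedPCA are hypothetical stand-in classes, not the actual sklearn or sklearnex sources; the assumption is that a patched class keeps an older `__init__` signature while `fit` comes from the newer sklearn.

```python
class UpstreamPCA:
    """Hypothetical stand-in for the newer upstream estimator."""

    def __init__(self, n_components=None, n_oversamples=10):
        self.n_components = n_components
        self.n_oversamples = n_oversamples

    def fit(self, data):
        # The newer fit() reads a parameter that only the newer __init__ stores.
        _ = self.n_oversamples
        return self


class PatchedPCA(UpstreamPCA):
    """Hypothetical stand-in for a patched subclass written against the older signature."""

    def __init__(self, n_components=None):
        # n_oversamples is never stored, because the older signature did not have it.
        self.n_components = n_components


def fit_error(estimator_cls):
    """Return the AttributeError message raised by fit(), or None if fit() succeeds."""
    try:
        estimator_cls(n_components=3).fit([[0.0, 1.0, 2.0]])
    except AttributeError as exc:
        return str(exc)
    return None


print(fit_error(UpstreamPCA))
print(fit_error(PatchedPCA))
```

In this sketch the upstream class fits without error, while the subclass with the older constructor fails on the missing n_oversamples attribute, which matches the shape of the traceback in the reports.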

samir-nasibli avatar Jul 23 '22 11:07 samir-nasibli

Running into the same issue, how do we solve it?

mjoy296 avatar Jul 29 '22 14:07 mjoy296

Running into the same issue, how do we solve it?

@mjoy296 Downgrade sklearn to 1.0.2

FavorMylikes avatar Jul 30 '22 09:07 FavorMylikes

@mjoy296, I'm running into the same issue as well.

@FavorMylikes, I appreciate your solution but unfortunately I can't downgrade sklearn to 1.0.2 as I have dependencies on 1.1.

As the latest sklearn (1.1.2) docs show, PCA() has new parameters since 1.1:

n_oversamples : int, default=10
power_iteration_normalizer : {'auto', 'QR', 'LU', 'none'}, default='auto'

Here is the evidence:

>>> from sklearnex import patch_sklearn
>>> patch_sklearn()
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
>>> from sklearn.decomposition import PCA
>>> p = PCA()
>>> p.get_params()
{'copy': True, 'iterated_power': 'auto', 'n_components': None, 'random_state': None, 'svd_solver': 'auto', 'tol': 0.0, 'whiten': False}
>>> from sklearnex import unpatch_sklearn
>>> unpatch_sklearn()
>>> from sklearn.decomposition import PCA
>>> p = PCA()
>>> p.get_params()
{'copy': True, 'iterated_power': 'auto', 'n_components': None, 'n_oversamples': 10, 'power_iteration_normalizer': 'auto', 'random_state': None, 'svd_solver': 'auto', 'tol': 0.0, 'whiten': False}
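
The same comparison can be made programmatically. The sketch below uses only the standard library and hypothetical stand-in classes (NewSignature, OldSignature); the helper missing_init_params is an illustrative name, not part of sklearn or sklearnex.

```python
import inspect


def missing_init_params(reference_cls, candidate_cls):
    """Return, sorted, the __init__ parameter names reference_cls accepts but candidate_cls lacks."""

    def init_params(cls):
        params = inspect.signature(cls.__init__).parameters
        return {name for name in params if name != "self"}

    return sorted(init_params(reference_cls) - init_params(candidate_cls))


class NewSignature:
    """Hypothetical stand-in for a constructor with the parameters added in sklearn 1.1."""

    def __init__(self, n_components=None, n_oversamples=10,
                 power_iteration_normalizer="auto"):
        pass


class OldSignature:
    """Hypothetical stand-in for a constructor written against the older signature."""

    def __init__(self, n_components=None):
        pass


print(missing_init_params(NewSignature, OldSignature))
```

For the stand-in classes this prints ['n_oversamples', 'power_iteration_normalizer']. To apply it to the real classes, one would keep a reference to the unpatched PCA class and pass it together with the patched one.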

I wonder if sklearnex could support the new default PCA() parameters "from the future", or at least ignore their existence. Otherwise I'd prefer to call sklearnex.unpatch_sklearn() just for PCA(), as its performance seems acceptable without sklearnex for now.
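
A possible way to unpatch only PCA is sketched below. It assumes that unpatch_sklearn() accepts a name argument and that "pca" is the name sklearnex uses for this estimator (the available names should be checked with sklearnex.get_patch_names() on the installed version). The helper patch_all_but_pca is a hypothetical name, and none of this has been verified against the versions in this thread.

```python
import importlib.util


def patch_all_but_pca():
    """Patch sklearn with sklearnex, then restore the stock PCA.

    Returns True if sklearnex was found and patching was applied, False otherwise.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return False  # sklearnex is not installed, so there is nothing to patch

    from sklearnex import patch_sklearn, unpatch_sklearn

    patch_sklearn()
    # Assumption: "pca" is the patch name for PCA in the installed sklearnex.
    unpatch_sklearn(name="pca")
    return True


patched = patch_all_but_pca()
print("sklearnex patching applied:", patched)
```

Estimators should be imported from sklearn only after this call, so that the stock PCA and the patched versions of the other estimators are the ones picked up.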

Eventually I would look for more PCA performance by other means, such as using GPUs with the PCA from RAPIDS/cuml.

mauriciocramos avatar Aug 23 '22 08:08 mauriciocramos