Error using TPOTClassifier in Google Colab: "No module named 'sklearn.metrics.scorer'"
Hi all,
I have read
1. https://colab.research.google.com/gist/weixuanfu/7e58b6120929a10a53f034cfb2608e85/tpot_dask_check_colab.ipynb#scrollTo=Gz0BsqZki2t0
2. https://github.com/EpistasisLab/tpot/issues/1095
to set up TPOT in Google Colab.
However, my code raises this import error: "No module named 'sklearn.metrics.scorer'"
My code
!pip install TPOT
!pip install dask==2.20.0 dask-glm==0.2.0 dask-ml==1.0.0
!pip install tornado==5.0
!pip install distributed==2.2.0
!pip install xgboost==0.90
!pip install fsspec
from dask.distributed import Client
client = Client(processes=False)
import time
from tpot import TPOTClassifier
start = time.time()
# Assign the values outlined to the inputs
number_generations = 4
population_size = 4
offspring_size = 3
scoring_function = 'roc_auc'
# Create the tpot classifier
tpot_clf = TPOTClassifier(generations=number_generations, population_size=population_size,
                          offspring_size=offspring_size, scoring=scoring_function,
                          verbosity=2, random_state=0, config_dict='TPOT light', cv=5,
                          warm_start=True, use_dask=True)
tpot_clf.fit(X, y)
print(tpot_clf.fitted_pipeline_)
tpot_clf.export('tpot_exported_pipeline.ipyb')
from google.colab import files  # Colab helper that provides files.download
files.download('tpot_exported_pipeline.ipyb')
end = time.time()
print(end - start)
My error
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:63: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(*args, **kwargs)
Optimization Progress: 0%
0/16 [00:00<?, ?pipeline/s]
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in _wrapped_cross_val_score(sklearn_pipeline, features, target, cv, scoring_function, sample_weight, groups, use_dask)
424 try:
--> 425 import dask_ml.model_selection # noqa
426 import dask # noqa
12 frames
/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/__init__.py in <module>()
5 """
----> 6 from ._hyperband import HyperbandSearchCV
7 from ._incremental import IncrementalSearchCV
/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/_hyperband.py in <module>()
10
---> 11 from ._incremental import BaseIncrementalSearchCV
12 from ._successive_halving import SuccessiveHalvingSearchCV
/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/_incremental.py in <module>()
15 from sklearn.base import clone
---> 16 from sklearn.metrics.scorer import check_scoring
17 from sklearn.model_selection import ParameterGrid, ParameterSampler
ModuleNotFoundError: No module named 'sklearn.metrics.scorer'
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
827 per_generation_function=self._check_periodic_pipeline,
--> 828 log_file=self.log_file_,
829 )
/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function, log_file)
227
--> 228 population[:] = toolbox.evaluate(population)
229
/usr/local/lib/python3.6/dist-packages/tpot/base.py in _evaluate_individuals(self, population, features, target, sample_weight, groups)
1552 for sklearn_pipeline in sklearn_pipeline_list[
-> 1553 chunk_idx : chunk_idx + chunk_size
1554 ]
/usr/local/lib/python3.6/dist-packages/tpot/base.py in <listcomp>(.0)
1551 )
-> 1552 for sklearn_pipeline in sklearn_pipeline_list[
1553 chunk_idx : chunk_idx + chunk_size
/usr/local/lib/python3.6/dist-packages/stopit/utils.py in wrapper(*args, **kwargs)
144 # ``result`` may not be assigned below in case of timeout
--> 145 result = func(*args, **kwargs)
146 return result
/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in _wrapped_cross_val_score(sklearn_pipeline, features, target, cv, scoring_function, sample_weight, groups, use_dask)
429 msg = "'use_dask' requires the optional dask and dask-ml depedencies.\n{}".format(e)
--> 430 raise ImportError(msg)
431
ImportError: 'use_dask' requires the optional dask and dask-ml depedencies.
No module named 'sklearn.metrics.scorer'
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-201-63a9e7598fe2> in <module>()
14 verbosity=2, random_state=0,config_dict='TPOT light', cv=5, warm_start=True,use_dask=True)
15
---> 16 tpot_clf.fit(X, y)
17
18 print(tpot_clf.fitted_pipeline_)
/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
861 # raise the exception if it's our last attempt
862 if attempt == (attempts - 1):
--> 863 raise e
864 return self
865
/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
852 self._pbar.close()
853
--> 854 self._update_top_pipeline()
855 self._summary_of_best_pipeline(features, target)
856 # Delete the temporary cache before exiting
/usr/local/lib/python3.6/dist-packages/tpot/base.py in _update_top_pipeline(self)
960 # need raise RuntimeError because no pipeline has been optimized
961 raise RuntimeError(
--> 962 "A pipeline has not yet been optimized. Please call fit() first."
963 )
964
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
I have also installed the optional dask-ml dependencies, following https://ml.dask.org/install.html:
pip install dask-ml[xgboost] # also install xgboost and dask-xgboost
pip install dask-ml[complete] # install all optional dependencies
But it still returns the same error. Please advise. Thanks!
This is probably because scikit-learn 0.24 is installed. That release removed the deprecated sklearn.metrics.scorer module, which your installed dask-ml still imports in dask_ml/model_selection/_incremental.py (see the traceback and #1176).
To fix this, you could update dask-ml (later versions appear to fix this issue), or, although it is not recommended, add the line
!pip install 'scikit-learn>=0.22.0,<0.24.0' --force-reinstall
after your other pip installs to force an install of an older version of scikit-learn that doesn't have these changes.
EDIT: A previous version of this comment mistakenly attributed the error to TPOT. On closer reading, this is actually an issue with an import inside dask-ml, so you should check whether an updated dask-ml version is available.
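To check which versions are actually installed in the running environment, something like the following sketch may help. It is only a diagnostic, not a fix. It assumes Python 3.8+ (for importlib.metadata, which the Python 3.6 runtime in the traceback does not have), and the helper names installed_version and module_available are made up for this example.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if the module can be located.

    Parent packages are imported during the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or is not a package.
        return False


if __name__ == "__main__":
    for dist in ("scikit-learn", "dask-ml", "TPOT"):
        print(dist, installed_version(dist))
    # The legacy path that dask-ml tries to import in the traceback above.
    print("sklearn.metrics.scorer available:",
          module_available("sklearn.metrics.scorer"))
```

If the last line prints False while dask-ml is still an old release, that matches the error in this thread. Restart the runtime or kernel after any pip install so the new versions are actually loaded.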
Having the same issue. I tried --force-reinstall and installed dask per the website. Downgrading the scikit-learn package just causes errors about a deprecated module. A bit stuck on this. Using Anaconda on a local PC with Jupyter Notebook.