tpot
cv in TPOTClassifier
I ran into an issue with TPOT. I am working with a dataset of about 70,000 rows x 36 features and predicting a multi-class target; specifically, I am experimenting with the Tanzania water wells dataset. The target is imbalanced: functional 0.54, non functional 0.38, functional needs repair 0.07.
To process this data in a reasonable time, I am using the TPOT cuML configuration.
I created a TPOTClassifier object with the following parameters:
    tpt = TPOTClassifier(
        generations=100,
        population_size=100,
        offspring_size=None,
        mutation_rate=0.9,
        crossover_rate=0.1,
        scoring='accuracy',
        cv=3,
        subsample=1.0,
        n_jobs=1,
        max_time_mins=None,
        max_eval_time_mins=5,
        random_state=None,
        config_dict='TPOT cuML',
        template=None,
        warm_start=False,
        memory=None,
        use_dask=False,
        periodic_checkpoint_folder=PREFIX + 'tpot_checkpoints',
        early_stop=None,
        verbosity=3,
        disable_update_check=False,
        log_file=PREFIX + 'tpot_checkpoints/log_cuml.txt',
    )
After a few hundred iterations, the run fails. It first emits this warning:
/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=3.
% (min_groups, self.n_splits)), UserWarning)
I believe this warning indicates that the minority class is missing from at least one of the splits.
Can you please confirm that TPOT uses stratified folds when I set cv to 3, 5, or 10?
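For reference, this warning is emitted by scikit-learn's `StratifiedKFold` itself, whenever the target it receives contains a class with fewer members than `n_splits`. The sketch below reproduces it on a hypothetical toy target (the arrays are made up for illustration and are not the water wells data). It suggests that the `y` reaching the splitter was very small, not that stratification was off.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy target: class 2 has only 2 members, fewer than n_splits=3.
y = np.array([0] * 10 + [1] * 8 + [2] * 2)
X = np.zeros((len(y), 1))  # placeholder features; only y matters for the split

skf = StratifiedKFold(n_splits=3)

# Capture warnings raised while the folds are generated.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(skf.split(X, y))

messages = [str(w.message) for w in caught]

# Count how many samples of each class land in every test fold.
test_counts = [np.bincount(y[test_idx], minlength=3) for _, test_idx in folds]

for msg in messages:
    print(msg)
for counts in test_counts:
    print(counts)
```

With only two samples of class 2 spread over three test folds, at least one fold necessarily contains none of them, which matches the interpretation given above.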
TPOTClassifier uses stratified k-fold when cv is 3, 5, or 10.
I think this issue may be related to #1148. If so, you need to update TPOT or downgrade sklearn to 0.23.2.
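A quick way to check how an integer `cv` is resolved for a classification target is scikit-learn's `check_cv` helper. This is a minimal sketch and it assumes that TPOT hands an integer `cv` to `check_cv` with `classifier=True`; that is an assumption about TPOT's internals, not something shown in this thread. The toy `y` is hypothetical and only mimics the class proportions from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, check_cv

# Hypothetical target with roughly the proportions reported above.
y = np.array([0] * 54 + [1] * 38 + [2] * 8)

# For an integer cv with a binary or multiclass y and classifier=True,
# scikit-learn returns a StratifiedKFold splitter.
cv = check_cv(5, y, classifier=True)
is_stratified = isinstance(cv, StratifiedKFold)

print(type(cv).__name__, cv.get_n_splits())
```

To rule out any ambiguity, a `StratifiedKFold` instance can also be passed directly as the `cv` argument instead of an integer.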
@weixuanfu I removed sklearn 0.24 and installed 0.23.2. I am still getting an error at exactly iteration 200.
Version 0.11.6.post3 of tpot is outdated. Version 0.11.7 was released 4 days ago.
Optimization Progress: 2%
200/10100 [27:26<24:05:12, 8.76s/pipeline]
/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=5.
% (min_groups, self.n_splits)), UserWarning)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
741 per_generation_function=self._check_periodic_pipeline,
--> 742 log_file=self.log_file_
743 )
6 frames
/usr/local/lib/python3.6/site-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function, log_file)
280 if per_generation_function is not None:
--> 281 per_generation_function(gen)
282
/usr/local/lib/python3.6/site-packages/tpot/base.py in _check_periodic_pipeline(self, gen)
1051 """
-> 1052 self._update_top_pipeline()
1053 if self.periodic_checkpoint_folder is not None:
/usr/local/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
837 break
--> 838 raise RuntimeError('There was an error in the TPOT optimization '
839 'process. This could be because the data was '
RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly. If you enabled PyTorch estimators, please check the data requirements in the online documentation: https://epistasislab.github.io/tpot/using/
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-26-64b955e8a61a> in <module>()
----> 1 tpt.fit(X,y)
/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
771 # raise the exception if it's our last attempt
772 if attempt == (attempts - 1):
--> 773 raise e
774 return self
775
/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
762 self._pbar.close()
763
--> 764 self._update_top_pipeline()
765 self._summary_of_best_pipeline(features, target)
766 # Delete the temporary cache before exiting
/usr/local/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
836 error_score="raise")
837 break
--> 838 raise RuntimeError('There was an error in the TPOT optimization '
839 'process. This could be because the data was '
840 'not formatted properly, or because data for '
RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly. If you enabled PyTorch estimators, please check the data requirements in the online documentation: https://epistasislab.github.io/tpot/using/
Hmm, I am not sure what is causing this. Please try TPOT 0.11.7. If that doesn't work, please provide a demo that reproduces the issue.
Hey, just wanted to say that I ran into the same problem, and updating to 0.11.7 worked for me. Maybe it was an undetected bug? I can post my specs if needed; I was training on a multi-class classification problem too.