
cv in TPOTClassifier

Open senovr opened this issue 4 years ago • 4 comments

I ran into an issue with TPOT. I am working with a dataset of ~70000 samples x 36 features and predicting a multi-class target; specifically, I am experimenting with the Tanzania water wells dataset. The target is imbalanced: functional 0.54, non functional 0.38, functional needs repair 0.07.

To process this data in a reasonable time, I am using the 'TPOT cuML' configuration.

I created a TPOTClassifier object with the following parameters:

from tpot import TPOTClassifier

# PREFIX is a path prefix defined elsewhere in the notebook (not shown here).
tpt = TPOTClassifier(
    generations=100, population_size=100,
    offspring_size=None, mutation_rate=0.9,
    crossover_rate=0.1,
    scoring='accuracy', cv=3,
    subsample=1.0, n_jobs=1,
    max_time_mins=None, max_eval_time_mins=5,
    random_state=None, config_dict='TPOT cuML',
    template=None,
    warm_start=False,
    memory=None,
    use_dask=False,
    periodic_checkpoint_folder=PREFIX + 'tpot_checkpoints',
    early_stop=None,
    verbosity=3,
    disable_update_check=False,
    log_file=PREFIX + 'tpot_checkpoints/log_cuml.txt',
)

After a few hundred iterations, it fails with the following output:

/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=3.
  % (min_groups, self.n_splits)), UserWarning)

I believe this kind of message indicates that the minority class is missing from one of the splits.

Can you please confirm that TPOT uses stratified folds when I set cv=3, 5, or 10?
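The warning text says the smallest class in the `y` being split has fewer members than `n_splits`. With roughly 70000 samples and a 0.07 share, the rarest class would be expected to have several thousand members, so it may help to count the labels in whatever `y` is actually passed to `fit`. The following is a minimal sketch of that check using only the standard library; the function name `classes_too_small_for_cv` and the toy label list are hypothetical and not part of TPOT or scikit-learn.

```python
from collections import Counter


def classes_too_small_for_cv(y, n_splits):
    """Return {label: count} for every class with fewer members than n_splits.

    This mirrors the condition named in the warning message; it does not
    call scikit-learn or TPOT.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_splits}


# Hypothetical toy labels, NOT the real dataset: the rare class has 2 members.
y = (["functional"] * 54
     + ["non functional"] * 38
     + ["functional needs repair"] * 2)

print(classes_too_small_for_cv(y, n_splits=3))
# {'functional needs repair': 2}
```

An empty dictionary means every class has at least `n_splits` members, in which case the small class reported in the warning probably comes from a different label vector than the one inspected.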

senovr avatar Jan 11 '21 11:01 senovr

TPOTClassifier uses stratified k-fold when cv=3, 5, or 10.

I think this issue may be related to #1148. If so, you need to update TPOT or downgrade sklearn to 0.23.2.
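Before and after changing versions, it can be useful to confirm what is actually installed in the running environment. This is a minimal sketch using `importlib.metadata`, which assumes Python 3.8 or newer (the tracebacks in this thread show Python 3.6, where this module is not in the standard library). The helper name `installed_version` is hypothetical, and the distribution names `tpot` and `scikit-learn` are assumed to be the names used on PyPI.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your environment differs.
for name in ("tpot", "scikit-learn"):
    print(name, installed_version(name))
```

In a notebook, the kernel usually needs a restart after reinstalling a package, otherwise the previously imported version stays loaded.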

weixuanfu avatar Jan 11 '21 11:01 weixuanfu

@weixuanfu I tried removing sklearn 0.24 and installing 0.23.2. I am still getting an error at exactly iteration 200.

Version 0.11.6.post3 of tpot is outdated. Version 0.11.7 was released 4 days ago.
Optimization Progress: 2%
200/10100 [27:26<24:05:12, 8.76s/pipeline]
/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
[the same UserWarning is repeated 9 more times]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    741                     per_generation_function=self._check_periodic_pipeline,
--> 742                     log_file=self.log_file_
    743                 )

6 frames
/usr/local/lib/python3.6/site-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function, log_file)
    280         if per_generation_function is not None:
--> 281             per_generation_function(gen)
    282 

/usr/local/lib/python3.6/site-packages/tpot/base.py in _check_periodic_pipeline(self, gen)
   1051         """
-> 1052         self._update_top_pipeline()
   1053         if self.periodic_checkpoint_folder is not None:

/usr/local/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
    837                         break
--> 838                 raise RuntimeError('There was an error in the TPOT optimization '
    839                                    'process. This could be because the data was '

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly. If you enabled PyTorch estimators, please check the data requirements in the online documentation: https://epistasislab.github.io/tpot/using/

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-26-64b955e8a61a> in <module>()
----> 1 tpt.fit(X,y)

/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    771                     # raise the exception if it's our last attempt
    772                     if attempt == (attempts - 1):
--> 773                         raise e
    774             return self
    775 

/usr/local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    762                         self._pbar.close()
    763 
--> 764                     self._update_top_pipeline()
    765                     self._summary_of_best_pipeline(features, target)
    766                     # Delete the temporary cache before exiting

/usr/local/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
    836                                                     error_score="raise")
    837                         break
--> 838                 raise RuntimeError('There was an error in the TPOT optimization '
    839                                    'process. This could be because the data was '
    840                                    'not formatted properly, or because data for '

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly. If you enabled PyTorch estimators, please check the data requirements in the online documentation: https://epistasislab.github.io/tpot/using/

senovr avatar Jan 11 '21 12:01 senovr

Hmm, I am not sure what is causing this. Please try TPOT 0.11.7. If that doesn't work, please provide a demo that reproduces the issue.
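When putting together such a demo, it may help to record the warnings a call emits instead of letting them scroll past in the progress output. The following is a minimal sketch using the standard `warnings` module. The helper `run_and_collect_warnings` and the stand-in function `noisy_split` are hypothetical; in a real demo the stand-in would be replaced by the call under investigation.

```python
import warnings


def run_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, [(category_name, message), ...])."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats of the same message.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [(w.category.__name__, str(w.message)) for w in caught]


def noisy_split():
    """Hypothetical stand-in that emits a warning like the one in this thread."""
    warnings.warn("The least populated class in y has only 2 members", UserWarning)
    return "done"


result, messages = run_and_collect_warnings(noisy_split)
print(result, messages)
```

The collected list can then be pasted into the issue alongside the installed versions and a small sample of the data.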

weixuanfu avatar Jan 11 '21 12:01 weixuanfu

Hey, just wanted to say that I ran into the same problem, and updating to 0.11.7 worked for me. Maybe it was an undetected bug? I can post my specs if needed. I was training on a multi-class classification problem too.

v2thegreat avatar Feb 12 '21 02:02 v2thegreat