OPTIMIZATION PROCESS STOPS: stopit.utils.TimeoutException
Hi! We are trying to run TPOT with 100 generations, but the optimization process stops and we get the following error message:
File "/home/anaconda3/lib/python3.7/_weakrefset.py", line 38, in _remove
    def _remove(item, selfref=ref(self)):
stopit.utils.TimeoutException
Does anyone know how we can solve this problem? Thanks in advance.
Hmm, I have not seen this error message before. Could you please provide more details (like the versions of all TPOT dependencies and a demo) so we can reproduce this error?
I think it is related to the max_eval_time_mins parameter, which controls the maximum run time of each pipeline evaluation. Maybe the dataset in your case is very large, and increasing the value assigned to this parameter may be helpful.
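For reference, a minimal sketch of how this parameter is passed (the generation/population values here are placeholders, not taken from your run):

import numpy as np
from tpot import TPOTClassifier

# Allow each candidate pipeline up to 120 minutes before it is skipped;
# pipelines that exceed this limit are discarded, the whole run continues.
clf = TPOTClassifier(generations=100, population_size=100,
                     max_eval_time_mins=120, verbosity=2, n_jobs=1)
# clf.fit(X_train, y_train)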
As far as I understand, max_eval_time_mins just skips a specific pipeline if it takes more than x minutes; it doesn't stop the optimization process as a whole. We already set this parameter to 60 minutes and the error continues to occur. I have read elsewhere (https://github.com/glenfant/stopit/issues/16) that this type of error message is related to the communicate() method of Popen, but I still can't resolve it.
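For context, here is a minimal illustration (not TPOT's internal code, just an assumption about where the exception comes from) of how stopit raises this exception when a timed block overruns:

import stopit

# stopit wraps a code block in a timeout; when the limit is exceeded the
# block is interrupted and, with swallow_exc=False, TimeoutException is re-raised.
try:
    with stopit.ThreadingTimeout(2, swallow_exc=False):
        while True:              # stand-in for a long pipeline evaluation
            sum(range(10000))
except stopit.utils.TimeoutException:
    print("evaluation exceeded the 2 second limit")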
Here is the code we are running:
import os
import json

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from tpot import TPOTClassifier


def run_TPOT_auto_ML(data_path, target, sep='\t', exclude=[], generations=1,
                     population_size=100, cv=5, fold=0, rseed=42, results_path='./'):
    if results_path != './' and not os.path.exists(results_path):
        os.makedirs(results_path)
    # Loading dictionary of pipelines to use
    config_file = np.load('TPOT_config_file.npy', allow_pickle=True).item()
    # Loading the data and separating features from the target column
    data = pd.read_csv(data_path, sep=sep)
    feats = [c for c in data.columns if c != target and c not in exclude]
    X, y = data[feats].values, data[target].values
    # Split: keep the training indices of the requested fold
    # (shuffle=True so that random_state actually takes effect)
    idx = [tr for tr, _ in StratifiedKFold(cv, shuffle=True, random_state=rseed).split(X, y)][fold]
    with open(os.path.join(results_path, 'TPOT_train_idx_{}_{}.json'.format(fold, rseed)), 'w') as outfile:
        json.dump({
            'data_path': data_path,
            'target': target,
            'cv': cv, 'fold': fold,
            'idx': idx.tolist(),
        }, outfile)
    CLF = TPOTClassifier(generations=generations, population_size=population_size,
                         config_dict=config_file, verbosity=2, n_jobs=1,
                         max_eval_time_mins=60)
    CLF.fit(X[idx], y[idx])
    # Export the best pipeline
    CLF.export(output_file_name=os.path.join(results_path, 'TPOT_best_pipeline_{}.py'.format(rseed)),
               data_file_path=data_path)
    # Collect per-pipeline statistics from the evaluated individuals
    RESULTS = pd.DataFrame([{
        'Generation': CLF.evaluated_individuals_[pipe]['generation'],
        'Model': pipe,
        'Internal_cv_score': CLF.evaluated_individuals_[pipe]['internal_cv_score'],
        'Mutation_count': CLF.evaluated_individuals_[pipe]['mutation_count'],
        'Crossover_count': CLF.evaluated_individuals_[pipe]['crossover_count'],
        'Predecessor': CLF.evaluated_individuals_[pipe]['predecessor'],
        'Operator_count': CLF.evaluated_individuals_[pipe]['operator_count']
    } for pipe in CLF.evaluated_individuals_])
    RESULTS.to_csv(os.path.join(results_path, 'auto_ML_results_{}.csv'.format(rseed)), sep=sep, index=False)
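For completeness, a hypothetical invocation of this function (the file name, target column, and output path below are made-up placeholders):

# Hypothetical call; 'data.tsv' and 'label' stand in for the real
# tab-separated dataset and its target column.
run_TPOT_auto_ML('data.tsv', target='label', generations=100,
                 population_size=100, cv=5, fold=0, rseed=42,
                 results_path='./tpot_results/')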
Hmm, that is strange. Is this error reproducible with a small benchmark, like the Iris dataset? If so, please let us know the versions of TPOT and its dependencies, as well as the config_file.
I second that question / issue (tpot version: 0.11.5).
Here is a workaround that may serve as a pointer:
from tpot import decorators
decorators.MAX_EVAL_SECS = 100
tpot_obj.fit(train_X, train_y)
Otherwise one could get stopit.utils.TimeoutException with a message like
... utils.py:82] Code block execution exceeded 2 seconds timeout
(MAX_EVAL_SECS is 2 by default).
Hmm, @vlaskinvlad how about changing decorators.MAX_EVAL_SECS to 5 or 10 instead? I think 100 cannot effectively control the time limit for the pretest pipeline, which runs on a small subset of the data (max sample size = 50). How many features are in the dataset that is causing this issue?
Hi @weixuanfu, I am getting the same message (though it doesn't seem to stop iteration) with a dataset containing ~600 observations and ~1200 features (tpot 0.11.7). I think it might have to do with the execution using up all the available RAM and then running very slowly on swap. Let me know if there's any other information that might help.
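If RAM exhaustion is indeed the cause, one option (my own assumption, not an official fix for this exception) is to let TPOT evaluate each pipeline on a subsample of the training rows via its subsample parameter, which lowers peak memory at the cost of noisier internal CV scores:

from tpot import TPOTClassifier

# Evaluate each pipeline on 50% of the training rows to reduce peak memory;
# generation/population values are placeholders.
clf = TPOTClassifier(generations=100, population_size=100,
                     subsample=0.5, n_jobs=1, verbosity=2)
# clf.fit(X_train, y_train)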