
TPOTClassifier error for large data

Open kiranellur opened this issue 1 year ago • 1 comments

I am getting the following error:

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly.

My current best internal CV score is -inf, even though the optimization progress bar shows 75%.

It works for smaller datasets, but I get the error for datasets with 200000 rows and 20 columns. I am currently using TPOT version 12.0. Is there any specific reason I am getting this?

Can you please help me resolve this error? Thank you.

kiranellur, Nov 30 '23 19:11

I would recommend trying out TPOT2, the next version of TPOT. You can find it here: https://github.com/EpistasisLab/tpot2. This version is more stable with larger datasets than TPOT1. It also has a memory_limit parameter that you can use to set the maximum amount of RAM a single pipeline can take up.

For TPOT1: Perhaps it is simply running out of RAM and crashing?

Some suggestions: You could try to reduce RAM usage by lowering n_jobs. You could also try editing the configuration dictionary to use smaller/faster models. Another possibility is that fitting the pipeline is taking too long and timing out. You can increase the timeout by setting the max_eval_time_mins parameter.
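
As a rough sketch of the TPOT1 suggestions above (not a guaranteed fix), the settings could be combined like this. The helper names `tpot1_low_memory_kwargs` and `fit_tpot1`, the 20-minute timeout, and the small `generations`/`population_size` values are illustrative assumptions, not values from this thread. "TPOT light" is TPOT1's built-in configuration restricted to simpler, faster operators, used here in place of a hand-edited configuration dictionary.

```python
def tpot1_low_memory_kwargs(eval_minutes=20):
    """Return TPOTClassifier keyword arguments aimed at lower RAM use.

    Hypothetical helper: the specific values are assumptions to tune.
    """
    return {
        "n_jobs": 1,                         # fewer parallel workers, less RAM in use at once
        "config_dict": "TPOT light",         # built-in config with smaller/faster models
        "max_eval_time_mins": eval_minutes,  # allow more time per pipeline before it times out
        "generations": 5,                    # assumed small search budget for a first attempt
        "population_size": 20,               # assumed small population for a first attempt
        "verbosity": 2,
        "random_state": 0,
    }


def fit_tpot1(X_train, y_train, **overrides):
    """Fit a TPOT1 classifier with the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without TPOT installed.
    from tpot import TPOTClassifier

    params = {**tpot1_low_memory_kwargs(), **overrides}
    model = TPOTClassifier(**params)
    model.fit(X_train, y_train)
    return model
```

Calling `fit_tpot1(X_train, y_train, max_eval_time_mins=30)` would override just the timeout. If the run still ends with a -inf score, every evaluated pipeline is probably failing or timing out, so trying the same settings on a smaller sample of the rows first can help narrow down whether memory or time is the limit.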

perib, Nov 30 '23 21:11