
[Question] TPOT can't run for a lot of time

Open neel04 opened this issue 3 years ago • 1 comments

I wanted to run TPOT for a long time on my server (a population size of about 200 and 100 generations). It works with 50 generations, but if I increase the number any further, it stops working.

To describe the problem: tqdm shows the run getting stuck at one point (nearly always around the 4000th pipeline). While stuck, it uses exactly one CPU core, does not store any checkpoint, and does not progress any further. It seems to have stopped doing anything.

Any idea what the issue could be?

neel04, Apr 01 '21 12:04

This is most likely related to TPOT not being able to terminate some pipelines. The current timeout method doesn't always work on specific modules. If such a module (SVC, for example) can't be timed out and runs for a long time, TPOT will get stuck slowly fitting a single pipeline, which may never converge.

This could be resolved by using func_timeout: https://pypi.org/project/func-timeout/

We think this has been resolved in the next version of TPOT, TPOT2, found here: https://github.com/EpistasisLab/tpot2

It may be a good idea to bring the same fix to this version of TPOT as well.
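
To illustrate the general idea of a timeout that can always stop a long-running fit, here is a minimal standard-library sketch. It is not TPOT's timeout code and not how func_timeout works internally; it only shows one way to enforce a hard limit by running the work in a child process and terminating it. The names `run_with_hard_timeout`, `quick_fit`, and `slow_fit` are hypothetical, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp
import time


def _worker(conn, func, args):
    # Runs in the child process: send back the result or the error text.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_hard_timeout(func, args=(), timeout_s=1.0):
    """Run func(*args) in a child process; kill it after timeout_s seconds.

    Hypothetical helper for illustration. Returns (status, value), where
    status is "ok", "error", or "timeout".
    """
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_conn, func, args))
    proc.start()
    send_conn.close()  # the parent only reads from the pipe
    try:
        if recv_conn.poll(timeout_s):
            try:
                status, value = recv_conn.recv()
            except EOFError:
                status, value = "error", "worker exited without a result"
        else:
            status, value = "timeout", None
    finally:
        if proc.is_alive():
            proc.terminate()  # stops the child even if it never yields
        proc.join()
        recv_conn.close()
    return status, value


def quick_fit(x):
    # Stand-in for a pipeline that fits quickly.
    return x * 2


def slow_fit():
    # Stand-in for a pipeline that would run far past its time budget.
    time.sleep(60)
    return "done"
```

Calling `run_with_hard_timeout(slow_fit, (), timeout_s=0.5)` returns a `"timeout"` status after roughly half a second instead of blocking for a minute, because the child process is terminated from outside rather than asked to stop from within. The trade-off is the cost of starting a process and passing the result back for every evaluation.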

perib, May 09 '23 01:05