n_jobs Not Stopping CPU from Running at 100% on All Threads?
Just performed a fresh install of TPOT (Windows 10, Python 3.7.9) using the following commands in succession (note the 'torch' now, not 'py-torch'):
pip install numpy scipy scikit-learn pandas joblib torch
pip install deap update_checker tqdm stopit xgboost
pip install "dask[delayed]" "dask[dataframe]" dask-ml "fsspec>=0.3.3" "distributed>=2.10.0"
pip install tpot
...and I'm having an issue with Dask specifically: even with n_jobs=2 set, 100% of all 12 of my CPU's threads are engaged when I start the classifier. That doesn't seem right. My dataset is quite large at 27,270 rows and 599 columns, so I shrank it down to 5,000 rows (keeping all columns), and it's still pinning my CPU pretty hard. Why would this be?
Thanks!
I'm thinking it could be because my dataset is too big, even at 5,000 rows. I shrank it down to just a couple hundred rows and it's much quieter. Maybe once it loads all copies of the dataset into memory it settles down? Population size seems to play a factor as well...
When use_dask is set to True, the n_jobs argument is actually not used; this should probably be fixed. When using dask, TPOT just calls dask.compute without setting the number of threads (here). My understanding is that this defaults to using all available threads.
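As a minimal illustration of that default (a sketch assuming plain dask is installed; `square` is a hypothetical stand-in for a pipeline evaluation), the threaded scheduler sizes its pool to the machine's core count unless num_workers is passed, and that cap is exactly what TPOT's dask path currently omits:

```python
import dask
from dask import delayed

@delayed
def square(x):
    # stand-in for one pipeline evaluation
    return x * x

tasks = [square(i) for i in range(8)]

# With no scheduler arguments, dask.compute falls back to the default
# threaded scheduler with one thread per core -- hence 100% CPU.
# Passing num_workers caps the pool size for this call.
results = dask.compute(*tasks, num_workers=2)
print(sum(results))  # 140
```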
To work around this, you can wrap your .fit() call in a Dask cluster/client of your own. For example:
import dask
from dask.distributed import Client, LocalCluster, performance_report

# threads_per_worker, n_workers, processes, memory_limit: set these for your machine
with LocalCluster(threads_per_worker=threads_per_worker, n_workers=n_workers,
                  processes=processes, memory_limit=memory_limit) as cluster:
    with Client(cluster) as client:
        with performance_report(filename=report_file):  # optional HTML report
            est = tpot.TPOTClassifier(use_dask=True)
            est.fit(X_train, y_train)  # TPOT's fit takes features and target
On a single machine, I think it is best to use processes=False, n_workers=1, and threads_per_worker set to the number of cores/threads you want to use. (It should probably be benchmarked whether one worker with many threads or many workers with one thread each performs better; I'm not 100% sure.)
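If wrapping every .fit() call is awkward, a rougher alternative is to cap dask's default thread pool globally via dask.config. This is a sketch under an assumption on my part: it only helps when TPOT falls back to dask's default local scheduler (no distributed Client running), and `evaluate` is a hypothetical stand-in for a pipeline evaluation:

```python
import dask
from dask import delayed

@delayed
def evaluate(i):
    # stand-in for one TPOT pipeline evaluation
    return i * i

# Cap the default threaded scheduler at 2 workers for everything
# computed inside this context, without touching the call sites.
with dask.config.set(scheduler="threads", num_workers=2):
    scores = dask.compute(*[evaluate(i) for i in range(32)])

print(sum(scores))  # 10416
```

The context-manager form restores the previous configuration on exit, so the cap doesn't leak into unrelated dask work.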