gplearn
UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
Hello, thank you for your very good library :-) In combination with the Celery task queue and MongoDB it is pure happiness!
Describe the bug
When I use the parameter n_jobs=10, I get a warning message from joblib and the job runs in only one thread. I think it's related to using Celery, but I can't figure out how to fix the problem.
Expected behavior
I would like to be able to parallelize the calculation on my 12-core processor.
Actual behavior
```
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1]     |   Population Average    |             Best Individual              |
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1] ---- ------------------------- ------------------------------------------ ----------
[2020-12-31 13:22:33,736: WARNING/ForkPoolWorker-1]  Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
[2020-12-31 13:22:33,737: WARNING/ForkPoolWorker-1] /home/user/works/project/venv/lib/python3.8/site-packages/joblib/parallel.py:733: UserWarning: Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
```
Steps to reproduce the behavior
```python
from gplearn.genetic import SymbolicRegressor
from celery import Celery
import pickle
import codecs

CELERY_APP = 'process'
CELERY_BACKEND = 'mongodb://localhost:27017/tasks-results'
CELERY_BROKER = 'mongodb://localhost:27017/tasks-broker'

appCelery = Celery(CELERY_APP, backend=CELERY_BACKEND, broker=CELERY_BROKER)


def getCeleryBackend():
    return appCelery.backend


def encodeObjLearn(objLearn):
    return codecs.encode(pickle.dumps(objLearn), "base64").decode()


def decodeObjLearn(sLearn):
    return pickle.loads(codecs.decode(sLearn.encode(), "base64"))


@appCelery.task(name='capture.tasks.TaskSymbolicRegressor')
def TaskSymbolicRegressor(X_train, y_train):
    est_gp = SymbolicRegressor(population_size=10000, n_jobs=10,
                               generations=100, stopping_criteria=0.01,
                               p_crossover=0.7, p_subtree_mutation=0.1,
                               p_hoist_mutation=0.05, p_point_mutation=0.1,
                               max_samples=0.9, verbose=1,
                               parsimony_coefficient=0.01, random_state=0)
    est_gp.fit(X_train, y_train)
    delattr(est_gp, '_programs')
    return encodeObjLearn(est_gp)
```
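As an aside, the `encodeObjLearn`/`decodeObjLearn` helpers above are a plain pickle-to-base64 round trip and can be checked in isolation, independently of gplearn and Celery (a minimal sketch, standard library only):

```python
import pickle
import codecs


def encodeObjLearn(objLearn):
    # Serialise any picklable object to a base64 text string.
    return codecs.encode(pickle.dumps(objLearn), "base64").decode()


def decodeObjLearn(sLearn):
    # Inverse operation: base64 text back to the original object.
    return pickle.loads(codecs.decode(sLearn.encode(), "base64"))


# Round trip on a plain dict standing in for a fitted estimator.
obj = {"coef": [1.0, 2.5], "name": "model"}
assert decodeObjLearn(encodeObjLearn(obj)) == obj
```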
System information
Linux-5.4.0-58-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
NumPy 1.19.2
SciPy 1.5.4
Scikit-Learn 0.24.0
Joblib 1.0.0
gplearn 0.4.1
I'm not familiar with how Celery works, but joblib does all the parallelisation under the hood; you just need to set n_jobs
when initialising the estimator. Is this something you would expect to work with, say, a random forest in scikit-learn?
Hello and happy new year :-)
Thank you for the quick response. I just ran a test with RandomForestRegressor with n_jobs=10, and it seems to work without problems:
```python
from sklearn.ensemble import RandomForestRegressor


@appCelery.task(name='capture.tasks.TaskRandomForestRegressor')
def TaskRandomForestRegressor(X_train, y_train):
    est_rf = RandomForestRegressor(n_jobs=10)
    est_rf.fit(X_train, y_train)
    return encodeObjLearn(est_rf)
```
Output:
```
[2021-01-04 08:47:42,206: INFO/MainProcess] Received task: capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932]
[2021-01-04 08:47:42,386: INFO/ForkPoolWorker-1] Task capture.tasks.TaskRandomForestRegressor[011a5d09-6a51-45b4-9ef0-27f5277fe932] succeeded in 0.17795764410402626s:
```
I found a possible answer to this error, but it requires a change to the library and I'm not sure it works:
https://github.com/joblib/joblib/issues/978
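If the root cause is the one described in that issue (loky cannot start its worker processes from inside a daemonic Celery prefork child), one workaround that does not require patching joblib is to force joblib's thread-based backend around the fit call. This is only a sketch under that assumption; since gplearn's evolution loop is largely GIL-bound Python, threads may not give a full 12-core speed-up, but it avoids the silent fallback to n_jobs=1:

```python
from joblib import parallel_backend


def fit_in_threads(estimator, X, y, n_jobs=10):
    # joblib honours the active backend context, so any Parallel() calls
    # made internally by the estimator use threads here instead of loky.
    with parallel_backend("threading", n_jobs=n_jobs):
        estimator.fit(X, y)
    return estimator
```

Alternatively, Celery's worker can be started with a non-daemonic pool (for example `--pool=threads`, available since Celery 4.4), which is sometimes suggested in the joblib/Celery discussions; whether that applies here is untested.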
How can I apply multiprocessing to gplearn's SymbolicTransformer?
It seems that gplearn supports multithreading by setting n_jobs=10.
Can we run it with multiple processes, which would be even faster? How can I do that?
Thanks!
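For what it's worth, gplearn hands n_jobs to joblib, and joblib's default backend (loky, the one named in the warning above) is already process-based, so outside of a daemonic worker n_jobs=10 should already mean multiple processes rather than threads. A sketch that makes the process backend explicit (an assumption-laden illustration, not gplearn's documented API; the hypothetical `fit_with_processes` wrapper works for any estimator whose fit() uses joblib internally):

```python
from joblib import parallel_backend


def fit_with_processes(estimator, X, y, n_jobs=10):
    # Explicitly select the process-based "loky" backend; Parallel() calls
    # made inside estimator.fit() inherit it from this context.
    with parallel_backend("loky", n_jobs=n_jobs):
        estimator.fit(X, y)
    return estimator
```

Note that this will hit the same "Loky-backed parallel loops cannot be called in a multiprocessing" warning if it runs inside a daemonic Celery prefork worker, as discussed earlier in this thread.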