Add option to specify an initialization function for `'loky'` and `'multiprocessing'` backends

Open shwina opened this issue 2 years ago • 3 comments

Closes https://github.com/joblib/joblib/issues/381

This PR adds the ability to specify an initialization function that is run once per worker process when using the 'loky' and 'multiprocessing' backends.

Usage:

Parallel(initializer=my_func, initargs=(x, y, z))
# or
with parallel_config(initializer=my_func, initargs=(x, y, z)):
    ...

shwina avatar Nov 20 '23 14:11 shwina

Hi @ogrisel - wondering if you have any thoughts on how to test the implementation here correctly?

shwina avatar Feb 06 '24 18:02 shwina

Hi @ogrisel, this feature request has come up a couple of times recently, including in a StackOverflow question: https://stackoverflow.com/questions/78642680/using-load-ext-cudf-pandas-throws-attributeerror

Would it be possible to work with a joblib developer to identify how to move forward with tests?

bdice avatar Jun 24 '24 18:06 bdice

Hi, just wondering if there's been any behind-the-scenes motion on this, as this would be a REALLY handy piece of functionality for pre-seeding workers with large, constant, but procedurally generated/loaded datasets at the start of a large number of jobs.

SJ-Innovation avatar Jul 15 '24 09:07 SJ-Innovation