dataclasses-json
to_dict() returns empty dictionary when called from joblib parallel if using more than one job
With dataclasses-json @ 0.5.3 and joblib @ 1.0.1, I receive empty dictionaries when calling to_dict() from inside a delayed parallel job on a class decorated with @dataclass_json (same behavior when using DataClassJsonMixin).
Consider this MWE:
from dataclasses import dataclass

from dataclasses_json import dataclass_json
from joblib import Parallel, delayed


@dataclass_json
@dataclass
class MyClass:
    number: int


def print_my_class(number: int):
    my_class = MyClass(number=number)
    print(f'{my_class.to_dict()}')


if __name__ == '__main__':
    Parallel(n_jobs=1)(delayed(print_my_class)(i) for i in range(1000))  # works
    Parallel(n_jobs=-1)(delayed(print_my_class)(i) for i in range(1000))  # does not work: prints empty dicts
Any ideas on how to fix this, or any workarounds?
I can confirm the same issue with dataclasses-json @ 0.5.7 and joblib @ 1.2.0.
It seems to work when the backend is set explicitly:
Parallel(n_jobs=-1, backend="multiprocessing")(delayed(print_my_class)(i) for i in range(1000))
By default, joblib uses the loky backend, so the problem appears to be specific to loky. Don't quote me on this, but it probably has to do with how loky handles memory when spawning processes.
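If switching backends is not an option, another possible stopgap is to build the dictionary yourself instead of calling to_dict(). The sketch below is only a guess at a workaround, and I have not tested it under loky. It assumes the empty result comes from field discovery failing in the worker, not from the instance losing its data. shallow_to_dict is a hypothetical helper, not part of dataclasses-json, and it only handles flat dataclasses without ClassVar or InitVar fields.

```python
from dataclasses import dataclass
from typing import Any, Dict


def shallow_to_dict(obj: Any) -> Dict[str, Any]:
    """Hypothetical helper (not part of dataclasses-json).

    Builds a flat dict from the field names recorded on the dataclass.
    """
    # __dataclass_fields__ is the name -> Field mapping that @dataclass
    # stores on the class. Reading it directly sidesteps dataclasses.fields(),
    # on the (unverified) assumption that field discovery is what fails.
    field_names = getattr(type(obj), "__dataclass_fields__", None)
    if field_names is None:
        raise TypeError(f"{type(obj).__name__} is not a dataclass")
    # Shallow only: nested dataclasses, lists, etc. are returned as-is.
    return {name: getattr(obj, name) for name in field_names}


@dataclass
class MyClass:
    number: int


print(shallow_to_dict(MyClass(number=3)))  # {'number': 3}
```

In the MWE you would call shallow_to_dict(my_class) inside print_my_class in place of my_class.to_dict(). Unlike to_dict(), this does not recurse or apply any dataclasses-json encoders, so it is only useful for simple classes like the one here.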