dataclasses-json

`to_dict()` returns an empty dictionary when called from joblib `Parallel` with more than one job

Open · christian-steinmeyer opened this issue 3 years ago · 1 comment

With dataclasses-json 0.5.3 and joblib 1.0.1, I get empty dictionaries when calling `to_dict()` from inside a `delayed` job run by `Parallel` on a class decorated with `@dataclass_json` (the same happens when using the `DataClassJsonMixin` mixin instead).

Consider this MWE:

```python
from dataclasses import dataclass

from dataclasses_json import dataclass_json
from joblib import Parallel, delayed


@dataclass_json
@dataclass
class MyClass:
    number: int


def print_my_class(number: int):
    my_class = MyClass(number=number)
    print(f'{my_class.to_dict()}')


if __name__ == '__main__':
    Parallel(n_jobs=1)(delayed(print_my_class)(i) for i in range(1000))  # works: prints {'number': i}
    Parallel(n_jobs=-1)(delayed(print_my_class)(i) for i in range(1000))  # does not work: prints {}
```

Any ideas on how to fix this, or any workarounds?

christian-steinmeyer, May 31 '21 11:05

I can confirm the same issue with dataclasses-json 0.5.7 and joblib 1.2.0.

It seems to work when using

```python
Parallel(n_jobs=-1, backend="multiprocessing")(delayed(print_my_class)(i) for i in range(1000))
```

since by default joblib uses the loky backend. Don't quote me on this, but it probably has to do with how loky handles memory when spawning processes.

orphefs, Nov 25 '22 10:11