How should we use pyjulia in parallel?

Open 00sapo opened this issue 4 years ago • 5 comments

I want to propose a discussion about how pyjulia should be used in parallel computing.

I recently found a way that seems rather stable:

  1. Make sure the Julia code is loaded only once per process.
  2. Use "w+" mode for memory mapping.

Here is a mini snippet with a decorator that could be added to pyjulia to ensure point 1. For point 2, joblib already has an option in Parallel (mmap_mode). Should we make this explicit in the documentation?

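from functools import wraps

from joblib import Parallel, delayed
from tqdm import tqdm


def julia_import(module: str, filename: str):
    """Include `filename` in Julia's Main unless `module` is already defined there."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Imported here so that it happens in the process that runs the task
            from julia import Main
            # Include the file only if the module is not loaded yet in this process
            if not hasattr(Main, module):
                Main.include(filename)

            return fn(Main, *args, **kwargs)
        return wrapper
    return decorator


@julia_import("MyModule", "lib.jl")
def python_function_using_julia(Main, *args, **kwargs):
    return Main.MyModule.julia_function(*args, **kwargs)

# `data` is an iterable of arrays defined elsewhere
res = Parallel(n_jobs=-1, mmap_mode="w+")(
    delayed(python_function_using_julia)(arr) for arr in tqdm(data))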

Example from: https://github.com/00sapo/pyjulia-vs-juliacall/blob/master/test.py
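
To make the "load once per process" guard easier to try without Julia installed, here is a minimal sketch of the same idea using only the standard library. The names `load_once`, `fake_include`, and `work` are hypothetical, and `fake_include` merely stands in for `Main.include("lib.jl")`. Keying the guard on the process ID is an assumption of this sketch, not something the snippet above does.

```python
import functools
import os

# Remembers which (pid, name) pairs have already run their loader.
_loaded = set()


def load_once(name, loader):
    """Decorator: call `loader()` at most once per process before `fn` runs."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = (os.getpid(), name)
            if key not in _loaded:
                loader()
                _loaded.add(key)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


calls = []


def fake_include():
    # Stand-in for the expensive Julia include; records which process ran it.
    calls.append(os.getpid())


@load_once("MyModule", fake_include)
def work(x):
    return x * 2


results = [work(i) for i in range(3)]
print(results, len(calls))
```

Within a single process the loader runs once no matter how many times `work` is called; each worker process would keep its own `_loaded` set.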

00sapo avatar Oct 28 '21 12:10 00sapo

@00sapo Would this code work within a Python sub-process?

sibyjackgrove avatar Dec 10 '21 19:12 sibyjackgrove

Are you talking about the subprocess module, or do you want to run parallel code inside code that is already parallel? In the first case, I don't see why you would need to. In the second case, avoid it as much as possible!

00sapo avatar Dec 10 '21 19:12 00sapo

@00sapo I have a program that is spawned from the main process using the subprocess module. The program (i.e. each subprocess) calls PyJulia and diffeqpy. However, I run into ReadOnlyMemoryError() errors when the number of subprocesses is greater than 50 (with fewer than that it works fine). My question is whether your script could help with the stable creation of a large number of Julia instances, one per subprocess.

sibyjackgrove avatar Dec 10 '21 20:12 sibyjackgrove

Have you tried setting mmap_mode="w+" (the joblib Parallel option)?
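
For context on why the mapping mode can matter: a read-only memory map rejects in-place writes, and Julia's ReadOnlyMemoryError is raised when an operation tries to write to read-only memory. The sketch below shows the read-only versus writable difference with Python's standard `mmap` module only. It is a stand-in, not joblib or pyjulia code, and `try_inplace_write` is a hypothetical helper. Whether this is the cause of the errors reported here is not established in this thread.

```python
import mmap
import tempfile


def try_inplace_write(access):
    """Map a small temp file with `access` and try to modify it in place.

    Returns True if the write succeeded, False if the mapping is read-only.
    """
    with tempfile.TemporaryFile() as f:
        f.write(b"\x00" * 16)
        f.flush()
        with mmap.mmap(f.fileno(), 16, access=access) as m:
            try:
                m[0] = 1  # in-place modification of the mapped buffer
            except TypeError:
                # CPython raises TypeError on writes to a read-only map
                return False
            return m[0] == 1


read_only_ok = try_inplace_write(mmap.ACCESS_READ)
writable_ok = try_inplace_write(mmap.ACCESS_WRITE)
print(read_only_ok, writable_ok)
```

The read-only mapping refuses the write while the writable one accepts it, which is the distinction the "w+" mode is meant to address for arrays handed to workers.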

00sapo avatar Dec 10 '21 20:12 00sapo

@00sapo I am not using joblib's Parallel to spawn the Julia processes, so I am not sure how I can set mmap_mode="w+".

sibyjackgrove avatar Dec 14 '21 18:12 sibyjackgrove