
mlc.SuperPool().map() cannot recognize global objects used in the function

Open zhiruiwang opened this issue 7 years ago • 2 comments

When I pass a lambda function to fill in additional parameters, map() cannot find the function f that is called inside the lambda:

import mlcrate as mlc
pool = mlc.SuperPool() 

def f(x,y):
    return x ** (2/y)

res = pool.map(lambda x: f(x, 2), range(1000))
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\multiprocess\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "C:\Anaconda3\lib\site-packages\mlcrate\__init__.py", line 125, in func_tracked
    return func(x), i
  File "<ipython-input-3-c53f7e0849d6>", line 8, in <lambda>
NameError: name 'f' is not defined

Also, when the function references a global variable, map() cannot find that variable either:

import mlcrate as mlc
pool = mlc.SuperPool() 

y = 2

def f(x):
    return x ** (2/y)

res = pool.map(f, range(1000))
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\multiprocess\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "C:\Anaconda3\lib\site-packages\mlcrate\__init__.py", line 125, in func_tracked
    return func(x), i
  File "<ipython-input-4-474bcea6b16a>", line 7, in f
NameError: name 'y' is not defined

Is there a way to pass these objects to the process pool, so that the pool knows which global functions and variables we want to use?

zhiruiwang avatar Feb 17 '18 11:02 zhiruiwang

Good catch, I didn't encounter these in my testing. SuperPool is a wrapper around pathos.ProcessPool, and it's not clear to me how pathos lets you update the state of an existing pool (that is, send it variables that were created after the pool was), so it wouldn't be easy for me to implement that here.

In both of your cases, however, a workaround would be to start the pool after defining the function and any globals it uses.
It seems that pathos pickles and sends "one layer" of variables when map() is called, so it'll send the function but not any variables referenced in that function. Likewise, if you call it on a lambda function, it'll pickle that lambda but not the objects it refers to, unless they were created before the pool was.
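To see what has to exist before the pool is started, it can help to list the global names a function refers to. This is a minimal sketch using only the standard library; it is not part of mlcrate or pathos, and global_names is a hypothetical helper name:

```python
y = 2

def f(x):
    return x ** (2 / y)

def global_names(func):
    """Hypothetical helper: names that func's bytecode looks up by name.

    co_names also contains attribute names and builtins, so treat the
    result as an upper bound on the globals the function needs.
    """
    return func.__code__.co_names

# f refers to y by name; the lambda refers to f by name.
print(global_names(f))
print(global_names(lambda x: f(x, 2)))
```

Anything listed there is resolved when the function runs in the worker, not when it is defined, which would be consistent with the NameError in both tracebacks.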

In general, I'm not happy with the performance & stability of SuperPool, so a rewrite which solves these issues is in order. I just need to figure out a better way of implementing it :)

mxbi avatar Mar 18 '18 11:03 mxbi

@zhiruiwang This should work for your use case:

from functools import partial
import mlcrate as mlc
pool = mlc.SuperPool()


def f(x, y):
    return x**(2 / y)

res = pool.map(partial(f, y=2), range(1000))
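This presumably works because a partial object carries the wrapped function and the bound arguments as its own attributes, so nothing has to be looked up by name in the worker. A small sketch of that, with the built-in map as a stand-in for pool.map (an assumption, so that it runs without mlcrate installed):

```python
from functools import partial

def f(x, y):
    return x ** (2 / y)

g = partial(f, y=2)

# The wrapped function and the bound keyword live on the partial itself.
print(g.func is f)
print(g.keywords)

# Built-in map used as a stand-in for pool.map (same call shape assumed).
res = list(map(g, range(5)))
print(res)
```

The same pattern should cover the global-variable case from the original report: turn y into a parameter of f and bind it with partial instead of reading it as a global.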

thomasjpfan avatar Jul 04 '18 23:07 thomasjpfan