loky
How to reuse a cache
When using memoization (a hand-written decorator rather than functools.lru_cache, because of https://github.com/joblib/loky/issues/268), I am unable to get loky to use the cache.
I guess this is because ex.submit(f, ...) repickles f each time. Is it possible to tell loky to not do that?
In the example below, I show that concurrent.futures.ProcessPoolExecutor uses the cache, while loky does not.
from concurrent.futures import ProcessPoolExecutor
import time

import loky


def memoize(f):
    memo = {}

    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]

    return helper


@memoize
def g(x):
    time.sleep(5)


def f(x):
    g(1)
    return x


with loky.reusable_executor.get_reusable_executor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
# prints
# 5.490137338638306
# 5.018247604370117 <---- cache isn't reused

with ProcessPoolExecutor(max_workers=1) as ex:
    t = time.time()
    (ex.submit(f, 10).result())
    print(time.time() - t)
    t = time.time()
    (ex.submit(f, 10).result())
    print(time.time() - t)
# prints
# 5.012995958328247
# 0.002056598663330078 <---- used the cache (because it forked the process and doesn't need to repickle)
Instead of using a local dict to store the cache entries, you should use a module attribute. Module attributes (apart from those defined in the __main__ module) are pickled by reference instead of by value, so that should work. Each worker process would have its own cache.
This issue made me think about improving the cloudpickle pull request: https://github.com/cloudpipe/cloudpickle/pull/309#issuecomment-698562884 . It might be possible to implement re-usable lru_cache for interactively defined functions but this is not trivial work.