
How to reuse a cache

Open basnijholt opened this issue 5 years ago • 3 comments

When using memoization (not functools.lru_cache, because of https://github.com/joblib/loky/issues/268), I am unable to get loky to use the cache.

I guess this is because ex.submit(f, ...) repickles f each time. Is it possible to tell loky to not do that?

The example below shows that a concurrent.futures.ProcessPoolExecutor reuses the cache, while loky does not.

from concurrent.futures import ProcessPoolExecutor
import time
import loky


def memoize(f):
    memo = {}

    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]

    return helper


@memoize
def g(x):
    time.sleep(5)


def f(x):
    g(1)
    return x


with loky.reusable_executor.get_reusable_executor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)

# prints
# 5.490137338638306
# 5.018247604370117 <---- cache isn't reused



with ProcessPoolExecutor(max_workers=1) as ex:
    t = time.time()
    (ex.submit(f, 10).result())
    print(time.time() - t)
    t = time.time()
    (ex.submit(f, 10).result())
    print(time.time() - t)

# prints
# 5.012995958328247
# 0.002056598663330078 <---- used the cache (because it forked the process and doesn't need to repickle)

basnijholt avatar Sep 03 '20 13:09 basnijholt

Instead of using a local dict to store the cache entries, you should use a module attribute. Module attributes (apart from those defined in the __main__ module) are pickled by reference instead of by value, so that should work. Each worker process would have its own cache.

ogrisel avatar Sep 24 '20 18:09 ogrisel

This issue made me think about improving the cloudpickle pull request: https://github.com/cloudpipe/cloudpickle/pull/309#issuecomment-698562884 . It might be possible to implement re-usable lru_cache for interactively defined functions but this is not trivial work.

ogrisel avatar Sep 24 '20 21:09 ogrisel

It would be great to make lru_cache work.

For now, I have fixed it by making a cache that is shared in memory: docs, source.

basnijholt avatar Sep 25 '20 08:09 basnijholt