cached-property icon indicating copy to clipboard operation
cached-property copied to clipboard

Support caching generators

Open rinslow opened this issue 7 years ago • 6 comments

Many times I'll have a function such as

def x():
    result = some_long_function_that_eventually_calculates_a_list()
    yield from (result.name for result in result)
    yield from (result.path for result in result)

I can't cache this function using regular cached_property because the generator will be exhausted. I want to calculate it once, then whenever it is being called again return the items I retrieved in the order I retrieved them.

rinslow avatar Jun 02 '18 19:06 rinslow

I think it's clear that mixing cached_property with generator functions is a little confusing at first glance. Rather than keeping the data returned by a generator method, using cached_property in this case returns the data once and then immediately loses it, leading one to ask, "why isn't it caching the results?" Of course, it is caching something, just not what you might naturally expect it to cache. It's caching a generator object which can only be exhausted once, rather than caching the results from the generator.

I think the solution here is to insert an intermediate decorator that dumps the generator content into a list. This list is then the thing that becomes cached by cached_property:

from cached_property import cached_property


def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped


class IceCreamPuns:
    x = 2

    @cached_property
    def i_scream(self):
        # returns a list of results. a silly example, i admit.
        return [n for n in range(self.x)]

    @cached_property
    def eye_scream(self):
        # with the yield syntax, this code just got much more readable.
        # but cached_property caches the generator, not the results!
        yield from range(self.x)

    @cached_property
    @yield_into_list
    def eyes_cream(self):
        # now we can use the yield syntax and re-use the results.
        yield from range(self.x)


puns = IceCreamPuns()
list(puns.i_scream)  # [0, 1]
list(puns.i_scream)  # [0, 1]
list(puns.eye_scream)  # [0, 1]
list(puns.eye_scream)  # []  <- unexpected if you don't know this is a generator!
list(puns.eyes_cream)  # [0, 1]
list(puns.eyes_cream)  # [0, 1]

@pydanny Given there are now three tickets on this subject (#34, #90), I think it would be worth suggesting a simple solution like this in the readme, and maybe including a decorator like yield_into_list in the package. If not, I hope this comment helps other developers better understand how decorators and generators work. :)

wizpig64 avatar Aug 15 '18 06:08 wizpig64

Hi @wizpig64 and thanks for replying!

I think your @yield_int_list decorator will fail on the following example:


from cached_property import cached_property


def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped


class ConutriesPuns:
    def Czech_this_out(self):
        a = 0
        while True:
            yield a
            a += 1

    @cached_property
    @yield_into_list
    def i_do_not_Bolivia(self):
        a = 0
        while True:
            yield a
            a += 1


puns = CountriesPuns()

for x in puns.Czech_this_out():
   print x   # Works

# -----------------------------------------------------

for x in puns.i_do_not_Bolivia:
   print x  # Doesn't work

I think I would try and implement a cached generator that handles these cases and is lazily-evaluated, so maybe you guys could point to it as a solution

rinslow avatar Aug 17 '18 17:08 rinslow

You're right, my solution would only work on finite generators, not infinite ones. Worse, i_do_not_Bolivia locked up my system!

From a few minutes of googling, it looks like this is the right way to duplicate a generator:

from itertools import tee

def infinite_generator():
    a = 0
    while True:
        yield a
        a += 1

original = infinite_generator()

original, copy = tee(original)

for _ in range(3):
    print('original', next(original))
    print('copy', next(copy))

prints

original 0
copy 0
original 1
copy 1
original 2
copy 2

The docs for itertools.tee mention a few caveats though, which makes me wary:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

I'll try and see if I can work this into a decorator regardless.

wizpig64 avatar Aug 17 '18 19:08 wizpig64

@rinslow Okay, here's a decorator that returns a new copy of the original generator every time __get__ is called, using itertools.tee.

Notes:

  1. Rather than storing value in self.__dict__ like cached_property does, I store in a new dictionary, self.generators, as I want to fork and re-save the generator again for every new __get__() call, and don't want __dict__ to bypass it. This could probably be integrated into cached_property in a way that still keeps the __dict__ speedup for non-generators, however.
  2. I have no idea if this is async-friendly.
  3. I think using list() is still going to be the better approach when dealing with finite generators, in most cases.
from itertools import tee

class cached_generator_property:

    def __init__(self, func):
        self.__doc__ = getattr(func, "__doc__")
        self.func = func
        self.generators = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self
        if self.func.__name__ in self.generators:
            generator = self.generators[self.func.__name__]
        else:
            generator = self.func(obj)
        self.generators[self.func.__name__], value = tee(generator)
        return value


class Foo:

    @cached_generator_property
    def finite(self):
        yield from range(3)

    @cached_generator_property
    def infinite(self):
        a = 0
        while True:
            print('debug: you should only see this line once for a =', a)
            yield a
            a += 1

testing it in the console:

>>> foo = Foo()
>>> print(list(foo.finite))
[0, 1, 2]
>>> print(list(foo.finite))
[0, 1, 2]
>>> a = foo.infinite
>>> b = foo.infinite
>>> for _ in range(3):
...     print(next(a))
...     print(next(b))
... 
debug: you should only see this line once for a = 0
0
0
debug: you should only see this line once for a = 1
1
1
debug: you should only see this line once for a = 2
2
2
>>> c = foo.infinite
>>> for _ in range(4):
...     print(next(c))
... 
0
1
2
debug: you should only see this line once for a = 3
3

wizpig64 avatar Aug 17 '18 20:08 wizpig64

Hey guys, I implemented a cachedgenerator, it's quite simple, you can read the source code at: https://github.com/rinslow/cachedgenerator

would you want to integrate it with the cached-property library?

rinslow avatar Aug 18 '18 16:08 rinslow

@rinslow Integration would be a good thing as others have had to struggle through this "edge" case. If you decide to submit, I'll make it a priority to review your pull request(s). 😄

pydanny avatar Sep 30 '18 18:09 pydanny