cached-property
Support caching generators
Many times I'll have a function such as

```python
def x():
    results = some_long_function_that_eventually_calculates_a_list()
    yield from (result.name for result in results)
    yield from (result.path for result in results)
```
I can't cache this function with a regular `cached_property` because the generator will be exhausted after the first use. I want to calculate the results once and then, whenever the property is accessed again, return the items I retrieved in the order I retrieved them.
I think it's clear that mixing cached_property with generator functions is a little confusing at first glance. Rather than keeping the data returned by a generator method, using cached_property in this case returns the data once and then immediately loses it, leading one to ask, "why isn't it caching the results?" Of course, it is caching something, just not what you might naturally expect it to cache. It's caching a generator object which can only be exhausted once, rather than caching the results from the generator.
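To make that concrete, the same behaviour can be reproduced with the standard library's `functools.cached_property` (Python 3.8+) — a minimal sketch, with a made-up `Demo` class, just to show that the cached value really is the generator object itself:

```python
from functools import cached_property  # stdlib equivalent, Python 3.8+
import types

class Demo:
    @cached_property
    def items(self):
        yield from range(2)

d = Demo()
# what gets cached is the generator object itself...
print(isinstance(d.items, types.GeneratorType))  # True
# ...so the first iteration works, but repeated access
# returns the same, now-exhausted object
print(list(d.items))  # [0, 1]
print(list(d.items))  # []
```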
I think the solution here is to insert an intermediate decorator that dumps the generator content into a list. This list is then the thing that becomes cached by cached_property:
```python
from cached_property import cached_property

def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped

class IceCreamPuns:
    x = 2

    @cached_property
    def i_scream(self):
        # returns a list of results. a silly example, i admit.
        return [n for n in range(self.x)]

    @cached_property
    def eye_scream(self):
        # with the yield syntax, this code just got much more readable.
        # but cached_property caches the generator, not the results!
        yield from range(self.x)

    @cached_property
    @yield_into_list
    def eyes_cream(self):
        # now we can use the yield syntax and re-use the results.
        yield from range(self.x)

puns = IceCreamPuns()
list(puns.i_scream)    # [0, 1]
list(puns.i_scream)    # [0, 1]
list(puns.eye_scream)  # [0, 1]
list(puns.eye_scream)  # [] <- unexpected if you don't know this is a generator!
list(puns.eyes_cream)  # [0, 1]
list(puns.eyes_cream)  # [0, 1]
```
@pydanny Given there are now three tickets on this subject (#34, #90), I think it would be worth suggesting a simple solution like this in the readme, and maybe including a decorator like yield_into_list in the package. If not, I hope this comment helps other developers better understand how decorators and generators work. :)
Hi @wizpig64 and thanks for replying!
I think your @yield_into_list decorator will fail on the following example:
```python
from cached_property import cached_property

def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped

class CountriesPuns:
    def Czech_this_out(self):
        a = 0
        while True:
            yield a
            a += 1

    @cached_property
    @yield_into_list
    def i_do_not_Bolivia(self):
        a = 0
        while True:
            yield a
            a += 1

puns = CountriesPuns()

for x in puns.Czech_this_out():
    print(x)  # Works

# -----------------------------------------------------

for x in puns.i_do_not_Bolivia:
    print(x)  # Doesn't work: list() never finishes on an infinite generator
```
I think I'll try to implement a cached generator that handles these cases and is lazily evaluated, so maybe you could point to it as a solution.
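For what it's worth, here is one shape such a lazily-evaluated cached generator could take — a sketch only; `CachedIterable` and its attribute names are made up for illustration and not part of any library:

```python
class CachedIterable:
    """Wrap a generator; items are cached as they are first produced,
    so the sequence can be iterated many times, lazily, and the
    underlying generator is only ever advanced once per item."""

    def __init__(self, generator):
        self._gen = generator
        self._cache = []

    def __iter__(self):
        i = 0
        while True:
            if i >= len(self._cache):
                # need a fresh item: advance the underlying generator once
                try:
                    self._cache.append(next(self._gen))
                except StopIteration:
                    return
            yield self._cache[i]
            i += 1

calls = []
def expensive():
    for n in range(3):
        calls.append(n)  # track how often the generator body actually runs
        yield n

cached = CachedIterable(expensive())
print(list(cached))  # [0, 1, 2]
print(list(cached))  # [0, 1, 2] again, without re-running the generator
print(calls)         # [0, 1, 2] - each item was produced exactly once
```

Because consumers only pull items on demand, this also stays safe for infinite generators: each iterator reads from the cache first and only advances the underlying generator when it reaches the frontier.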
You're right, my solution would only work on finite generators, not infinite ones. Worse, i_do_not_Bolivia locked up my system!
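As an aside, when you do need to peek at an infinite generator, `itertools.islice` lets you take a bounded prefix instead of letting `list()` run forever:

```python
from itertools import islice

def infinite_generator():
    a = 0
    while True:
        yield a
        a += 1

# list(infinite_generator()) would never return;
# islice bounds how many items are consumed.
first_five = list(islice(infinite_generator(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```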
From a few minutes of googling, it looks like this is the right way to duplicate a generator:
```python
from itertools import tee

def infinite_generator():
    a = 0
    while True:
        yield a
        a += 1

original = infinite_generator()
original, copy = tee(original)

for _ in range(3):
    print('original', next(original))
    print('copy', next(copy))
```

prints

```
original 0
copy 0
original 1
copy 1
original 2
copy 2
```
The docs for `itertools.tee` mention a few caveats though, which makes me wary:

> This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
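That caveat is easy to demonstrate: if one `tee` branch is exhausted before the other starts, every item gets buffered internally — exactly the case where `list()` would have been cheaper:

```python
from itertools import tee

def numbers():
    yield from range(3)

a, b = tee(numbers())
# exhausting a first forces tee to buffer all three items so that
# b can still replay them - the auxiliary storage the docs warn about
print(list(a))  # [0, 1, 2]
print(list(b))  # [0, 1, 2]
```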
I'll try and see if I can work this into a decorator regardless.
@rinslow Okay, here's a decorator that returns a new copy of the original generator every time `__get__` is called, using `itertools.tee`.
Notes:
- Rather than storing the value in `self.__dict__` like cached_property does, I store it in a new dictionary, `self.generators`, since I want to fork and re-save the generator on every `__get__()` call and don't want `__dict__` to bypass the descriptor. This could probably be integrated into cached_property in a way that still keeps the `__dict__` speedup for non-generators, however.
- I have no idea if this is async-friendly.
- I think using list() is still going to be the better approach when dealing with finite generators, in most cases.
```python
from itertools import tee

class cached_generator_property:
    def __init__(self, func):
        self.__doc__ = getattr(func, "__doc__")
        self.func = func
        self.generators = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self
        # key on the instance, so separate objects don't share one generator
        # (note this keeps a strong reference to obj; a WeakKeyDictionary
        # would avoid that)
        if obj in self.generators:
            generator = self.generators[obj]
        else:
            generator = self.func(obj)
        # fork the generator: keep one copy for next time, hand out the other
        self.generators[obj], value = tee(generator)
        return value

class Foo:
    @cached_generator_property
    def finite(self):
        yield from range(3)

    @cached_generator_property
    def infinite(self):
        a = 0
        while True:
            print('debug: you should only see this line once for a =', a)
            yield a
            a += 1
```
testing it in the console:

```
>>> foo = Foo()
>>> print(list(foo.finite))
[0, 1, 2]
>>> print(list(foo.finite))
[0, 1, 2]
>>> a = foo.infinite
>>> b = foo.infinite
>>> for _ in range(3):
...     print(next(a))
...     print(next(b))
...
debug: you should only see this line once for a = 0
0
0
debug: you should only see this line once for a = 1
1
1
debug: you should only see this line once for a = 2
2
2
>>> c = foo.infinite
>>> for _ in range(4):
...     print(next(c))
...
0
1
2
debug: you should only see this line once for a = 3
3
```
Hey guys, I implemented a `cachedgenerator`; it's quite simple. You can read the source code at: https://github.com/rinslow/cachedgenerator

Would you want to integrate it with the cached-property library?
@rinslow Integration would be a good thing as others have had to struggle through this "edge" case. If you decide to submit, I'll make it a priority to review your pull request(s). 😄