python-diskcache
Memoize with lease
Hi!

Use case: a web API. The API must return values as soon as possible, and values should be updated if some time has passed since the last call, so the API is fast and its data is reasonably fresh. The data depend on an external service which can take from one second to several seconds to compute. Data is collected at the beginning of the week using the external service, and it rarely changes, but when it does, it is nice to have the change displayed within a few minutes.

memoize_stampede was not suited because when the cache expires, the caller has to wait for the whole recomputation, so the API would be slow. So I modified the stampede recipe a little, removing the heuristic and adding a lease time parameter. It is not perfect, because if the service has not been called for some time it will display the cached value and then update it, but since it is called from JavaScript on some home pages this should not happen often, and changes in the data are rare, so it is not a problem.

Here is the code (it would be nice to include it in the recipes!):
```python
import functools
import threading
import time

from diskcache.core import ENOVAL, args_to_key, full_name


def memoize_lease(cache, expire, lease, name=None, typed=False, tag=None):
    """Memoizing cache decorator with cache lease.

    The memoization decorator updates the cache entry when the lease
    expires. The decorator always returns from the cache and calls the
    function, in a separate thread, only when the lease time has passed.
    Setting expire to None means the caller never waits for the execution
    of the function, reading always from the cache. Suitable for fairly
    long-running (rarely called) functions, for example a long calculation
    or web pages which can change every 5-10 minutes and take some seconds
    to compute.

    If name is set to None (default), the callable name will be determined
    automatically.

    If typed is set to True, function arguments of different types will be
    cached separately. For example, f(3) and f(3.0) will be treated as
    distinct calls with distinct results.

    The original underlying function is accessible through the
    `__wrapped__` attribute. This is useful for introspection, for
    bypassing the cache, or for rewrapping the function with a different
    cache.

    >>> from diskcache import Cache
    >>> cache = Cache()
    >>> @memoize_lease(cache, expire=10, lease=1)
    ... def fib(number):
    ...     if number == 0:
    ...         return 0
    ...     elif number == 1:
    ...         return 1
    ...     else:
    ...         return fib(number - 1) + fib(number - 2)
    >>> print(fib(100))
    354224848179261915075

    An additional `__cache_key__` attribute can be used to generate the
    cache key used for the given arguments.

    >>> key = fib.__cache_key__(100)
    >>> del cache[key]

    Remember to call memoize_lease when decorating a callable. If you
    forget, then a TypeError will occur.

    :param cache: cache to store callable arguments and return values
    :param float expire: seconds until arguments expire
    :param float lease: minimum seconds after the last execution before
        the cache value is updated
    :param str name: name given for callable (default None, automatic)
    :param bool typed: cache different types separately (default False)
    :param str tag: text to associate with arguments (default None)
    :return: callable decorator

    """
    # Caution: Nearly identical code exists in Cache.memoize
    def decorator(func):
        "Decorator created by memoize_lease call for callable."
        base = (full_name(func),) if name is None else (name,)

        def timer(*args, **kwargs):
            "Time execution of `func` and return result and time delta."
            start = time.time()
            result = func(*args, **kwargs)
            delta = time.time() - start
            return result, delta, time.time()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            "Wrapper for callable to cache arguments and return values."
            key = wrapper.__cache_key__(*args, **kwargs)
            trio, expire_time = cache.get(
                key, default=ENOVAL, expire_time=True, retry=True,
            )

            if trio is not ENOVAL:
                result, delta, last_exec = trio
                now = time.time()

                if (now - last_exec) > lease:
                    thread_key = key + (ENOVAL,)
                    thread_added = cache.add(
                        thread_key, None, expire=delta, retry=True,
                    )

                    if thread_added:
                        # Start thread for early recomputation.
                        def recompute():
                            with cache:
                                trio = timer(*args, **kwargs)
                                cache.set(
                                    key, trio, expire=expire,
                                    tag=tag, retry=True,
                                )

                        thread = threading.Thread(target=recompute)
                        thread.daemon = True
                        thread.start()

                return result

            trio = timer(*args, **kwargs)
            cache.set(key, trio, expire=expire, tag=tag, retry=True)
            return trio[0]

        def __cache_key__(*args, **kwargs):
            "Make key for cache given function arguments."
            return args_to_key(base, args, kwargs, typed)

        wrapper.__cache_key__ = __cache_key__
        return wrapper

    return decorator
```
Interesting. It's been a while since I looked at this code. But I think I understand it. The "expire" is a hard deadline but the "lease" is like a soft deadline. Once the "lease" is passed a thread is started to update the cache item.
If the API isn't called in a long time, values will still expire and cause a delay, right? Or do you set the lease to something short (like a minute) and the expiration to something long (like a week)?
Also, did you read through http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html ? It would be helpful to see Concurrency and Latency charts with this decorator.
The term "lease" is new to me and I like it. Are there similar cache APIs?
> Interesting. It's been a while since I looked at this code. But I think I understand it. The "expire" is a hard deadline but the "lease" is like a soft deadline. Once the "lease" is passed a thread is started to update the cache item.
>
> If the API isn't called in a long time, values will still expire and cause a delay, right? Or do you set the lease to something short (like a minute) and the expiration to something long (like a week)?
Yes, this is the main idea. I can set expire to a very high value if we are sure the function will be called in the meantime, so the expiration keeps being pushed forward and the cache entry almost never expires. The data are opening calendars for public places. The schedule is decided some days in advance, but sometimes there are updates during the current week, so I'll use a 5 minute lease because I can't know when changes will happen. Once a week I run a full update of the data, so I'm sure everything is updated and aligned at the beginning of the week.
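As a sketch of that configuration (hypothetical cache directory and function name; assuming the `memoize_lease` decorator above is available):

```python
from diskcache import Cache

cache = Cache('/tmp/calendars')  # hypothetical cache directory

# lease=300: a background refresh starts at most once every 5 minutes.
# expire=None: the entry never hard-expires, so once a value exists the
# caller never blocks on recomputation; a weekly batch job repopulates it.
@memoize_lease(cache, expire=None, lease=300)
def opening_calendar(place_id):  # hypothetical function
    ...  # slow call to the external service
```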
> Also, did you read through http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html ? It would be helpful to see Concurrency and Latency charts with this decorator.
This page was my starting point, but I haven't benchmarked it as you did. I think the bottleneck could be the number of concurrent threads calculating different weeks for different public places at the same time, but usually this doesn't happen.
I'm thinking of generalizing it using two functions as parameters: one for the timer (maybe with a different name) and one for the condition check (in this case `(now - last_exec) > lease`), so we can implement many kinds of caching strategies. The timer function is the one that executes the function and saves the data useful for the caching strategy (plus the delta for the thread barrier), like the last execution time, the number of times the function has been called, the function arguments, etc. The condition function would use those data to decide whether to update the value in the cache.
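A minimal, in-memory sketch of that generalization (hypothetical names; simplified to a plain dict and a synchronous refresh, where the real version would keep the diskcache calls and the background thread from `memoize_lease` above):

```python
import time


def memoize_with_policy(cache, should_refresh):
    """Memoize `func`, refreshing the entry when `should_refresh(meta)`
    says so. `meta` carries the data a policy might need: last execution
    time, execution delta, and call count."""
    def decorator(func):
        def wrapper(*args):
            key = (func.__name__,) + args
            entry = cache.get(key)
            if entry is not None:
                value, meta = entry
                if not should_refresh(meta):
                    return value  # policy says the cached value is fine
            # Recompute and record metadata for the policy.
            start = time.time()
            value = func(*args)
            meta = {
                'last_exec': time.time(),
                'delta': time.time() - start,
                'calls': (entry[1]['calls'] + 1) if entry else 1,
            }
            cache[key] = (value, meta)
            return value
        return wrapper
    return decorator


def lease_policy(lease):
    "Lease-style policy: refresh when `lease` seconds have passed."
    return lambda meta: (time.time() - meta['last_exec']) > lease


cache = {}

@memoize_with_policy(cache, lease_policy(60))
def square(x):
    return x * x

assert square(4) == 16  # first call computes and caches
assert square(4) == 16  # second call is served from the cache
```

Other policies (refresh every N calls, refresh based on the arguments, etc.) would then just be different `should_refresh` functions.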
"Lease" came from dhcp protocol, where the lease is assigned for some time and then renew after expiring.
> The term "lease" is new to me and I like it. Are there similar cache APIs?
I don't know. I'm using diskcache because it is very fast, simple, and handles concurrency very well. I've tested it with this unicorn app with 16 threads, and it spawns tens of remote calls which are saved using diskcache without any glitch. Very nice Python module!
I've only just come across this module (thanks to the Python Bytes podcast for the heads-up!) and I have a similar use case where my application makes remote API calls that are time-costly. diskcache will resolve these, but the cached data needs to have a definable lifetime so that it can be refreshed with current data. Love the idea of lease, btw!
Hi @yurj , I made a couple small changes and opened a pull request. Feel free to open your own if you've made more modifications.
Could this be rewritten using asyncio to avoid threading? It should be similar to https://github.com/grantjenks/python-diskcache/pull/196.
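One possible shape for that (a hypothetical, simplified sketch: a plain dict instead of diskcache, `expire` ignored, and the background thread replaced by a fire-and-forget task on the running event loop):

```python
import asyncio
import time

ENOVAL = object()  # stand-in sentinel; diskcache provides its own


def memoize_lease_async(cache, expire, lease):
    "Asyncio variant sketch of memoize_lease; `expire` is ignored here."
    def decorator(func):  # func is a coroutine function
        async def wrapper(*args):
            key = (func.__name__,) + args
            trio = cache.get(key, ENOVAL)
            if trio is not ENOVAL:
                result, delta, last_exec = trio
                if (time.time() - last_exec) > lease:
                    async def recompute():
                        start = time.time()
                        value = await func(*args)
                        cache[key] = (value, time.time() - start, time.time())
                    # Fire-and-forget background refresh instead of a thread.
                    asyncio.get_running_loop().create_task(recompute())
                return result
            # First call: compute synchronously and cache the trio.
            start = time.time()
            value = await func(*args)
            cache[key] = (value, time.time() - start, time.time())
            return value
        return wrapper
    return decorator


cache = {}

@memoize_lease_async(cache, expire=None, lease=60)
async def slow_double(x):
    await asyncio.sleep(0)  # stands in for a slow remote call
    return 2 * x

print(asyncio.run(slow_double(21)))  # first call computes: 42
print(asyncio.run(slow_double(21)))  # served from cache within the lease: 42
```

The thread barrier (`cache.add` on `thread_key`) would be less critical here, since tasks on a single event loop do not race the way threads do, but it would still be needed with multiple worker processes.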