Use a better cache?
Consider using an LRU cache or another, better eviction algorithm to replace the MAX_CACHE_SIZE dict, so the whole cache doesn't get cleared whenever its length reaches MAX_CACHE_SIZE.
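To illustrate, the current behaviour is roughly the following (a simplified sketch, not the library's actual code; apart from MAX_CACHE_SIZE, the names here are placeholders):

    MAX_CACHE_SIZE = 200  # illustrative value, not the library's actual setting

    def parse(ua_string):
        # stand-in for the real, comparatively expensive parsing call
        return {"string": ua_string}

    _cache = {}

    def cached_parse(ua_string):
        if ua_string in _cache:
            return _cache[ua_string]
        if len(_cache) >= MAX_CACHE_SIZE:
            _cache.clear()  # the whole cache is dropped at once when it fills up
        result = parse(ua_string)
        _cache[ua_string] = result
        return result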
Thanks for the feedback.
That is definitely a goal, as I'm not a fan of the current strategy either. I tried moving to a more FIFO-ish strategy inspired by the stdlib earlier this year, but it was broken under concurrent usage, so I reverted to the older policy (also because the FIFO property didn't work before 3.6, as I wasn't using an OrderedDict).
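Roughly, the FIFO-ish idea is along these lines (a simplified sketch, not the reverted code itself): entries are evicted in insertion order and never promoted on access, which is why it needs an OrderedDict or the insertion-ordered dicts of 3.6+.

    from collections import OrderedDict

    MAX_CACHE_SIZE = 200  # illustrative value

    _cache = OrderedDict()

    def fifo_get(key):
        # no reordering on a hit: this is what makes it FIFO rather than LRU
        return _cache.get(key)

    def fifo_set(key, value):
        if key not in _cache and len(_cache) >= MAX_CACHE_SIZE:
            _cache.popitem(last=False)  # evict the oldest insertion
        _cache[key] = value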
But even though 2.7 is out of support, it still shows up as background usage (on pypistats, for instance), so I'd rather bundle that change with things like API improvements (#93) and packaging updates (#120) rather than just drop 2.7 for unknown advantages.
The "unknown advantages" part bothers me as well: I've looked around for real or realistic workloads (hopefully significant ones) in order to better evaluate different cache strategies, or even more minor changes, and make more informed decisions, but I've had no luck so far.
The ideal would be to have multiple such real-world datasets from sites with different focuses (e.g. mobile-oriented, general public, very specialised, very technical) in order to see the differences in patterns and their impact. If you have such a dataset that could be published, I'd be interested, and it might be interesting to the other implementations as well.
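Once such a trace exists, comparing policies could be as simple as replaying it and counting hits. A hypothetical harness, assuming the trace is just a sequence of lookup keys:

    def hit_rate(trace, cache_get, cache_set):
        # replay a sequence of lookup keys and count how often the cache answers;
        # cache_get/cache_set can wrap whichever policy is being evaluated
        hits = 0
        for key in trace:
            if cache_get(key) is not None:
                hits += 1
            else:
                cache_set(key, True)  # dummy value, only hit/miss behaviour matters here
        return hits / float(len(trace)) if trace else 0.0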
As you say, I'm still using py2.7 for now. I do have a small dataset, but it's internal, so I'm sorry that I can't provide it to you. I wrote and used a simple LRUCache in my own work; it looks like this:
from collections import OrderedDict

class LRUCache(object):
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            value = self.cache.pop(key)
            self.cache[key] = value  # move the entry back to the end (most recently used)
            return value
        return None

    def set(self, key, value):
        if key in self.cache:
            self.cache.pop(key)  # remove so the re-insert below lands at the end
        else:
            while len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used entry
        self.cache[key] = value  # insert (or re-insert) at the most recently used end
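For example, with made-up keys it behaves like this:

    cache = LRUCache(capacity=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")         # touching "a" makes "b" the least recently used entry
    cache.set("c", 3)      # evicts "b"
    print(cache.get("b"))  # None
    print(cache.get("a"))  # 1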
This code works in a single thread/process; I don't know whether it's suitable for multi-threaded/multi-process use. It may need a lock to stay safe, like this:
from collections import OrderedDict
from threading import Lock

class LRUCache(object):
    def __init__(self, capacity=512):
        self.capacity = capacity
        self._cache = OrderedDict()
        self._mutex = Lock()

    def get(self, key):
        with self._mutex:
            if key in self._cache:
                value = self._cache.pop(key)
                self._cache[key] = value  # move the entry back to the end (most recently used)
                return value
            return None

    def set(self, key, value):
        with self._mutex:
            if key in self._cache:
                self._cache.pop(key)  # remove so the re-insert below lands at the end
            else:
                while len(self._cache) >= self.capacity:
                    self._cache.popitem(last=False)  # evict the least recently used entry
            self._cache[key] = value  # insert (or re-insert) at the most recently used end
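As a rough sanity check of the locking, something like this could hammer the cache from several threads (a sketch only; it shows that concurrent use doesn't crash or overgrow the cache, not that eviction order stays strictly LRU under contention):

    import threading

    cache = LRUCache(capacity=128)

    def worker(n):
        for i in range(1000):
            key = (n * 1000 + i) % 300  # overlapping key space across threads
            if cache.get(key) is None:
                cache.set(key, key)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(cache._cache) <= cache.capacity)  # True: capacity is never exceeded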
I have only been using Python for about a year, so maybe I'm not yet good at writing better Python code. I hope the code above is helpful to you.
Notes on cache replacement policies to look at (aside from the obvious LRU):