Use a better cache?
Consider using an LRU cache or another, better eviction algorithm to replace the MAX_CACHE_SIZE dict, so the whole cache doesn't get cleared whenever its length reaches MAX_CACHE_SIZE.
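To illustrate, the current behaviour is roughly the following (a simplified sketch, not the library's actual code; apart from MAX_CACHE_SIZE, the names here are placeholders):

    MAX_CACHE_SIZE = 200  # illustrative value, not the library's actual setting

    def parse(ua_string):
        # stand-in for the real, comparatively expensive parsing call
        return {"string": ua_string}

    _cache = {}

    def cached_parse(ua_string):
        if ua_string in _cache:
            return _cache[ua_string]
        if len(_cache) >= MAX_CACHE_SIZE:
            _cache.clear()  # the whole cache is dropped at once when it fills up
        result = parse(ua_string)
        _cache[ua_string] = result
        return result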
Thanks for the feedback.
That is definitely a goal, as I'm not a fan of the current strategy either. I tried moving to a more FIFO-ish strategy inspired by the stdlib earlier this year, but it was broken under concurrent usage, so I reverted to the older policy (also because the FIFO property didn't work before 3.6, as I wasn't using an OrderedDict).
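Roughly, the FIFO-ish idea is along these lines (a simplified sketch, not the reverted code itself): entries are evicted in insertion order and never promoted on access, which is why it needs an OrderedDict or the insertion-ordered dicts of 3.6+.

    from collections import OrderedDict

    MAX_CACHE_SIZE = 200  # illustrative value

    _cache = OrderedDict()

    def fifo_get(key):
        # no reordering on a hit: this is what makes it FIFO rather than LRU
        return _cache.get(key)

    def fifo_set(key, value):
        if key not in _cache and len(_cache) >= MAX_CACHE_SIZE:
            _cache.popitem(last=False)  # evict the oldest insertion
        _cache[key] = value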
But even though 2.7 is out of support, it still shows up as background usage (on pypistats, for instance), so I'd rather bundle that change with things like API improvements (#93) and packaging updates (#120) rather than just drop 2.7 for unknown advantages.
The "unknown advantages" part bothers me as well: I've looked around for real or realistic workloads (hopefully significant ones) in order to better evaluate different cache strategies, or even more minor changes, and make more informed decisions, but I've had no luck so far.
The ideal would be to have multiple such real-world datasets from sites with different focuses (e.g. mobile-oriented, general public, very specialised, very technical) in order to see the differences in patterns and their impact. If you have such a dataset that could be published, I'd be interested, and it might be interesting to the other implementations as well.
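Once such a trace exists, comparing policies could be as simple as replaying it and counting hits. A hypothetical harness, assuming the trace is just a sequence of lookup keys:

    def hit_rate(trace, cache_get, cache_set):
        # replay a sequence of lookup keys and count how often the cache answers;
        # cache_get/cache_set can wrap whichever policy is being evaluated
        hits = 0
        for key in trace:
            if cache_get(key) is not None:
                hits += 1
            else:
                cache_set(key, True)  # dummy value, only hit/miss behaviour matters here
        return hits / float(len(trace)) if trace else 0.0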
As you say, I'm still using py2.7 for now. I do have a small dataset, but it's internal, so I'm sorry that I can't provide it to you. I wrote and used a simple LRUCache in my own work; it looks like this:
from collections import OrderedDict

class LRUCache(object):
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            value = self.cache.pop(key)
            self.cache[key] = value  # move the entry back to the end (most recently used)
            return value
        return None

    def set(self, key, value):
        if key in self.cache:
            self.cache.pop(key)  # remove so the re-insert below lands at the end
        else:
            while len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used entry
        self.cache[key] = value  # insert (or re-insert) at the most recently used end
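For example, with made-up keys it behaves like this:

    cache = LRUCache(capacity=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")         # touching "a" makes "b" the least recently used entry
    cache.set("c", 3)      # evicts "b"
    print(cache.get("b"))  # None
    print(cache.get("a"))  # 1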
This code works in a single thread/process; I don't know whether it's suitable for multi-threaded/multi-process use. It may need a lock to stay safe, like this:
from collections import OrderedDict
from threading import Lock

class LRUCache(object):
    def __init__(self, capacity=512):
        self.capacity = capacity
        self._cache = OrderedDict()
        self._mutex = Lock()

    def get(self, key):
        with self._mutex:
            if key in self._cache:
                value = self._cache.pop(key)
                self._cache[key] = value  # move the entry back to the end (most recently used)
                return value
            return None

    def set(self, key, value):
        with self._mutex:
            if key in self._cache:
                self._cache.pop(key)  # remove so the re-insert below lands at the end
            else:
                while len(self._cache) >= self.capacity:
                    self._cache.popitem(last=False)  # evict the least recently used entry
            self._cache[key] = value  # insert (or re-insert) at the most recently used end
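As a rough sanity check of the locking, something like this could hammer the cache from several threads (a sketch only; it shows that concurrent use doesn't crash or overgrow the cache, not that eviction order stays strictly LRU under contention):

    import threading

    cache = LRUCache(capacity=128)

    def worker(n):
        for i in range(1000):
            key = (n * 1000 + i) % 300  # overlapping key space across threads
            if cache.get(key) is None:
                cache.set(key, key)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(cache._cache) <= cache.capacity)  # True: capacity is never exceeded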
I have only been using Python for about a year, so maybe I'm not yet good at writing better Python code. I hope the code above is helpful to you.
Notes on cache replacement policies to look at (aside from the obvious LRU):