sharedhashfile
sharedhashfile copied to clipboard
Support for LRU with expire
Thanks for the high performance sharedhashfile. Any plans to support LRU with expire similar to Nginx shared cache ?
Hi Rohit and thanks for the request.
Regarding expire:
sharedhashfile has been used to passive / lazy expire (expire upon read) keys, and this can be achieved by the caller by embedding some kind of timestamp in the value.
Regarding LRU:
This could be achieved by the caller via embedding a double linked list in the value, so that all keys are always in chronological order. Then it'd be easy to find the least recently used key and delete it. But this algorithm would be bad for size and performance. All values would get larger. And there would be likely be a locking performance hit, although perhaps this could be eliminated by limiting the LRU double linked list to an individual SHF shard (containing no locking internally).
Another strategy to achieve something similar but faster purely in caller code is having a separate process -- accessing sharedhashfile in parallel -- which reads through all keys in consecutive order in memory (much faster than regular key access, and works in parallel to regular processes accessing keys) and actively expires keys. I have used a technique like this before to actively expire keys and to persist a set of keys held purely in RAM, to SSD efficiently without compromising in memory key access performance of the other processes.
How would this caller implemented technique work with LRU? As the keys are consecutively accessed in memory, the continually (but likely throttled) scanning process adds the oldest e.g. n% of keys to a list of oldest keys. Note: Only the shorter key UID is added to the list, not the key itself. This list of oldest keys can be used later to purge older keys on demand. However, with this algorithm the absolutely least recently used key is not necessarily purged, only a similarly old least recently used key is purged. This may or may not fit in with your use case.
Hope this helps and looking forward to any questions!
This could be achieved by the caller via embedding a double linked list in the value, so that all keys are always in chronological order. Then it'd be easy to find the least recently used key and delete it. But this algorithm would be bad for size and performance. All values would get larger. And there would be likely be a locking performance hit, although perhaps this could be eliminated by limiting the LRU double linked list to an individual SHF shard (containing no locking internally).
actually this is a "better" way to do so. not too sure how the SHF shard works though.
possible to introduce this with golang and i dont mind the lock though.
not too sure how the SHF shard works though
FYI there is some explanation of the internals here [1] including the sharding.
[1] https://github.com/simonhf/sharedhashfile/blob/master/src/shf.h#L95