feat: added LFU caching eviction with constant time operation
context
- Added LFU caching eviction using doubly-linked lists and hash table in constant time operation
Issue Link: https://github.com/DiceDB/dice/issues/10 Reference Link: https://arpitbhayani.me/blogs/lfu
screenshots
-
SET operation

-
GET operation and Re-Balancing

-
DEL operation

Looks good enough, but it would be better if some degree of concurrency safety is there.
Our database is in-memory and this implementation is very heavy on memory requiring a lot of auxiliary data structures. We should try to reduce that even if we don't meet the correctness.
Plus this is not CPU cache efficient. Keeping another copies of keys is also inefficient.
@arpitbbhayani I agree, this implementation might be good for a cache(small set of values) but not for an in-mem database(a large set of values). As it increases the data movements(allocations) as well as it will pressurize the GC and "stop-the-world" problems with the CPU would arise. Have you found a suitable approximation algorithm for LFU?
An approximate LFU algorithm is simple enough, please look at this @mrchocha https://tokers.github.io/posts/lru-and-lfu-in-redis-memory-eviction/
Our database is in-memory and this implementation is very heavy on memory requiring a lot of auxiliary data structures. We should try to reduce that even if we don't meet the correctness.
Hey, I know it will add some new data structures. but not sure how can I reduce it. any suggestions? CC: @arpitbbhayani @rohanverma94
Plus this is not CPU cache efficient. Keeping another copies of keys is also inefficient.
Ya, I thought it would be good if I can create it in plug and play manner. otherwise, I had to change the store object, and then every object will have a couple of extra variables that might not be used for other eviation policies.
Read approximate counting - Morris counter and piggy back on LRU field we already have in our object struct.
Research more on how it can be done while being more space efficient. Coding the solution is the easiest thing there.
I would highly recommend you explore a bunch of approaches and papers to come up with the best solution.
Key Pointers
- Eviction can be approximated
- Solution should be extremely space efficient
Happy to discuss.
Ya got some ideas from the articles like how it can be memory efficient. correct me if I am wrong. we need to use 2 vars.
- decay time
unit8 - counter
unit8
here counter will be logarithmic/Morris and it's 8 bit so at max, we will have 256 nodes in the frequency list and need to decrease the counter after last access time - current time > decay time
since we have ePoolSizeMax int = 16 we can crate the configuration while creating eviction pool, such that the frequency list will discard the nodes from the tail, whenever it reaches the limit.
like Redis uses this formula for counter
1/(old_value*lfu_log_factor+1)
@mrchocha there are a lot of gaps in your understanding. Would recommend spending some time understanding the approach well. Also, this is not the only way to do it. Research more on this.
Would really recommend you read more papers and seek other approaches. It is okay if it takes time. Let's find a novel approach that would make Dice DB super-efficient.
sure
Let's keep dumping approached in #44 and use that space to converge on a solid approach.
@mrchocha Thanks for the contribution. There are conflicts in this PR atm. The code base has moved forward lot. Please rebase and re-submit if you are still keen to fix this. We will be cleaning up old PR's if they are inactive.