dice icon indicating copy to clipboard operation
dice copied to clipboard

feat: added LFU caching eviction with constant time operation

Open mrchocha opened this issue 3 years ago • 12 comments

context

  • Added LFU caching eviction using doubly-linked lists and hash table in constant time operation

Issue Link: https://github.com/DiceDB/dice/issues/10 Reference Link: https://arpitbhayani.me/blogs/lfu

screenshots

  1. SET operation set_operation

  2. GET operation and Re-Balancing get_operation_and_rebalancing

  3. DEL operation delete_operation

mrchocha avatar Oct 19 '22 12:10 mrchocha

Looks good enough, but it would be better if some degree of concurrency safety is there.

rohanverma94 avatar Oct 19 '22 14:10 rohanverma94

Our database is in-memory and this implementation is very heavy on memory requiring a lot of auxiliary data structures. We should try to reduce that even if we don't meet the correctness.

arpitbbhayani avatar Oct 19 '22 16:10 arpitbbhayani

Plus this is not CPU cache efficient. Keeping another copies of keys is also inefficient.

arpitbbhayani avatar Oct 19 '22 16:10 arpitbbhayani

@arpitbbhayani I agree, this implementation might be good for a cache(small set of values) but not for an in-mem database(a large set of values). As it increases the data movements(allocations) as well as it will pressurize the GC and "stop-the-world" problems with the CPU would arise. Have you found a suitable approximation algorithm for LFU?

An approximate LFU algorithm is simple enough, please look at this @mrchocha https://tokers.github.io/posts/lru-and-lfu-in-redis-memory-eviction/

rohanverma94 avatar Oct 19 '22 16:10 rohanverma94

Our database is in-memory and this implementation is very heavy on memory requiring a lot of auxiliary data structures. We should try to reduce that even if we don't meet the correctness.

Hey, I know it will add some new data structures. but not sure how can I reduce it. any suggestions? CC: @arpitbbhayani @rohanverma94

mrchocha avatar Oct 20 '22 02:10 mrchocha

Plus this is not CPU cache efficient. Keeping another copies of keys is also inefficient.

Ya, I thought it would be good if I can create it in plug and play manner. otherwise, I had to change the store object, and then every object will have a couple of extra variables that might not be used for other eviation policies.

mrchocha avatar Oct 20 '22 03:10 mrchocha

Read approximate counting - Morris counter and piggy back on LRU field we already have in our object struct.

Research more on how it can be done while being more space efficient. Coding the solution is the easiest thing there.

I would highly recommend you explore a bunch of approaches and papers to come up with the best solution.

Key Pointers

  • Eviction can be approximated
  • Solution should be extremely space efficient

Happy to discuss.

arpitbbhayani avatar Oct 20 '22 03:10 arpitbbhayani

Ya got some ideas from the articles like how it can be memory efficient. correct me if I am wrong. we need to use 2 vars.

  1. decay time unit8
  2. counter unit8

here counter will be logarithmic/Morris and it's 8 bit so at max, we will have 256 nodes in the frequency list and need to decrease the counter after last access time - current time > decay time

since we have ePoolSizeMax int = 16 we can crate the configuration while creating eviction pool, such that the frequency list will discard the nodes from the tail, whenever it reaches the limit.

like Redis uses this formula for counter 1/(old_value*lfu_log_factor+1)

mrchocha avatar Oct 20 '22 04:10 mrchocha

@mrchocha there are a lot of gaps in your understanding. Would recommend spending some time understanding the approach well. Also, this is not the only way to do it. Research more on this.

Would really recommend you read more papers and seek other approaches. It is okay if it takes time. Let's find a novel approach that would make Dice DB super-efficient.

arpitbbhayani avatar Oct 20 '22 05:10 arpitbbhayani

sure

mrchocha avatar Oct 20 '22 05:10 mrchocha

Let's keep dumping approached in #44 and use that space to converge on a solid approach.

arpitbbhayani avatar Oct 20 '22 05:10 arpitbbhayani

@mrchocha Thanks for the contribution. There are conflicts in this PR atm. The code base has moved forward lot. Please rebase and re-submit if you are still keen to fix this. We will be cleaning up old PR's if they are inactive.

yashs360 avatar Jul 14 '24 10:07 yashs360