
Implement a non-volatile SecondaryCache

Open anand1976 opened this issue 3 years ago • 15 comments

RocksDB added support for a SecondaryCache in #8271, #8191 and #8312. The SecondaryCache is an additional tier of caching below the block cache. It can be used to provide a non-volatile cache tier on local flash or NVM/SCM that can complement the DRAM block cache. More details about the design of SecondaryCache can be found here.
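As a rough illustration of the tiering described above, here is a minimal Python sketch of a small primary cache backed by a larger secondary tier. It is a toy model and not the RocksDB API: the `TieredCache` class, its method names, and the capacities are all hypothetical, and the demote-on-eviction and promote-on-hit policy is an assumption made for the example.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier cache: a small LRU primary backed by a larger secondary."""

    def __init__(self, primary_capacity, secondary_capacity):
        self.primary = OrderedDict()    # stands in for the DRAM block cache
        self.secondary = OrderedDict()  # stands in for flash or NVM
        self.primary_capacity = primary_capacity
        self.secondary_capacity = secondary_capacity

    def insert(self, key, value):
        self.primary[key] = value
        self.primary.move_to_end(key)
        while len(self.primary) > self.primary_capacity:
            # Demote the least recently used entry instead of dropping it.
            old_key, old_value = self.primary.popitem(last=False)
            self._demote(old_key, old_value)

    def _demote(self, key, value):
        self.secondary[key] = value
        self.secondary.move_to_end(key)
        while len(self.secondary) > self.secondary_capacity:
            self.secondary.popitem(last=False)

    def lookup(self, key):
        if key in self.primary:
            self.primary.move_to_end(key)
            return self.primary[key], "primary"
        if key in self.secondary:
            # Promote on a secondary hit so the next read hits the primary.
            value = self.secondary.pop(key)
            self.insert(key, value)
            return value, "secondary"
        return None, "miss"


cache = TieredCache(primary_capacity=2, secondary_capacity=4)
for block in ("a", "b", "c"):
    cache.insert(block, block.upper())

first = cache.lookup("a")   # "a" was evicted from the primary and demoted
second = cache.lookup("a")  # the first lookup promoted it back
```

In this sketch, the first lookup of `"a"` is served from the secondary tier and the second from the primary, which is the behaviour a lower cache tier is meant to provide: an eviction from DRAM becomes a slower hit rather than a full miss.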

We are looking for a community contribution of SecondaryCache implementations, which would make this feature usable by the broader RocksDB userbase.

anand1976 avatar May 28 '21 17:05 anand1976

Would like to pick this up.

kriti-sc avatar Jul 24 '21 15:07 kriti-sc

@kriti-sc That sounds great! Thanks for volunteering. Let me know what you have in mind. I'll be happy to discuss and help out in any way.

anand1976 avatar Jul 27 '21 18:07 anand1976

If I have understood correctly, we are implementing a persistent memory cache. This cache will be controlled by the existing RocksDB APIs [reference PR].

I found this library by Intel to program persistent memory [reference]. It provides an abstraction over complexities such as error handling, data consistency & durability.

@anand1976 Please let me know your thoughts.

kriti-sc avatar Aug 03 '21 22:08 kriti-sc

> If I have understood correctly, we are implementing a persistent memory cache. This cache will be controlled by the existing RocksDB APIs [reference PR].

Actually, it is not only for a persistent memory cache. The secondary cache can be based on HDD, SSD, NVM, or even remote DRAM.

zhichao-cao avatar Aug 04 '21 23:08 zhichao-cao

Thanks for your interest. I think long term a secondary cache based on persistent memory would be interesting. The hardware is not widely available or used yet, so it's still early days.

In the near term, there's a need for an SSD/flash-based secondary cache, for use cases where the database is on some kind of remote or cloud-based storage and is accessed by a server with some direct-attached flash.
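To make the flash-based idea concrete, here is a minimal Python sketch of a secondary tier that keeps each entry as a file in a local directory, so entries survive a process restart. Everything in it is an assumption for illustration: the `FileSecondaryCache` class and its `insert`/`lookup` methods are hypothetical and are not the RocksDB SecondaryCache interface, and the sketch has no capacity limit or eviction.

```python
import hashlib
import os
import tempfile


class FileSecondaryCache:
    """Toy secondary tier that stores each entry as a file in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any byte string maps to a safe file name.
        return os.path.join(self.directory, hashlib.sha256(key).hexdigest())

    def insert(self, key, value):
        tmp_path = self._path(key) + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(value)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename so readers never see a partially written entry.
        os.replace(tmp_path, self._path(key))

    def lookup(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


cache_dir = tempfile.mkdtemp(prefix="secondary_cache_")
FileSecondaryCache(cache_dir).insert(b"block-17", b"payload bytes")

# A fresh instance over the same directory models a process restart.
restarted = FileSecondaryCache(cache_dir)
hit = restarted.lookup(b"block-17")
miss = restarted.lookup(b"block-99")
```

A real implementation would also need bounded space, eviction, and a way to invalidate stale entries, none of which this sketch attempts.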

Let me know if you'd be interested in implementing it.

Thanks, Anand

On Tue, Aug 3, 2021, 3:29 PM Kriti Kathuria @.***> wrote:

If I have understood correctly, we are implementing a persistent memory cache. This cache will be controlled by the existing RocksDB APIs [reference PR https://github.com/facebook/rocksdb/pull/8113/files].

I found this library https://pmem.io/vmemcache/manpages/master/vmemcache.3.html by Intel to program persistent memory [reference https://pmem.io/pmdk/]. It provides an abstraction over complexities such as error handling, data consistency & durability.

@anand1976 https://github.com/anand1976 Please let me know your thoughts.


anand1976 avatar Aug 09 '21 06:08 anand1976

Hi all. Does this feature need only one community contributor? I would also be glad to contribute to this feature. :)

zaorangyang avatar Aug 10 '21 10:08 zaorangyang

> Hi all. Does this feature need only one community contributor? I would also be glad to contribute to this feature. :)

The SecondaryCache is configurable and there can be multiple independent or related implementations. For example, there could be one based on PMEM and another based on SSD/Flash, and yet another stored in the cloud.

So my suggestion is to coordinate with anyone else who may be implementing a cache so that the same "SecondaryCache" is not implemented twice, but there is nothing that prevents multiple independent implementations!
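One way to picture several independent implementations behind a single configurable interface is the Python sketch below. It is a toy under stated assumptions: the `SecondaryTier` base class, the name registry, and the `memory` and `null` implementations are all hypothetical and do not mirror how RocksDB registers or configures plugins.

```python
from abc import ABC, abstractmethod


class SecondaryTier(ABC):
    """Hypothetical minimal interface that every implementation provides."""

    @abstractmethod
    def insert(self, key, value):
        """Store value under key."""

    @abstractmethod
    def lookup(self, key):
        """Return the stored value, or None on a miss."""


REGISTRY = {}


def register(name):
    """Class decorator that makes an implementation selectable by name."""
    def decorator(cls):
        if name in REGISTRY:
            # Two implementations must not claim the same name.
            raise ValueError(f"implementation {name!r} is already registered")
        REGISTRY[name] = cls
        return cls
    return decorator


@register("memory")
class MemoryTier(SecondaryTier):
    """Keeps entries in a dict; stands in for a remote-DRAM style tier."""

    def __init__(self):
        self.entries = {}

    def insert(self, key, value):
        self.entries[key] = value

    def lookup(self, key):
        return self.entries.get(key)


@register("null")
class NullTier(SecondaryTier):
    """Accepts inserts but never returns a hit; useful as a baseline."""

    def insert(self, key, value):
        pass

    def lookup(self, key):
        return None


def create(name):
    """Instantiate the implementation selected by configuration."""
    return REGISTRY[name]()


tier = create("memory")
tier.insert("k", "v")
```

Here the caller only depends on `SecondaryTier`, so a PMEM-based, an SSD-based, and a cloud-backed implementation could coexist and be chosen by name.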

mrambacher avatar Aug 10 '21 13:08 mrambacher

@anand1976 That aligns with what I have been thinking. I am interested in implementing this and will start with SSDs.

> Thanks for your interest. I think long term a secondary cache based on persistent memory would be interesting. The hardware is not widely available or used yet, so it's still early days. In the near term, there's a need for a ssd/flash based secondary cache, for use cases where the database is on some kind of remote or cloud based storage and is accessed by a server with some direct attached flash. Let me know if you'd be interested in implementing it.

kriti-sc avatar Aug 18 '21 22:08 kriti-sc

I am interested in contributing. Our scenario (implementing a NoSQL store using RocksDB as the storage engine) is to reduce DRAM size and cost by placing a larger part of the block cache in Intel Optane PMEM or just regular NVMe M.2 (block-based address pattern). So the block cache can be viewed as having hot (DRAM), warm (PMEM), and cold (NVMe M.2) layers with a customizable cache eviction strategy. Our read/write latency SLA is in the milliseconds, so increasing read latency from <10 ns to hundreds of ns, or even to the microsecond level, shouldn't impact our latency SLA, but can save costs.
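The latency argument above can be checked with a back-of-the-envelope calculation. The Python sketch below computes the expected read latency over the hot, warm, and cold layers. The hit fractions and the per-tier latencies are illustrative assumptions, not measurements; only the 10 ns DRAM figure and the millisecond-scale SLA are taken from the comment.

```python
# Each entry: (tier name, assumed fraction of reads served, assumed latency in seconds).
TIERS = [
    ("DRAM (hot)", 0.80, 10e-9),
    ("PMEM (warm)", 0.15, 500e-9),
    ("NVMe (cold)", 0.04, 100e-6),
    ("backing storage", 0.01, 1e-3),
]

SLA_SECONDS = 1e-3  # assumed millisecond-scale latency target


def expected_latency(tiers):
    """Weighted average latency, given the fraction of reads each tier serves."""
    total_fraction = sum(fraction for _, fraction, _ in tiers)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(fraction * latency for _, fraction, latency in tiers)


average = expected_latency(TIERS)
print(f"expected read latency: {average * 1e6:.2f} microseconds")
print(f"within SLA: {average < SLA_SECONDS}")
```

Under these assumed numbers the average stays in the tens of microseconds, well below a millisecond target. Note that an average says nothing about tail latency, which a real evaluation would also need to measure.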

kennthhz-zz avatar Oct 10 '21 19:10 kennthhz-zz

Hi, We are the Intel RPC Optane team. Now, we're building a secondary cache plugin implementation (along with other fs layer plugins) based on Optane Persistent Memory hardware and will be open-sourced soon. Hope we can contribute to the RocksDB community!

chenyou-intel avatar Feb 11 '22 06:02 chenyou-intel

> Hi, We are the Intel RPC Optane team. Now, we're building a secondary cache plugin implementation (along with other fs layer plugins) based on Optane Persistent Memory hardware and will be open-sourced soon. Hope we can contribute to the RocksDB community!

We have open-sourced our persistent-memory-based secondary cache here!

chenyou-intel avatar May 04 '22 04:05 chenyou-intel

@chenyou-intel Thanks for the contribution! I'll take a look in the next week or so.

anand1976 avatar May 31 '22 18:05 anand1976

> @chenyou-intel Thanks for the contribution! I'll take a look in the next week or so.

Sure. You can also find a WIP PR that integrates PMem-optimized CacheLib with SecondaryCache at https://github.com/pmem/pmem-rocksdb-plugin/pull/6

chenyou-intel avatar Jun 03 '22 04:06 chenyou-intel

Hi @anand1976, we are working on implementing a secondary cache that puts cached data on SSD, and we are considering using CacheLib as well. Since FB already has a plug-in implementation using CacheLib, would it be possible to open-source this part as well?

sherriiiliu avatar Jul 28 '22 17:07 sherriiiliu

agree w/ sherriiiliu - open source the cachelib integration :)

journaux avatar Aug 31 '22 09:08 journaux

Meta has open-sourced a CacheLib SecondaryCache implementation. I added some code to allow it to be enabled as a RocksDB plugin. Here is the PR. Note that the code resides in CacheLib and not in the RocksDB repo.

This PR was meant to be the first stage of enablement, and I hope others can take it further.

mrambacher avatar Dec 23 '22 17:12 mrambacher

Inspired by mrambacher's PR, I extracted CachelibWrapper.cpp and CachelibWrapper.h from Facebook's open-sourced CacheLib-based SecondaryCache implementation and removed the internally used code so that they compile (related issue: https://github.com/facebook/CacheLib/issues/278). The dependencies are quite complex, so I packaged the two files into a standalone CMake repo and resolved the dependencies in its CMakeLists.txt. URL of the CMake repo: https://github.com/seekstar/RocksCachelibWrapper

I hope it will be helpful until a publicly recognized implementation becomes available.

seekstar avatar Nov 24 '23 13:11 seekstar