What's the memory and CPU overhead of a single RocksDB instance?
I'm working on a Kafka Streams application that uses RocksDB for storing state. Each application instance is assigned around 50 partitions, which means it runs 50 RocksDB instances. What is the added overhead of each of these instances?
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB has more detail on how each RocksDB instance uses memory. Most of the memory can be attributed to the block cache, and the cache can be shared between different RocksDB instances. You might also be interested in the Write Buffer Manager, which controls memtable memory usage across potentially multiple DBs.
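For example, one cache and one write buffer manager can be handed to every DB's options. A rough sketch (sizes and paths are placeholders, not recommendations):

```scala
import org.rocksdb.{BlockBasedTableConfig, LRUCache, Options, RocksDB, WriteBufferManager}

RocksDB.loadLibrary()

// Created once per process and shared by every DB instance.
val sharedCache = new LRUCache(1L << 30)                          // 1 GB block cache
val sharedWbm   = new WriteBufferManager(256L << 20, sharedCache) // memtables capped at 256 MB, charged against the cache

def optionsForStore(): Options = {
  val tableConfig = new BlockBasedTableConfig().setBlockCache(sharedCache)
  new Options()
    .setCreateIfMissing(true)
    .setWriteBufferManager(sharedWbm)
    .setTableFormatConfig(tableConfig)
}

// Both DBs now draw from the same global memory budget.
val db1 = RocksDB.open(optionsForStore(), "/tmp/store-1")
val db2 = RocksDB.open(optionsForStore(), "/tmp/store-2")
```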
Thanks @cbi42, but I was more interested in the minimal footprint (both memory and CPU) of a single instance, especially because I'm running a large number of RocksDB instances per Kafka Streams application instance. For example, what is the minimal number of threads (for compactions or any other use) a single RocksDB instance requires? In addition, would you recommend using shared resources (caches, thread pools) in this case?
I think you'll need at least one background thread for flushes and compactions. I don't know much about whether using shared resources is better, so others should feel free to chime in. My guess is that with a shared thread pool you could get away with fewer threads (<50 when you have 50 RocksDB instances), which might use resources more efficiently. It also depends on whether you want to limit the footprint per instance or just enforce a global limit.
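For example, something like this caps the background thread pools process-wide, since all instances that use the default Env share its pools. A rough sketch (thread counts are placeholders):

```scala
import org.rocksdb.{Env, Options, Priority, RocksDB}

RocksDB.loadLibrary()

// The default Env is shared by every RocksDB instance in the process,
// so sizing its pools once bounds background work globally.
val env = Env.getDefault()
env.setBackgroundThreads(4, Priority.LOW)  // compaction pool
env.setBackgroundThreads(2, Priority.HIGH) // flush pool

def optionsForStore(): Options =
  new Options()
    .setCreateIfMissing(true)
    .setEnv(env)
    .setMaxBackgroundJobs(4) // per-DB cap on concurrently scheduled jobs
```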
@cbi42 Enforcing a global limit would be good enough. I've tried doing so based on this page, with the following settings (I'm running RocksDB version 6.29.4):
```scala
val cache = new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false) // 66 MB
val writeBufferManager = new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache) // MAX_WRITE_BUFFER_NUM * MAX_WRITE_BUFFER_SIZE

tableConfig.setCacheIndexAndFilterBlocks(true)
tableConfig.setPinL0FilterAndIndexBlocksInCache(true)
tableConfig.setPinTopLevelIndexAndFilter(true)
tableConfig.setBlockCache(cache)
tableConfig.setBlockSize(MAX_BLOCK_SIZE) // 256 KB

options.setWriteBufferManager(writeBufferManager)
options.setMaxWriteBufferNumber(MAX_WRITE_BUFFER_NUM) // 2
options.setWriteBufferSize(MAX_WRITE_BUFFER_SIZE) // 64 MB
options.setTableFormatConfig(tableConfig)
```
I could easily see the effects of the configuration (the write buffer manager part had the most prominent effect), but I could not tell whether the change is good or not. The overall memory usage of the application did not change, but I did see the following metrics increase significantly:
- Block cache filter/index hit ratio - this makes sense, as both are now stored in the block cache
- Estimated number of keys skyrocketed - not sure why, maybe because more memory was allocated for memtables?
- Memtable size increased from ~500MB to 2.5-3 GB
- Block cache/pinned usage increased - again, this makes sense.
I'm not able to tell whether these changes (and the actual values I've set) put a limit on a single instance or on all of the instances (per what's written here), nor whether these changes are a step in the right direction (as the overall memory usage of the application did not change).
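In case it helps, this is roughly how the settings above are applied (simplified), via Kafka Streams' RocksDBConfigSetter, with the cache and write buffer manager created once so every store receives the same objects (the object and class names here are mine):

```scala
import java.util
import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, LRUCache, Options, WriteBufferManager}

// Created once per JVM; every store's setConfig receives these same objects.
object SharedRocksDbResources {
  val cache = new LRUCache(66 * 1024 * 1024L, -1, false)                        // TOTAL_OFF_HEAP_MEMORY
  val writeBufferManager = new WriteBufferManager(2 * 64 * 1024 * 1024L, cache) // MAX_WRITE_BUFFER_NUM * MAX_WRITE_BUFFER_SIZE
}

class BoundedMemoryConfigSetter extends RocksDBConfigSetter {
  override def setConfig(storeName: String, options: Options, configs: util.Map[String, AnyRef]): Unit = {
    val tableConfig = new BlockBasedTableConfig()
      .setBlockCache(SharedRocksDbResources.cache)
      .setCacheIndexAndFilterBlocks(true)
      .setPinL0FilterAndIndexBlocksInCache(true)
      .setPinTopLevelIndexAndFilter(true)
    options.setWriteBufferManager(SharedRocksDbResources.writeBufferManager)
    options.setTableFormatConfig(tableConfig)
  }

  override def close(storeName: String, options: Options): Unit = {
    // The shared cache/write buffer manager are deliberately not closed per store.
  }
}
```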