Feature request: rate limit only the compactions triggered by periodic compaction seconds/TTL

Open zaidoon1 opened this issue 4 months ago • 5 comments

My DB is small, but I do a significant number of deletes, so I set a DB TTL/periodic compaction seconds to make sure the tombstones are removed every few hours. However, this caused large CPU usage spikes, as reported in https://github.com/facebook/rocksdb/issues/12220. I then rate limited compactions, which solved that issue. HOWEVER, I thought write stalls could only be caused by slow flushes, which is the main reason I wanted to rate limit compaction but not flushes. It turns out my understanding was incorrect: we do in fact stall if compaction is slow:

[Image: Screenshot 2024-04-15 at 12 34 02 AM]

Given this information, what I would like to do instead is rate limit only the compactions triggered by DB TTL/periodic compaction seconds, since those are mainly a cleanup operation that doesn't need to happen immediately. At the same time, compactions triggered to keep RocksDB "working fast" should not be rate limited, to avoid stalls.

Note that I'm using RocksDB from Rust, so I'm relying on the C APIs to control RocksDB's behaviour.

The alternative right now is to tune the rate limit so that it doesn't impact RocksDB write operations while still keeping the CPU from spiking significantly when periodic compaction seconds/DB TTL compactions run, which is trickier to balance.
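To make the request concrete, here is a minimal sketch of the behaviour being asked for: a token-bucket limiter that is charged only when a compaction was scheduled for a cleanup reason. It is a Python model, not RocksDB code. The reason labels, `TokenBucket`, and `throttle_delay` are hypothetical names invented for this illustration, and nothing in this thread indicates that RocksDB's rate limiter offers a per-reason switch like this.

```python
from dataclasses import dataclass

# Hypothetical labels for why a compaction was scheduled (assumption,
# not RocksDB's API). Only these "cleanup" reasons are throttled.
CLEANUP_REASONS = {"ttl", "periodic"}


@dataclass
class TokenBucket:
    rate_bytes_per_sec: float   # refill rate
    capacity_bytes: float       # maximum burst size
    tokens: float = 0.0         # may go negative to represent debt
    last_refill: float = 0.0    # timestamp of the last refill, in seconds

    def delay_for(self, nbytes: int, now: float) -> float:
        """Charge nbytes and return how long the caller should wait."""
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_sec)
        self.last_refill = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # Time needed for the refill to pay back the debt.
        return -self.tokens / self.rate_bytes_per_sec


def throttle_delay(bucket: TokenBucket, reason: str, nbytes: int, now: float) -> float:
    """Throttle cleanup compactions only; everything else runs unthrottled."""
    if reason in CLEANUP_REASONS:
        return bucket.delay_for(nbytes, now)
    return 0.0
```

In this model a compaction with any other reason never touches the bucket, so it cannot be slowed down and cannot contribute to a stall, while TTL/periodic work is paced at `rate_bytes_per_sec`.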

zaidoon1, Apr 15 '24 04:04

@ajkr what do you think about this? Is there a quick fix that I can implement or will this be more involved?

zaidoon1, Apr 19 '24 00:04

What are your settings for compaction style, TTL/periodic seconds, and how is data deleted? I am thinking there might be other ways to help with the compaction spikes particularly if you're using leveled compaction style and RocksDB's deletion APIs (vs. other mechanisms like a compaction filter to delete data).

ajkr, Apr 19 '24 08:04

When deleting, I use rocksdb_writebatch_delete_cf; I don't have any compaction filters, etc. TTL is set to 1800 seconds, and compaction style is whatever the default is. Here is my options file (I don't use the default CF, so you can ignore any options related to it):

OPTIONS.txt

zaidoon1, Apr 19 '24 10:04

Thanks for the info. I was wondering if you'd be interested in trying compaction_pri = kRoundRobin? Round-robin compaction priority simply picks files within a level by cycling through them in order, whereas the default compaction priority (kMinOverlappingRatio) picks files according to a heuristic that can form hotspots (key ranges from which files are repeatedly picked) and coldspots (key ranges from which files are rarely or never picked).
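The "cycling through them in order" idea can be sketched as a cursor over the files of a level. This is a simplified Python illustration, not RocksDB's implementation: it assumes each file is identified by its smallest key, that the list is sorted, and that the picker remembers the smallest key of the last file it chose. The function name `pick_round_robin` is made up for this example.

```python
import bisect


def pick_round_robin(smallest_keys, cursor):
    """Pick the next file after `cursor`, wrapping to the start of the level.

    smallest_keys: sorted list with the smallest key of each file in the level.
    cursor: smallest key of the previously picked file, or None on first use.
    Returns (index of the picked file, new cursor).
    """
    if not smallest_keys:
        raise ValueError("level has no files")
    # First file whose smallest key is strictly greater than the cursor.
    i = 0 if cursor is None else bisect.bisect_right(smallest_keys, cursor)
    if i == len(smallest_keys):
        i = 0  # reached the end of the key space, start a new cycle
    return i, smallest_keys[i]


if __name__ == "__main__":
    keys = ["a", "f", "m", "t"]
    cursor = None
    order = []
    for _ in range(6):
        index, cursor = pick_round_robin(keys, cursor)
        order.append(index)
    print(order)  # [0, 1, 2, 3, 0, 1]
```

Because the cursor only moves forward, every key range gets visited once per cycle, which is what rules out the coldspots described above.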

I suspect kRoundRobin should work better with aggressive TTL settings. That's because round-robin picks the oldest data in the level to compact, saving work for TTL compaction later. In the best case (the write rate is high enough that a full cycle of round-robin compaction completes in each level before any file's data age reaches the TTL), no files would be compacted for TTL reasons at all.
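A toy simulation can illustrate the best-case argument. It rests on loud simplifying assumptions: a level is modelled as a fixed number of key-range slots, one regular compaction happens every `step` seconds, and compacting a slot moves its data down so the slot's data age resets to zero. All function names here are hypothetical, and real RocksDB scheduling is far more involved.

```python
def count_ttl_compactions(num_files, ttl, step, pick, total_time):
    """Return how many slot compactions were forced by TTL in the toy model.

    pick(n, num_files) returns the slot chosen by the n-th regular compaction.
    """
    last_compacted = [0] * num_files  # time each slot's data was last moved down
    ttl_compactions = 0
    for tick in range(1, total_time // step + 1):
        now = tick * step
        # Regular (non-TTL) compaction chosen by the picking policy.
        last_compacted[pick(tick - 1, num_files)] = now
        # Any slot whose data age reached the TTL is compacted for TTL.
        for slot in range(num_files):
            if now - last_compacted[slot] >= ttl:
                ttl_compactions += 1
                last_compacted[slot] = now
    return ttl_compactions


def round_robin(n, num_files):
    """Cycle through every slot in order."""
    return n % num_files


def hotspot(n, num_files):
    """Stand-in for a heuristic that only ever picks slots 0 and 1."""
    return n % 2


if __name__ == "__main__":
    for name, policy in (("round_robin", round_robin), ("hotspot", hotspot)):
        forced = count_ttl_compactions(4, ttl=1800, step=300,
                                       pick=policy, total_time=36000)
        print(name, forced)
```

With four slots and one regular compaction every 300 seconds, a full round-robin cycle takes 1200 seconds, which is under the 1800-second TTL, so the model reports no TTL-driven compactions. The hotspot policy never touches two of the slots, so they hit the TTL repeatedly. If the cycle is slower than the TTL (for example `step=600`), round-robin also starts seeing TTL compactions in this model.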

ajkr, Apr 22 '24 17:04

Got it, that's definitely good to know. I'll try out kRoundRobin and report back.

zaidoon1, Apr 23 '24 02:04