rocksdb: compaction_readahead_size doesn't work when it is larger than max_sectors_kb

Status: Open. mdcallag opened this issue 8 months ago; 7 comments.

In tests I ran with RocksDB 8.7 and 8.8, rareq-sz from iostat shows many small (block-sized) reads from compaction when compaction_readahead_size is larger than /sys/block/$device/queue/max_sectors_kb. Reducing compaction_readahead_size resolves the problem. I assume compaction_readahead_size should be smaller than max_sectors_kb by rocksdb_block_size, or by some other fudge factor, because the readahead that is actually requested can be slightly larger than compaction_readahead_size.
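
A minimal sketch of the workaround described above, assuming the device name and the table's block size are known and taking one block as the fudge factor (the thread does not settle on an exact margin). It reads max_sectors_kb from sysfs and picks a compaction_readahead_size that stays below it. The helper names and the 8KB block size are hypothetical, chosen only for illustration.

```python
from pathlib import Path

KB = 1024


def read_max_sectors_kb(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return max_sectors_kb for a block device, e.g. device="nvme0n1"."""
    path = Path(sysfs_root) / device / "queue" / "max_sectors_kb"
    return int(path.read_text().strip())


def suggest_compaction_readahead(max_sectors_kb: int, block_size: int) -> int:
    """Return a compaction_readahead_size (bytes) that stays below max_sectors_kb.

    Assumption: one block of slack covers the extra bytes RocksDB may request
    beyond compaction_readahead_size. The result is rounded down to a whole
    number of blocks.
    """
    limit = max_sectors_kb * KB - block_size
    if limit < block_size:
        raise ValueError("max_sectors_kb is too small for this block size")
    return (limit // block_size) * block_size


if __name__ == "__main__":
    # max_sectors_kb=512 as in the thread; 8KB block size is an assumption.
    size = suggest_compaction_readahead(512, 8 * KB)
    print(size // KB, "KB")  # 504 KB
```

On a real host, `suggest_compaction_readahead(read_max_sectors_kb("nvme0n1"), block_size)` would combine the two steps; the device name there is a placeholder.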

Nov 02 '23 20:11 mdcallag

On a small server I have at home, running the db_bench benchmarks with one client thread, I get ~70k Puts/s from overwriteandwait with RocksDB 8.7.2 and 8.8.0 when compaction_readahead_size=2MB and max_sectors_kb=512.

When I decrease compaction_readahead_size to 480KB so that it is smaller than max_sectors_kb, I get ~77k Puts/s.
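
The arithmetic behind this comparison, as a small sketch: 2MB is four times max_sectors_kb=512, while 480KB fits under it, and going from ~70k to ~77k Puts/s is roughly a 10% gain. The function names are hypothetical; "fits" here only means the readahead size is no larger than the largest request the kernel will issue to the device.

```python
KB = 1024
MB = 1024 * KB


def fits_in_one_request(readahead_bytes: int, max_sectors_kb: int) -> bool:
    """True when the readahead size is no larger than max_sectors_kb."""
    return readahead_bytes <= max_sectors_kb * KB


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


max_sectors_kb = 512
for readahead in (2 * MB, 480 * KB):
    print(readahead // KB, "KB fits:", fits_in_one_request(readahead, max_sectors_kb))
# 2048 KB fits: False
# 480 KB fits: True

print(f"{percent_change(70_000, 77_000):.1f}%")  # 10.0%
```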

Nov 04 '23 02:11 mdcallag

On a large server, the insert rate from overwrite improves by 13.4% when I reduce compaction_readahead_size from 2MB to 1MB so that it is smaller than the value of max_sectors_kb for the SSDs used by the database.

Nov 04 '23 21:11 mdcallag

More results are here.

Nov 06 '23 21:11 mdcallag

Quoting mdcallag: "On a large server the insert rate from overwrite improves by 13.4% when I reduce compaction_readahead_size from 2MB to 1MB so that it is smaller than the value of min_sectors_kb from the SSDs used by the database"

@mdcallag, hi Mark, is min_sectors_kb a typo for max_sectors_kb? If not, could you tell me in which path min_sectors_kb is stored in Linux?

I also see min_sectors_kb in your blog post, which really confused me.

Nov 07 '23 02:11 wolfkdy

Yes, it is a typo. Thank you for finding it.

Nov 14 '23 18:11 mdcallag

More results here show that throughput for overwrite drops by ~5% in 8.7, and probably in 8.6. iostat shows that the average read size is much smaller after 8.5.
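
To track the average read size the way the thread does, here is a sketch that pulls rareq-sz (average read request size, in KB) out of `iostat -x` text. It assumes sysstat's extended format, with a `Device` header row that includes a `rareq-sz` column; `parse_rareq_sz` is a hypothetical helper and the sample numbers are made up, not results from this issue.

```python
from __future__ import annotations


def parse_rareq_sz(iostat_output: str) -> dict[str, float]:
    """Map device name -> rareq-sz from `iostat -x` output.

    If the output contains several reports, later values overwrite earlier ones.
    """
    result: dict[str, float] = {}
    columns: list[str] | None = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields  # header row: remember column positions
            continue
        if columns is not None and "rareq-sz" in columns:
            idx = columns.index("rareq-sz")
            if idx < len(fields):
                result[fields[0]] = float(fields[idx])
    return result


if __name__ == "__main__":
    sample = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.00    0.00    2.00    1.00    0.00   92.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz
nvme0n1        100.00   3200.00     0.00   0.00    0.10    32.00
"""
    print(parse_rareq_sz(sample))  # {'nvme0n1': 32.0}
```

A rareq-sz close to the table's block size during compaction would match the small, block-sized reads described in the original report.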

Jan 04 '24 20:01 mdcallag

More results here show the impact of setting compaction_readahead_size to the default (2MB), 1MB, and 512KB.

Jan 08 '24 22:01 mdcallag