rocksdb
compaction_readahead_size doesn't work when it is larger than max_sectors_kb
From tests I ran with RocksDB 8.7 and 8.8, rareq-sz from iostat shows many small (block-sized) reads from compaction when compaction_readahead_size is larger than /sys/block/$device/queue/max_sectors_kb. Reducing compaction_readahead_size resolves the problem. I assume it should be smaller than max_sectors_kb+rocksdb_block_size or some other fudge factor because the readahead requested can be slightly larger than compaction_readahead_size.
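A quick way to check whether a given setting is at risk is to compare it against the device limit read from sysfs. The sketch below is only an illustration, not anything RocksDB does internally: the helper names `read_max_sectors_kb` and `readahead_exceeds_limit` and the `margin_bytes` parameter (standing in for the fudge factor mentioned above) are hypothetical.

```python
from pathlib import Path


def read_max_sectors_kb(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read max_sectors_kb (in KB) for a block device name such as 'nvme0n1'."""
    path = Path(sysfs_root) / device / "queue" / "max_sectors_kb"
    return int(path.read_text().strip())


def readahead_exceeds_limit(readahead_bytes: int,
                            max_sectors_kb: int,
                            margin_bytes: int = 0) -> bool:
    """Return True if the readahead plus a margin is larger than the device limit.

    margin_bytes is an assumed fudge factor (for example one RocksDB block),
    since the requested readahead may be slightly larger than the configured value.
    """
    return readahead_bytes + margin_bytes > max_sectors_kb * 1024


if __name__ == "__main__":
    # Values from the small-server test: 2MB readahead vs max_sectors_kb=512.
    print(readahead_exceeds_limit(2 * 1024 * 1024, 512))          # True
    # 480KB readahead with an assumed 8KB margin fits under 512KB.
    print(readahead_exceeds_limit(480 * 1024, 512, 8 * 1024))     # False
```

On a real host, `read_max_sectors_kb("nvme0n1")` (device name is an example) would supply the second argument.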
On a small server I have at home, running the db_bench benchmarks with one client thread, I get ~70k Puts/s from overwriteandwait with RocksDB 8.7.2 and 8.8.0 when compaction_readahead_size=2MB and max_sectors_kb=512.
When I decrease compaction_readahead_size to 480KB so that it is smaller than max_sectors_kb, I get ~77k Puts/s.
On a large server the insert rate from overwrite improves by 13.4% when I reduce compaction_readahead_size from 2MB to 1MB so that it is smaller than the value of max_sectors_kb from the SSDs used by the database.
More results are here.
On a large server the insert rate from overwrite improves by 13.4% when I reduce compaction_readahead_size from 2MB to 1MB so that it is smaller than the value of min_sectors_kb from the SSDs used by the database
@mdcallag, hi Mark, is min_sectors_kb a typo of max_sectors_kb? If not, could you point out the path where min_sectors_kb is stored in Linux? In your blog post I also see min_sectors_kb, which really confused me.
Yes it is a typo. Thank you for finding it.
More results are here to show that throughput for overwrite drops by ~5% in 8.7 and probably in 8.6. From iostat I see that the average read size is much smaller after 8.5.
And more results are here to show the impact of compaction_readahead_size set to the default (2MB), 1MB and 512KB.