Improve universal compaction sorted-run trigger
Summary: Universal compaction currently uses level0_file_num_compaction_trigger for two purposes:
- the trigger for checking whether there is any compaction to do, and
- the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option.
This can make the option inflexible. A value that is too small causes higher write amp: more compactions to reduce the number of sorted runs. A value that is too big causes worse read performance: more sorted runs. This PR introduces an option CompactionOptionsUniversal::max_read_amp for only the second purpose: to specify the hard limit on the number of sorted runs.
For backward compatibility, max_read_amp = -1 by default, which means falling back to the current behavior.
When max_read_amp > 0, level0_file_num_compaction_trigger will only serve as a trigger to find potential compaction.
When max_read_amp = 0, RocksDB will auto-tune the limit on the number of sorted runs. The estimation is based on DB size, write_buffer_size and size_ratio, so it is adaptive to the size change of the DB. See more in UniversalCompactionBuilder::PickCompaction().
Alternatively, users now can configure max_read_amp to a very big value and keep level0_file_num_compaction_trigger small. This will allow size_ratio and max_size_amplification_percent to control the number of sorted runs. This essentially disables compactions with reason kUniversalSortedRunNum.
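The three modes described above can be sketched as a small self-contained model. This is only an illustration under stated assumptions, not the code in UniversalCompactionBuilder::PickCompaction(): the helper names EstimateSortedRuns and EffectiveSortedRunLimit are hypothetical, and the auto-tune estimate assumes that a stable shape needs each older sorted run to be at least (100 + size_ratio)% of the combined size of all newer runs, starting from one run of write_buffer_size.

```cpp
#include <cstdint>

// Hypothetical model, not RocksDB code. Estimates how many sorted runs a DB
// of `db_size` bytes needs if each older run is at least
// (100 + size_ratio_percent)% of the total size of all newer runs.
// Assumes total * (100 + size_ratio_percent) does not overflow 64 bits.
inline int EstimateSortedRuns(std::uint64_t db_size,
                              std::uint64_t write_buffer_size,
                              unsigned size_ratio_percent) {
  if (write_buffer_size == 0) return 1;  // nothing sensible to estimate from
  int runs = 1;
  std::uint64_t total = write_buffer_size;  // newest run: one flushed memtable
  while (total < db_size) {
    // Size of the next older run under the assumed stability condition.
    std::uint64_t next = total * (100 + size_ratio_percent) / 100;
    total += next;
    ++runs;
  }
  return runs;
}

// Hypothetical resolver for the limit on the number of sorted runs:
//   max_read_amp < 0  -> legacy behavior, the level0 trigger is the limit
//   max_read_amp > 0  -> the configured hard limit
//   max_read_amp == 0 -> auto-tuned estimate (assumed formula above)
inline int EffectiveSortedRunLimit(int max_read_amp,
                                   int level0_file_num_compaction_trigger,
                                   std::uint64_t db_size,
                                   std::uint64_t write_buffer_size,
                                   unsigned size_ratio_percent) {
  if (max_read_amp < 0) return level0_file_num_compaction_trigger;
  if (max_read_amp > 0) return max_read_amp;
  return EstimateSortedRuns(db_size, write_buffer_size, size_ratio_percent);
}
```

In this model, a very large max_read_amp simply makes the limit unreachable, which matches the description of leaving size_ratio and max_size_amplification_percent in control of the number of sorted runs.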
Test plan:
- new unit test
- existing unit test for default behavior
- updated crash tests to continuously test the new option
- benchmark:
  - Create a DB that is roughly 24GB in the last level. When max_read_amp = 0, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs.
  - We then run fillrandom to ingest another 24GB data to compare write amp.
    - case 1: small level0 trigger: level0_file_num_compaction_trigger=5, max_read_amp=-1
      - write-amp: 4.8
    - case 2: auto-tune: level0_file_num_compaction_trigger=5, max_read_amp=0
      - write-amp: 3.5
    - case 3: hard-code a good value for trigger: level0_file_num_compaction_trigger=9
      - write-amp: 2.8
Case 1:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 1.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 160.7 144.12 112.97 108 1.334 0 0 0.0 0.0
L44 0/0 0.00 KB 0.0 0.4 0.4 0.0 0.4 0.4 0.0 1.0 187.9 186.8 2.28 2.03 1 2.283 3680K 17K 0.0 0.0
L45 5/0 1.03 GB 0.0 36.9 10.9 26.1 36.6 10.6 0.0 3.4 194.1 192.5 194.95 181.28 40 4.874 325M 2394K 0.0 0.0
L46 16/0 3.83 GB 0.0 16.7 10.3 6.3 16.3 10.0 0.0 1.6 187.6 183.5 91.03 84.21 16 5.689 146M 3117K 0.0 0.0
L47 19/0 4.68 GB 0.0 15.4 10.5 4.9 14.7 9.8 0.0 1.4 192.4 183.9 82.02 77.52 8 10.252 135M 5920K 0.0 0.0
L48 38/0 9.42 GB 0.0 19.6 11.7 7.9 17.3 9.4 0.0 1.5 196.0 172.9 102.37 98.65 4 25.592 172M 20M 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 169/0 41.67 GB 0.0 89.1 43.9 45.1 108.0 62.9 0.0 4.8 147.8 179.3 616.77 556.65 177 3.485 783M 31M 0.0 0.0
Int 0/0 0.00 KB 0.0 27.6 11.9 15.7 31.7 16.0 0.0 6.6 155.6 178.5 181.64 158.02 42 4.325 242M 6424K 0.0 0.0
Case 2:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 214.45 MB 1.2 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 160.6 144.23 112.59 108 1.335 0 0 0.0 0.0
L44 0/0 0.00 KB 0.0 1.7 1.7 0.0 1.7 1.7 0.0 1.0 196.3 195.1 8.74 8.24 4 2.186 14M 67K 0.0 0.0
L45 7/0 1.62 GB 0.0 8.8 6.3 2.5 8.7 6.2 0.0 1.4 188.5 186.3 47.64 44.91 14 3.403 77M 834K 0.0 0.0
L46 13/0 3.12 GB 0.0 15.4 9.6 5.8 15.0 9.3 0.0 1.6 182.7 178.5 86.15 80.73 16 5.385 135M 2963K 0.0 0.0
L47 19/0 4.68 GB 0.0 15.5 10.6 4.9 14.7 9.8 0.0 1.4 179.9 171.3 88.02 81.23 8 11.003 136M 6351K 0.0 0.0
L48 38/0 9.42 GB 0.0 19.6 11.8 7.9 17.3 9.4 0.0 1.5 177.9 156.5 113.10 100.36 4 28.276 172M 20M 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 169/0 41.76 GB 0.0 60.9 39.9 21.0 80.0 59.0 0.0 3.5 127.9 167.9 487.89 428.05 154 3.168 535M 30M 0.0 0.0
Int 0/0 0.00 KB 0.0 16.3 10.4 5.9 20.2 14.3 0.0 4.4 143.1 177.5 116.60 106.83 34 3.429 143M 5971K 0.0 0.0
Case 3:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 0.7 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 166.1 139.44 108.97 108 1.291 0 0 0.0 0.0
L42 0/0 0.00 KB 0.0 1.7 1.7 0.0 1.7 1.7 0.0 1.0 193.8 192.7 8.85 7.98 4 2.213 14M 67K 0.0 0.0
L43 0/0 0.00 KB 0.0 2.5 2.5 0.0 2.5 2.5 0.0 1.0 182.9 180.9 14.07 12.93 4 3.518 22M 203K 0.0 0.0
L44 4/0 844.39 MB 0.0 4.2 4.2 0.0 4.1 4.1 0.0 1.0 180.5 177.7 23.76 22.28 5 4.751 36M 505K 0.0 0.0
L45 13/0 3.12 GB 0.0 7.5 6.5 1.0 7.2 6.2 0.0 1.1 191.0 184.1 40.19 38.22 5 8.038 65M 2281K 0.0 0.0
L46 17/0 4.18 GB 0.0 8.3 7.1 1.2 7.9 6.6 0.0 1.1 188.3 178.1 45.16 43.06 4 11.289 73M 3845K 0.0 0.0
L47 22/0 5.34 GB 0.0 8.9 7.5 1.4 8.2 6.8 0.0 1.1 182.5 168.1 49.86 46.32 3 16.619 78M 6099K 0.0 0.0
L48 27/0 6.58 GB 0.0 9.2 7.6 1.6 8.2 6.6 0.0 1.1 181.7 161.0 52.13 49.39 2 26.066 81M 9215K 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 174/0 42.74 GB 0.0 42.3 37.0 5.3 62.4 57.1 0.0 2.8 116.0 171.0 373.46 329.14 135 2.766 372M 22M 0.0 0.0
Int 0/0 0.00 KB 0.0 11.6 9.4 2.3 15.0 12.7 0.0 3.8 139.2 179.8 85.39 78.53 26 3.284 102M 5141K 0.0 0.0
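As a reading aid for the three Sum rows above: the reported W-Amp values appear consistent with the total Write(GB) of the Sum row divided by the Write(GB) of the L0 row (the flushed data, 22.6 GB in every case). The sketch below only reproduces that arithmetic; the helper names WriteAmp and RelativeReduction are hypothetical, and the inputs are the rounded figures printed in the tables, so results match the reported values only to within rounding.

```cpp
// Hypothetical helper: write amplification as total bytes written by
// flushes and compactions divided by the bytes flushed into L0.
inline double WriteAmp(double total_write_gb, double flushed_gb) {
  return flushed_gb > 0.0 ? total_write_gb / flushed_gb : 0.0;
}

// Hypothetical helper: fractional reduction of `candidate` relative to
// `baseline`, e.g. 0.25 means the candidate is 25% lower.
inline double RelativeReduction(double baseline, double candidate) {
  return baseline > 0.0 ? (baseline - candidate) / baseline : 0.0;
}

// Inputs taken from the Sum and L0 rows above (rounded, in GB):
//   case 1: WriteAmp(108.0, 22.6)
//   case 2: WriteAmp(80.0, 22.6)
//   case 3: WriteAmp(62.4, 22.6)
```

Comparing case 1 and case 2 with RelativeReduction(4.8, 3.5) gives roughly a 27% lower write amp for the auto-tuned setting in this benchmark.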
setup:
./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456
benchmark:
./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
Thanks for the review.
What about even smaller, like level0_file_num_compaction_trigger=1?
Good point, I added the benchmark result to the PR summary. The WA is just slightly worse.
Actually you may also want to adjust the max_read_amp=0 calculation for the kCompactionStopStyleSimilarSize case, or disallow it for now.
Thanks for catching this. For kCompactionStopStyleSimilarSize, I updated the code to let max_read_amp=0 fall back to the old behavior (use level0_file_num_compaction_trigger).
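The fallback described in this reply could be modeled as follows. This is a minimal sketch and not the actual change: StopStyle is a stand-in for RocksDB's stop-style enum, and NormalizeMaxReadAmp is a hypothetical helper name.

```cpp
// Stand-in for the stop-style setting; the enumerator names are shortened
// for this sketch.
enum class StopStyle { kSimilarSize, kTotalSize };

// Hypothetical helper: with the similar-size stop style, auto-tuning
// (max_read_amp == 0) is treated like the legacy setting (-1), so the limit
// comes from level0_file_num_compaction_trigger. Other values pass through.
inline int NormalizeMaxReadAmp(int max_read_amp, StopStyle style) {
  if (max_read_amp == 0 && style == StopStyle::kSimilarSize) {
    return -1;
  }
  return max_read_amp;
}
```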
For something structurally different, maybe kCompactionStopStyleSimilarSize and min_merge_width=3
Besides min_merge_width=3, I set max_read_amp=9 and level0_file_num_compaction_trigger=3. The same benchmark gives WA 3.9.
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 3.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.1 142.04 110.53 108 1.315 0 0 0.0 0.0
L40 0/0 0.00 KB 0.0 0.6 0.6 0.0 0.6 0.6 0.0 1.0 197.7 195.5 3.26 2.99 1 3.255 5519K 50K 0.0 0.0
L41 18/0 4.35 GB 0.0 25.0 5.4 19.5 24.5 5.0 0.0 4.5 195.9 192.2 130.63 121.00 11 11.876 219M 4019K 0.0 0.0
L42 3/0 636.30 MB 0.0 2.5 2.5 0.0 2.5 2.5 0.0 1.0 204.3 202.0 12.60 12.37 4 3.150 22M 202K 0.0 0.0
L43 8/0 1.81 GB 0.0 4.4 3.8 0.6 4.3 3.7 0.0 1.1 206.4 202.6 21.72 21.29 5 4.345 38M 648K 0.0 0.0
L44 8/0 1.81 GB 0.0 6.9 5.6 1.2 6.7 5.5 0.0 1.2 204.5 200.4 34.39 33.89 7 4.914 60M 1142K 0.0 0.0
L45 5/0 1.03 GB 0.0 5.4 4.8 0.6 5.3 4.7 0.0 1.1 202.1 198.3 27.49 27.11 6 4.582 47M 816K 0.0 0.0
L46 20/0 5.01 GB 0.0 11.7 8.6 3.1 11.1 8.1 0.0 1.3 206.8 196.9 57.85 56.96 7 8.264 102M 4811K 0.0 0.0
L47 5/0 1.03 GB 0.0 4.2 3.5 0.6 4.1 3.5 0.0 1.2 204.6 200.3 20.86 20.67 4 5.216 36M 713K 0.0 0.0
L48 20/0 5.01 GB 0.0 7.9 5.5 2.4 7.4 5.0 0.0 1.4 206.5 193.9 39.33 38.78 3 13.111 69M 4217K 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 178/0 43.38 GB 0.0 68.6 40.4 28.1 89.3 61.1 0.0 3.9 143.3 186.5 490.18 445.58 156 3.142 603M 16M 0.0 0.0
I was worried the sizes would be chaotic like that. T189373775 could help as well as a generous value for size ratio. Anyways that's a separate problem.
@cbi42 merged this pull request in facebook/rocksdb@fecb10c2fa1501fd71a120793e1913a1ac7407ea.