Improve universal compaction sorted-run trigger

Open cbi42 opened this issue 1 year ago • 1 comments

Summary: Universal compaction currently uses level0_file_num_compaction_trigger for two purposes:

  1. the trigger for checking there is any compaction to do, and
  2. the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option.

This can make the option inflexible. A value that is too small causes higher write amp: more compactions are needed to reduce the number of sorted runs. A value that is too large causes worse read performance: there are more sorted runs. This PR introduces an option, CompactionOptionsUniversal::max_read_amp, that serves only the second purpose: specifying the hard limit on the number of sorted runs.

For backward compatibility, max_read_amp = -1 by default, which means falling back to the current behavior. When max_read_amp > 0, level0_file_num_compaction_trigger only serves as the trigger for looking for potential compactions. When max_read_amp = 0, RocksDB auto-tunes the limit on the number of sorted runs. The estimate is based on DB size, write_buffer_size and size_ratio, so it adapts as the DB size changes. See UniversalCompactionBuilder::PickCompaction() for details. Alternatively, users can now set max_read_amp to a very large value and keep level0_file_num_compaction_trigger small. This lets size_ratio and max_size_amplification_percent control the number of sorted runs, which essentially disables compactions with reason kUniversalSortedRunNum.
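
The three modes described above can be summarized in a small stand-alone sketch. It is only a model of the rules stated in this summary, not the code in UniversalCompactionBuilder::PickCompaction(). The names `StopStyle`, `LimitSource`, `SortedRunLimit` and `ResolveSortedRunLimit` are hypothetical and do not exist in RocksDB. The auto-tuned estimate itself is not modeled, because its formula is not given here. The kCompactionStopStyleSimilarSize fallback reflects the change mentioned later in this thread, and rejecting values below -1 is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the universal compaction stop style.
enum class StopStyle { kSimilarSize, kTotalSize };

// Where the limit on the number of sorted runs comes from.
enum class LimitSource { kLevel0Trigger, kAutoTuned, kMaxReadAmp };

struct SortedRunLimit {
  LimitSource source;
  // Holds the configured limit; empty when the limit would be estimated
  // at runtime (the estimate is not modeled in this sketch).
  std::optional<int> limit;
};

// Models how max_read_amp selects the sorted-run limit, per the PR summary.
SortedRunLimit ResolveSortedRunLimit(int max_read_amp, int level0_trigger,
                                     StopStyle stop_style) {
  assert(max_read_amp >= -1);  // assumption: smaller values are invalid
  if (max_read_amp > 0) {
    // Hard limit; level0_trigger only triggers the search for compactions.
    return {LimitSource::kMaxReadAmp, max_read_amp};
  }
  if (max_read_amp == 0 && stop_style != StopStyle::kSimilarSize) {
    // Auto-tuned from DB size, write_buffer_size and size_ratio.
    return {LimitSource::kAutoTuned, std::nullopt};
  }
  // max_read_amp == -1, or 0 with the similar-size stop style:
  // old behavior, the level-0 trigger doubles as the limit.
  return {LimitSource::kLevel0Trigger, level0_trigger};
}
```

For example, `ResolveSortedRunLimit(-1, 5, StopStyle::kTotalSize)` corresponds to case 1 of the benchmark below (the limit is the trigger, 5), and `ResolveSortedRunLimit(0, 5, StopStyle::kTotalSize)` corresponds to case 2 (the limit is auto-tuned).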

Test plan:

  • new unit test
  • existing unit test for default behavior
  • updated crash tests to continuously test the new option
  • benchmark:
    • Create a DB that is roughly 24GB in the last level. When max_read_amp = 0, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs.
    • We then run fillrandom to ingest another 24GB data to compare write amp.
      • case 1: small level0 trigger: level0_file_num_compaction_trigger=5, max_read_amp=-1
        • write-amp: 4.8
      • case 2: auto-tune: level0_file_num_compaction_trigger=5, max_read_amp=0
        • write-amp: 3.5
      • case 3: hard-code a good value for trigger: level0_file_num_compaction_trigger=9
        • write-amp: 2.8
Case 1:  
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   1.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    160.7    144.12            112.97       108    1.334       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      0.4     0.4      0.0       0.4      0.4       0.0   1.0    187.9    186.8      2.28              2.03         1    2.283   3680K    17K       0.0       0.0
 L45      5/0    1.03 GB   0.0     36.9    10.9     26.1      36.6     10.6       0.0   3.4    194.1    192.5    194.95            181.28        40    4.874    325M  2394K       0.0       0.0
 L46     16/0    3.83 GB   0.0     16.7    10.3      6.3      16.3     10.0       0.0   1.6    187.6    183.5     91.03             84.21        16    5.689    146M  3117K       0.0       0.0
 L47     19/0    4.68 GB   0.0     15.4    10.5      4.9      14.7      9.8       0.0   1.4    192.4    183.9     82.02             77.52         8   10.252    135M  5920K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.7      7.9      17.3      9.4       0.0   1.5    196.0    172.9    102.37             98.65         4   25.592    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.67 GB   0.0     89.1    43.9     45.1     108.0     62.9       0.0   4.8    147.8    179.3    616.77            556.65       177    3.485    783M    31M       0.0       0.0
 Int      0/0    0.00 KB   0.0     27.6    11.9     15.7      31.7     16.0       0.0   6.6    155.6    178.5    181.64            158.02        42    4.325    242M  6424K       0.0       0.0


Case 2: 
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   214.45 MB   1.2      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    160.6    144.23            112.59       108    1.335       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      1.7     1.7      0.0       1.7      1.7       0.0   1.0    196.3    195.1      8.74              8.24         4    2.186     14M    67K       0.0       0.0
 L45      7/0    1.62 GB   0.0      8.8     6.3      2.5       8.7      6.2       0.0   1.4    188.5    186.3     47.64             44.91        14    3.403     77M   834K       0.0       0.0
 L46     13/0    3.12 GB   0.0     15.4     9.6      5.8      15.0      9.3       0.0   1.6    182.7    178.5     86.15             80.73        16    5.385    135M  2963K       0.0       0.0
 L47     19/0    4.68 GB   0.0     15.5    10.6      4.9      14.7      9.8       0.0   1.4    179.9    171.3     88.02             81.23         8   11.003    136M  6351K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.8      7.9      17.3      9.4       0.0   1.5    177.9    156.5    113.10            100.36         4   28.276    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.76 GB   0.0     60.9    39.9     21.0      80.0     59.0       0.0   3.5    127.9    167.9    487.89            428.05       154    3.168    535M    30M       0.0       0.0
 Int      0/0    0.00 KB   0.0     16.3    10.4      5.9      20.2     14.3       0.0   4.4    143.1    177.5    116.60            106.83        34    3.429    143M  5971K       0.0       0.0


Case 3:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.7      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    166.1    139.44            108.97       108    1.291       0      0       0.0       0.0
 L42      0/0    0.00 KB   0.0      1.7     1.7      0.0       1.7      1.7       0.0   1.0    193.8    192.7      8.85              7.98         4    2.213     14M    67K       0.0       0.0
 L43      0/0    0.00 KB   0.0      2.5     2.5      0.0       2.5      2.5       0.0   1.0    182.9    180.9     14.07             12.93         4    3.518     22M   203K       0.0       0.0
 L44      4/0   844.39 MB   0.0      4.2     4.2      0.0       4.1      4.1       0.0   1.0    180.5    177.7     23.76             22.28         5    4.751     36M   505K       0.0       0.0
 L45     13/0    3.12 GB   0.0      7.5     6.5      1.0       7.2      6.2       0.0   1.1    191.0    184.1     40.19             38.22         5    8.038     65M  2281K       0.0       0.0
 L46     17/0    4.18 GB   0.0      8.3     7.1      1.2       7.9      6.6       0.0   1.1    188.3    178.1     45.16             43.06         4   11.289     73M  3845K       0.0       0.0
 L47     22/0    5.34 GB   0.0      8.9     7.5      1.4       8.2      6.8       0.0   1.1    182.5    168.1     49.86             46.32         3   16.619     78M  6099K       0.0       0.0
 L48     27/0    6.58 GB   0.0      9.2     7.6      1.6       8.2      6.6       0.0   1.1    181.7    161.0     52.13             49.39         2   26.066     81M  9215K       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    174/0   42.74 GB   0.0     42.3    37.0      5.3      62.4     57.1       0.0   2.8    116.0    171.0    373.46            329.14       135    2.766    372M    22M       0.0       0.0
 Int      0/0    0.00 KB   0.0     11.6     9.4      2.3      15.0     12.7       0.0   3.8    139.2    179.8     85.39             78.53        26    3.284    102M  5141K       0.0       0.0
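
As a sanity check on the three tables above, the W-Amp value in each Sum row appears consistent with dividing the Sum row's Write(GB) by the L0 row's Write(GB), that is, total bytes written per byte flushed. The sketch below assumes that reading of the columns; the helper names `WriteAmp` and `RoundToTenth` are made up for illustration. Because the table inputs are already rounded to one decimal place, a recomputed value can differ from the reported one by a tenth.

```cpp
#include <cassert>
#include <cmath>

// Write amplification: total GB written (flush plus compaction) divided by
// the GB flushed into L0. Assumes this matches the Sum row's W-Amp column.
double WriteAmp(double sum_write_gb, double l0_write_gb) {
  assert(l0_write_gb > 0.0);
  return sum_write_gb / l0_write_gb;
}

// Rounds to one decimal place, the precision used in the stats tables.
double RoundToTenth(double x) { return std::round(x * 10.0) / 10.0; }
```

With the numbers above, case 1 gives 108.0 / 22.6 ≈ 4.78, case 2 gives 80.0 / 22.6 ≈ 3.54, and case 3 gives 62.4 / 22.6 ≈ 2.76, which round to the reported 4.8, 3.5 and 2.8.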


setup:
./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456

benchmark:
./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1

cbi42 avatar Mar 25 '24 19:03 cbi42

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Mar 26 '24 15:03 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar May 23 '24 22:05 facebook-github-bot

Thanks for the review.

What about even smaller, like level0_file_num_compaction_trigger=1?

Good point, I added the benchmark result to the PR summary. The WA is just slightly worse.

Actually you may also want to adjust the max_read_amp=0 calculation for the kCompactionStopStyleSimilarSize case, or disallow it for now.

Thanks for catching this. For kCompactionStopStyleSimilarSize, I updated the code so that max_read_amp=0 falls back to the old behavior (using level0_file_num_compaction_trigger).

For something structurally different, maybe kCompactionStopStyleSimilarSize and min_merge_width=3

In addition to min_merge_width=3, I set max_read_amp=9 and level0_file_num_compaction_trigger=3; the same benchmark gives a WA of 3.9.

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   3.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    163.1    142.04            110.53       108    1.315       0      0       0.0       0.0
 L40      0/0    0.00 KB   0.0      0.6     0.6      0.0       0.6      0.6       0.0   1.0    197.7    195.5      3.26              2.99         1    3.255   5519K    50K       0.0       0.0
 L41     18/0    4.35 GB   0.0     25.0     5.4     19.5      24.5      5.0       0.0   4.5    195.9    192.2    130.63            121.00        11   11.876    219M  4019K       0.0       0.0
 L42      3/0   636.30 MB   0.0      2.5     2.5      0.0       2.5      2.5       0.0   1.0    204.3    202.0     12.60             12.37         4    3.150     22M   202K       0.0       0.0
 L43      8/0    1.81 GB   0.0      4.4     3.8      0.6       4.3      3.7       0.0   1.1    206.4    202.6     21.72             21.29         5    4.345     38M   648K       0.0       0.0
 L44      8/0    1.81 GB   0.0      6.9     5.6      1.2       6.7      5.5       0.0   1.2    204.5    200.4     34.39             33.89         7    4.914     60M  1142K       0.0       0.0
 L45      5/0    1.03 GB   0.0      5.4     4.8      0.6       5.3      4.7       0.0   1.1    202.1    198.3     27.49             27.11         6    4.582     47M   816K       0.0       0.0
 L46     20/0    5.01 GB   0.0     11.7     8.6      3.1      11.1      8.1       0.0   1.3    206.8    196.9     57.85             56.96         7    8.264    102M  4811K       0.0       0.0
 L47      5/0    1.03 GB   0.0      4.2     3.5      0.6       4.1      3.5       0.0   1.2    204.6    200.3     20.86             20.67         4    5.216     36M   713K       0.0       0.0
 L48     20/0    5.01 GB   0.0      7.9     5.5      2.4       7.4      5.0       0.0   1.4    206.5    193.9     39.33             38.78         3   13.111     69M  4217K       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    178/0   43.38 GB   0.0     68.6    40.4     28.1      89.3     61.1       0.0   3.9    143.3    186.5    490.18            445.58       156    3.142    603M    16M       0.0       0.0

cbi42 avatar May 23 '24 22:05 cbi42

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar May 23 '24 22:05 facebook-github-bot

I was worried the sizes would be chaotic like that. T189373775 could help as well as a generous value for size ratio. Anyways that's a separate problem.

ajkr avatar May 23 '24 23:05 ajkr

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar May 24 '24 00:05 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar May 24 '24 00:05 facebook-github-bot

@cbi42 merged this pull request in facebook/rocksdb@fecb10c2fa1501fd71a120793e1913a1ac7407ea.

facebook-github-bot avatar May 24 '24 17:05 facebook-github-bot