
[enhancement](compaction) introduce segment compaction (#12609)

Open freemandealer opened this issue 2 years ago • 0 comments

Implement segmentwise compaction during rowset write to reduce the number of segments produced by load jobs, which may otherwise cause OLAP_ERR_TOO_MANY_SEGMENTS (-238).

Signed-off-by: freemandealer [email protected]

Proposed changes

Issue Number: close #12609

Problem summary

Intro

The default limit is 200 segments per rowset. Too many segments may fail the whole load process with OLAP_ERR_TOO_MANY_SEGMENTS (-238). If we increase the limit, the load will succeed, but the pressure is transferred to the subsequent rowsetwise compaction. Things get worse when the user issues a query (e.g. an INSERT INTO ... SELECT statement) right after the load job but before rowsetwise compaction finishes: the query suffers disastrous performance or may even end in OOM.

So we are introducing segmentwise compaction, which compacts data DURING the write process instead of waiting for rowsetwise compaction after the txn has been committed.

Design

Trigger

Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid a recursing/queuing nightmare.
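A minimal sketch of the trigger condition above. The class and method names are hypothetical, not the actual Doris BE API:

```python
SEGCOMPACTION_THRESHOLD = 10  # trigger every N segments (illustrative default)

class RowsetWriterState:
    """Hypothetical per-rowset state tracked by the rowset writer."""

    def __init__(self):
        self.num_segments = 0
        self.compacting = False  # at most one compaction task per rowset

    def on_segment_flushed(self):
        """Called after each segment flush; returns True when a segment
        compaction task should be submitted to the thread pool."""
        self.num_segments += 1
        if self.num_segments % SEGCOMPACTION_THRESHOLD == 0 and not self.compacting:
            self.compacting = True
            return True
        return False

    def on_compaction_finished(self):
        # allow the next trigger once the in-flight task completes
        self.compacting = False
```

The `compacting` flag is what enforces "one job per rowset at a time": further triggers are ignored until the in-flight task calls back.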

Target Selection

We collect candidate segments on every trigger. We skip big segments whose row count is greater than M (e.g. 10000), because compacting them yields little benefit compared with the effort. Hence, we only pick the "Longest Consecutive Small" segment group for actual compaction.
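The selection step can be sketched as a scan for the longest run of small segments. This helper is purely illustrative and assumes segments are identified by their position in the rowset:

```python
def pick_longest_small_run(row_counts, small_limit=10000):
    """Return the (start, end) index range of the longest consecutive run
    of 'small' segments (row count <= small_limit); (0, 0) if none found.
    Hypothetical helper, not the actual Doris implementation."""
    best = (0, 0)
    run_start = None
    # a sentinel "big" value at the end closes the final run
    for i, rows in enumerate(row_counts + [small_limit + 1]):
        if rows <= small_limit:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best
```

For example, with row counts `[100, 200, 50000, 10, 20, 30]` the big third segment splits the candidates, and the trailing run of three small segments wins.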

Compaction Process

A new thread pool is introduced to do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool. Then the worker thread does the following:

  • build a MergeIterator from the target segments
  • create a new segment writer
  • for each block read from the MergeIterator, the writer appends it
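The steps above amount to a k-way merge into a fresh segment. In this sketch, Python's `heapq.merge` plays the role of the MergeIterator and a list stands in for the new segment writer; the real Doris code operates on vectorized blocks, not individual rows:

```python
import heapq

def compact_small_segments(segments):
    """Merge several sorted source segments into one new segment.
    heapq.merge acts as the MergeIterator; appending to the list acts
    as the segment writer. Illustrative only."""
    new_segment = []
    for row in heapq.merge(*segments):  # read blocks from the merge iterator
        new_segment.append(row)         # writer appends each merged block
    return new_segment
```

Because the source segments are individually sorted, the output segment is globally sorted, which is also why the later rowsetwise compaction gets cheaper.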

SegID handling

SegID must remain consecutive after segment compaction.

If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:

  • we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
  • delete seg_0, seg_1, seg_2 and seg_3
  • rename seg_0-3 to seg_0
  • rename seg_4 to seg_1
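The rename bookkeeping can be expressed as a small plan generator. File names and the `seg_{start}-{end-1}` intermediate naming follow the example above; the function itself is hypothetical:

```python
def rename_plan(num_segments, start, end):
    """Return (old_name, new_name) rename pairs after segments in the
    half-open range [start, end) have been compacted into one file named
    seg_{start}-{end-1}. Segment IDs stay consecutive afterwards.
    Illustrative sketch, not the actual Doris file-handling code."""
    plan = [(f"seg_{start}-{end - 1}", f"seg_{start}")]  # compacted output
    new_id = start + 1
    for old_id in range(end, num_segments):  # shift trailing big segments down
        plan.append((f"seg_{old_id}", f"seg_{new_id}"))
        new_id += 1
    return plan
```

Applied to the example (five segments, seg_0 through seg_3 compacted), this yields exactly the two renames listed above: seg_0-3 becomes seg_0 and seg_4 becomes seg_1.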

It is worth noting that we must wait for in-flight segment compaction tasks to finish before building the rowset meta and committing the txn.

Test results

The amount of data Doris can load

First, we test the amount of data we can successfully load into Doris with segment compaction disabled/enabled. Tests are based on TPC-H. The table is created with 1 bucket and no parallelism. We trigger segment compaction every 10 segments produced by the rowset writer.

| cases | data amount |
| --- | --- |
| Disable SegCompaction | 1.12 million rows, 18.67 GB |
| Enable SegCompaction | 11 million rows, 183 GB |

The result shows that the amount of data we can load into Doris improves 10x after enabling segment compaction. The ratio corresponds to the triggering segment number.

Impact on latency

When segment compaction is disabled, a load job finishes in 1260s during the test, and the subsequent rowsetwise compaction costs 151s.

We give the test results with segment compaction enabled at different triggering segment numbers:

| triggering segment number | Load Latency | RowsetCompaction Latency |
| --- | --- | --- |
| 5 (trigger every 5 segments) | 1089s (-13%) | 242s (+60%) |
| 10 | 1053s (-16%) | 166s (+9%) |
| 20 | 960s (-23%) | 172s (+13%) |
| 40 | 1320s (+4%) | 169s (+11%) |

We loaded without segment compaction several times, and each run gave a latency varying within a (-25%, +25%) range. So we believe that segment compaction has little impact on load latency.

In addition to the above costs, we wait for in-flight segment compaction tasks to finish before building the rowset meta and publishing the data. The length of the wait depends on when the build takes place, but it is bounded by the latency of a single segment compaction task:

| triggering segment number | Single SegCompaction Task Latency |
| --- | --- |
| 5 | 5s |
| 10 | 9s |
| 20 | 20s |
| 40 | 60s |

Impact on memory usage

Compaction itself consumes memory. The following test results show the memory footprint with segment compaction enabled.

With segment compaction disabled, a load job uses 4.83% of 128 GB memory, and the subsequent rowsetwise compaction takes 8.21%.

With segment compaction enabled:

| triggering segment number | Load Memory Usage | RowsetCompaction Memory Usage |
| --- | --- | --- |
| 5 (trigger every 5 segments) | Avg.: 6.62%, Peak: 7.01% | 6.9% (-16%) |
| 10 | Avg.: 7.63%, Peak: 9.61% | 6.56% (-20%) |
| 20 | Avg.: 7.6%, Peak: 9.8% | 6.9% (-16%) |
| 40 | Avg.: 5.09%, Peak: 9% | 6.62% (-19%) |

Segment compaction uses more memory because we add another segment writer to write compacted data and multiple segment readers to read source data for each active rowset.

However, since the data is more ordered and the number of segments is decreased after segment compaction, RowsetCompaction uses less memory.

Checklist(Required)

  1. Does it affect the original behavior:
    • [X] Yes
    • [ ] No
    • [ ] I don't know
  2. Has unit tests been added:
    • [X] Yes
    • [ ] No
    • [ ] No Need
  3. Has document been added or modified:
    • [ ] Yes
    • [ ] No
    • [X] No Need
  4. Does it need to update dependencies:
    • [ ] Yes
    • [X] No
  5. Are there any changes that cannot be rolled back:
    • [ ] Yes (If Yes, please explain WHY)
    • [X] No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

freemandealer avatar Sep 22 '22 09:09 freemandealer