
[Optimize](Random distribution) Improve the block-writing performance of the tablet sink and delta writer

Open eldenmoon opened this issue 1 year ago • 0 comments

Search before asking

  • [X] I have searched the issues and found no similar issues.

Description

The current distribution model for Doris is as follows:

OlapTableSink separates the original Block into several sub-blocks, one per node (BE), according to the tablet distribution, and sends each sub-block to the storage engine of the corresponding backend. The storage engine then separates the sub-block again across multiple tablet channels, so each delta writer handles only part of the block.
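The two-level split described above can be modeled roughly as follows. This is a simplified Python sketch, not Doris's C++ code: a Block is treated as a plain list of rows, each row is assumed to already carry its tablet id, and the helper names `split_by_backend` and `split_by_tablet` are hypothetical. The point is that every row is copied once on the sink side and again on the backend side.

```python
from collections import defaultdict

# Hypothetical model: a Block is a list of rows, and tablet_ids[i] is the tablet
# already chosen for rows[i]. None of these names are Doris APIs.


def split_by_backend(rows, tablet_ids, tablet_to_backend):
    """Sink side: copy each row into the sub-block of the BE that owns its tablet."""
    sub_blocks = defaultdict(lambda: ([], []))  # backend -> (rows, tablet_ids)
    for row, tablet in zip(rows, tablet_ids):
        be_rows, be_tablets = sub_blocks[tablet_to_backend[tablet]]
        be_rows.append(row)  # row copy #1 (per-row routing on the sink)
        be_tablets.append(tablet)
    return dict(sub_blocks)


def split_by_tablet(rows, tablet_ids):
    """Backend side: copy each row again into the buffer of its tablet's delta writer."""
    per_tablet = defaultdict(list)
    for row, tablet in zip(rows, tablet_ids):
        per_tablet[tablet].append(row)  # row copy #2 (per-row routing on the BE)
    return dict(per_tablet)
```

For example, four rows spread over tablets 10, 11 and 12 on two backends are routed row by row twice:

```python
rows = ["r0", "r1", "r2", "r3"]
tablet_ids = [10, 11, 10, 12]
tablet_to_backend = {10: "be1", 11: "be2", 12: "be1"}
per_be = split_by_backend(rows, tablet_ids, tablet_to_backend)
print(split_by_tablet(*per_be["be1"]))  # {10: ['r0', 'r2'], 12: ['r3']}
```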

This model splits blocks according to tablets, and the splitting can be a relatively heavy operation. After splitting, the sub-blocks are sent through RPCs to TabletChannels, which distribute them to different DeltaWriters (MemTables); this distribution step on the TabletChannels is also relatively heavy. If the table uses RANDOM distribution, we have the opportunity to distribute each block as a whole instead of splitting it. Doing so reduces memory copying and improves write locality, similar to appending the entire block to the memtable.
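A minimal sketch of the whole-block alternative, under the assumption that with RANDOM distribution any tablet may receive any row, so one tablet can be chosen per block. `MemTable`, `append_block` and `route_whole_block` are hypothetical names for illustration only; the real change would live in Doris's C++ sink and delta writer.

```python
import random

# Hypothetical sketch: under RANDOM distribution any tablet may take any row,
# so an entire block can be routed to one tablet with no per-row splitting.


class MemTable:
    """Toy stand-in for a delta writer's memtable (not a Doris class)."""

    def __init__(self):
        self.rows = []

    def append_block(self, block):
        self.rows.extend(block)  # one bulk append of the whole block


def route_whole_block(block, memtables, rng):
    """Pick a single tablet for the whole block and append the block unchanged."""
    tablet = rng.choice(sorted(memtables))
    memtables[tablet].append_block(block)
    return tablet
```

Compared with the per-row routing of the current model, this does no per-row tablet lookup and no intermediate sub-block copies; the trade-off is that data is balanced across tablets per block rather than per row.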

This optimization could save 10% ~ 20% of the CPU cost of loading into a RANDOM distribution table when load_to_single_tablet is enabled. What's more, in single_replica_load mode, OlapTableSink could even write directly to the local delta writer.

Solution

No response

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

eldenmoon • Mar 03 '23 07:03