[Feature] New BulkLoad that supports building segment files in isolation from the Doris cluster

Open compasses opened this issue 3 years ago • 0 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Description

Currently, there are several ways to load data into Doris, such as broker load and stream load. From our perspective, all of them have shortcomings, for example:

  1. High resource cost: each tablet has multiple replicas, and each replica performs the data load separately.
  2. It is hard to keep the cluster stable and performant: load jobs can cause high load and resource contention.
  3. Query latency fluctuates, and this kind of issue comes up again and again.

So we may want a new way to load data, something like a lightweight read/write split, which can sustain high throughput for both writes and reads.

Here we only have a very rough design, and many details still need to be clarified.

The overall flow:

image

The new bulk load may share some functionality with existing features such as backup/restore and broker load.

  1. FE issues the bulk load command, and the BEs write tablet meta to HDFS.
  2. FE then schedules a Spark / Flink job to run the segment builder, which reads the data files from HDFS, builds segment files locally, and uploads them to HDFS when the build finishes.
  3. FE then starts loading these segments from HDFS; each BE does the actual work.
  4. Finally, FE publishes the transaction, as in broker load.
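
To make the four steps easier to discuss, here is a minimal sketch of the flow in Python. Everything in it is hypothetical and is not Doris code: `FakeHdfs` is an in-memory stand-in for HDFS, `Backend` stands in for a BE, `build_segments` stands in for the Spark / Flink segment builder, and the path layout and modulo bucketing are made up for illustration. The point it tries to show is that segments are built once per tablet outside the cluster, and each replica only downloads the finished result.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# All names below are hypothetical; none of them are Doris APIs.


@dataclass
class FakeHdfs:
    """In-memory stand-in for HDFS: path -> payload."""
    files: Dict[str, object] = field(default_factory=dict)

    def put(self, path, payload):
        self.files[path] = payload

    def get(self, path):
        return self.files[path]


@dataclass
class Backend:
    """Stand-in for a BE node holding replicas of some tablets."""
    name: str
    tablet_ids: List[int]
    pending: Dict[int, list] = field(default_factory=dict)  # loaded, not yet visible
    visible: Dict[int, list] = field(default_factory=dict)  # published

    def write_tablet_meta(self, hdfs, job):
        for t in self.tablet_ids:
            hdfs.put(f"{job}/meta/{t}", {"tablet_id": t})

    def load_segments(self, hdfs, job):
        for t in self.tablet_ids:
            self.pending[t] = hdfs.get(f"{job}/segments/{t}")

    def publish(self):
        self.visible.update(self.pending)
        self.pending.clear()


def build_segments(hdfs, job, tablet_ids, rows):
    """Step 2: the external builder runs once per tablet, not once per replica."""
    for t in tablet_ids:
        meta = hdfs.get(f"{job}/meta/{t}")
        bucket = tablet_ids.index(meta["tablet_id"])
        # Hypothetical bucketing: row key modulo the number of tablets.
        segment = sorted(r for r in rows if r % len(tablet_ids) == bucket)
        hdfs.put(f"{job}/segments/{t}", segment)


def run_bulk_load(backends, rows, job="job1"):
    hdfs = FakeHdfs()
    tablet_ids = sorted({t for be in backends for t in be.tablet_ids})
    for be in backends:                          # step 1: BEs write tablet meta
        be.write_tablet_meta(hdfs, job)
    build_segments(hdfs, job, tablet_ids, rows)  # step 2: build outside the cluster
    for be in backends:                          # step 3: BEs pull finished segments
        be.load_segments(hdfs, job)
    for be in backends:                          # step 4: publish the transaction
        be.publish()
    return hdfs
```

For example, with two BEs that each hold a replica of tablets 0 and 1, `run_bulk_load([Backend("be1", [0, 1]), Backend("be2", [0, 1])], [5, 2, 4, 1])` builds each segment once and both BEs end up with identical visible data. A real design would also need failure handling between steps 3 and 4, which this sketch leaves out.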

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

compasses · Aug 10 '22 06:08