[chore](thirdparty) Support third-party incremental build

Open gavinchou opened this issue 1 year ago • 3 comments

It is painful that the third-party dependencies cannot be built partially or incrementally. This commit adds several build options to address that.

To build only the specified dependencies:

./build-thirdparty.sh -d "curl lz4 jemalloc"

To build incrementally and automatically (a version file, version.txt, is added to support this; check version.txt for more details):

./build-thirdparty.sh

To force a rebuild of all dependencies:

./build-thirdparty.sh -a

To also enable automatic incremental third-party builds in build.sh, set this environment variable:

export ENABLE_INCREMENTAL_THIRD_PARTY_BUILD=1

This may be enabled by default in the future.

gavinchou • Sep 17 '22 08:09

Hi @gavinchou, thanks for your contribution; the idea is great! However, I don't think it is convenient for developers to update version.txt after they modify the third-party dependencies. They will probably forget to do it.

I think we could record the MD5 checksums of each package and of the build scripts (e.g. build-thirdparty.sh) in TP_INSTALL_DIR and compute the differences before we build the project.
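As a rough illustration of this checksum idea (not taken from the Doris scripts), the helpers below store a file's MD5 in a stamp directory and later report whether the file has changed. The function names `save_checksum` and `has_changed` and the stamp directory layout are assumptions made for the sketch, and `md5sum` from GNU coreutils is assumed to be available.

```shell
# Sketch only: helper names and the stamp layout are hypothetical.

# save_checksum <file> <stamp_dir>: remember the file's current MD5.
save_checksum() {
    local file dir
    file="$1"
    dir="$2"
    mkdir -p "${dir}"
    md5sum "${file}" | awk '{ print $1 }' >"${dir}/$(basename "${file}").md5"
}

# has_changed <file> <stamp_dir>: succeed if there is no stored MD5 for the
# file, or if the stored MD5 differs from the current one.
has_changed() {
    local file stamp current
    file="$1"
    stamp="$2/$(basename "$1").md5"
    current="$(md5sum "${file}" | awk '{ print $1 }')"
    [ ! -f "${stamp}" ] || [ "$(cat "${stamp}")" != "${current}" ]
}
```

A build script could call `has_changed` on a tarball or on itself before building, and `save_checksum` after a successful build, with the stamp directory placed under TP_INSTALL_DIR.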

adonis0147 • Sep 21 '22 02:09

Hi @adonis0147, thanks for your feedback. Using MD5 for the incremental build is a generic idea, but there is another problem to resolve: how do we manage the MD5 list? It seems that we would still need to update the MD5 list manually. Can you explain how it would work in detail?

There is also another case that this PR resolves: sometimes Doris developers have to build specific third-party dependencies in a specific order when some of them are updated (one may rely on another, e.g. brpc relies on protobuf). It seems hard to solve this problem by updating nothing but the source tarball.

gavinchou • Sep 22 '22 06:09

Hi, @adonis0147 Thanks for your feedback, using MD5 for the incremental build is a generic idea, however, there is another problem to resolve -- how to manage the MD5 list? It seems that we still need to update the MD5 list manually, can you point out how it works in detail?

We already have the MD5 list in thirdparty/vars.sh, and we update this file whenever we update the third-party dependencies. Therefore, we can write the MD5 to a file at the end of each build_xxx function.

And, there is another case that sometimes Doris developers have to build specific third-parties in a specific order when some dependencies are updated and they require specific build order (one may rely on another, e.g. brpc relies on protobuf), it seems hard to resolve this problem by updating nothing but the build-thirdparty.sh?

This problem is unavoidable either way (MD5 or version counter) if we want to support incremental installation. We should sort out the dependency tree in our build script first, because it is hard for a developer to work out the dependencies when they want to upgrade only a specific package.
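One possible way to encode such a dependency tree, sketched here under assumptions rather than taken from build-thirdparty.sh, is to list "dependency dependent" pairs and let the standard `tsort` utility produce a build order. The `DEPENDENCIES` variable and the `build_order` function are hypothetical names; only the brpc/protobuf relation comes from this thread.

```shell
# Sketch only: DEPENDENCIES and build_order are hypothetical names.
# One "<dependency> <dependent>" pair per line.
DEPENDENCIES="protobuf brpc"

# build_order <pkg>...: print the requested packages, dependencies first.
build_order() {
    local requested pkg dep
    requested=" $* "
    {
        # A self pair tells tsort the package exists without adding an ordering.
        for pkg in "$@"; do
            echo "${pkg} ${pkg}"
        done
        # Keep only the edges whose two ends were both requested.
        echo "${DEPENDENCIES}" | while read -r dep pkg; do
            case "${requested}" in
                *" ${dep} "*) ;;
                *) continue ;;
            esac
            case "${requested}" in
                *" ${pkg} "*) echo "${dep} ${pkg}" ;;
            esac
        done
    } | tsort
}
```

With this in place, a request for `brpc protobuf` would be built as protobuf first and brpc second, regardless of the order in which the packages were named.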

adonis0147 • Sep 22 '22 07:09

@adonis0147
Thanks for the information. I may not have made it clear that this PR solves the second problem I mentioned in my previous comment (partial build in a specific order), which I haven't figured out how to do with the existing MD5 list. It may take a long time to find a perfect solution, and things may suddenly get too complicated. The proposed solution can evolve. Is it OK with you if we support the MD5 approach in the future, as I commented in the resolved conversation?

I suggest improving it in a separate PR. Downloading is not actually a problem for now; the build is. And updating version.txt is the responsibility of whoever changes the third-party dependencies, isn't it?

gavinchou • Sep 24 '22 07:09

@adonis0147 AFAIK, the worst case of this proposal is that someone forgets to update version.txt, which is no worse than the current situation, where we all have to rebuild all the third-party dependencies.

gavinchou • Sep 24 '22 07:09

@gavinchou If we set aside the dependency tree issue, the version counter approach can be improved by the MD5 approach: the differences can be figured out automatically, and we don't need to maintain the extra file.

adonis0147 • Sep 24 '22 09:09

The workflow for updating the third-party dependencies is as follows:

  1. In thirdparty/vars.sh, we modify the entries for the packages being updated. Taking BRPC as an example, we update the following variables:

    1. BRPC_DOWNLOAD
    2. BRPC_NAME
    3. BRPC_SOURCE
    4. BRPC_MD5SUM
  2. After that, we run build-thirdparty.sh and the updated version of these packages will be installed.

The workaround to figure out the differences:

  1. At the end of each build_xxx function, we write the corresponding MD5 to a specific file (e.g. INSTALLED_VERSION).
  2. Before we build the third-party dependencies, we find the packages whose MD5 checksums differ by comparing the MD5 recorded in that file (e.g. INSTALLED_VERSION) with the one in the latest thirdparty/vars.sh.
  3. We can then incrementally build only the packages found in step 2.
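The three steps above could look roughly like the following sketch. It is an illustration under assumptions, not the actual Doris code: the record file format ("name md5" per line), the function names `record_installed` and `packages_to_rebuild`, and the "name=md5" argument convention are all hypothetical. In the real scripts the expected MD5 values would come from variables such as BRPC_MD5SUM in thirdparty/vars.sh.

```shell
# Sketch only: file format and function names are hypothetical.

# Step 1: record_installed <name> <md5> <record_file>
# Called at the end of a build_xxx function after a successful build.
record_installed() {
    local name md5 file
    name="$1"
    md5="$2"
    file="$3"
    touch "${file}"
    # Drop any stale entry for this package, then append the new one.
    awk -v n="${name}" '$1 != n' "${file}" >"${file}.tmp"
    echo "${name} ${md5}" >>"${file}.tmp"
    mv "${file}.tmp" "${file}"
}

# Step 2: packages_to_rebuild <record_file> <name=md5>...
# Print every package whose expected MD5 differs from the recorded one
# (including packages that were never recorded).
packages_to_rebuild() {
    local file pair name md5 installed
    file="$1"
    shift
    for pair in "$@"; do
        name="${pair%%=*}"
        md5="${pair#*=}"
        installed="$(awk -v n="${name}" '$1 == n { print $2 }' "${file}" 2>/dev/null)"
        if [ "${installed}" != "${md5}" ]; then
            echo "${name}"
        fi
    done
}
```

Step 3 would then loop over the output of `packages_to_rebuild` and call the matching build_xxx function for each name. A missing record file simply causes every package to be reported, which falls back to a full build.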

adonis0147 • Sep 24 '22 09:09