doris
[chore](thirdparty) Support third-party incremental build
It's a pain that the third-party dependencies cannot be built partially or incrementally. This commit adds several build options to ease that pain.
To build only the specified dependencies:
./build-third-party.sh -d "curl lz4 jemalloc"
To build incrementally and automatically (a version file, version.txt, is added to support this; check version.txt for more details):
./build-third-party.sh
To force a rebuild of all dependencies:
./build-third-party.sh -a
To also enable automatic incremental third-party builds in build.sh, set the environment variable:
export ENABLE_INCREMENTAL_THIRD_PARTY_BUILD=1
This may be enabled by default in the future.
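For readers unfamiliar with how such flags are usually wired up, here is a minimal sketch of parsing -a and -d with the POSIX getopts builtin. It is not taken from the PR: the function name parse_build_options and the variables BUILD_ALL and PACKAGES are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical sketch, not the PR's code: parse "-a" and "-d <list>".
# BUILD_ALL and PACKAGES are illustrative names, not variables from Doris.

parse_build_options() {
    BUILD_ALL=0
    PACKAGES=""
    OPTIND=1 # reset so the function can be called more than once
    while getopts "ad:" opt; do
        case "${opt}" in
            a) BUILD_ALL=1 ;;
            d) PACKAGES="${OPTARG}" ;;
            *)
                echo "usage: [-a] [-d \"pkg1 pkg2 ...\"]" >&2
                return 1
                ;;
        esac
    done
}

parse_build_options -d "curl lz4 jemalloc"

# The space-separated list is split on purpose, so PACKAGES stays unquoted.
for pkg in ${PACKAGES}; do
    echo "would build: ${pkg}"
done
```

Keeping the package list as one quoted argument matches the -d "curl lz4 jemalloc" usage shown in the description.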
Hi @gavinchou, thanks for your contribution, the idea is great! However, I don't think it is convenient for developers to update version.txt after they modify the third parties; they will probably forget to do it.
I think we could record the MD5 checksums of each package and of the build scripts (e.g. build-thirdparty.sh) in TP_INSTALL_DIR and calculate the diffs before we build the project.
Hi @adonis0147, thanks for your feedback. Using MD5 for the incremental build is a generic idea, but there is another problem to resolve: how do we manage the MD5 list? It seems that we would still need to update the MD5 list manually. Can you point out how it would work in detail?
There is also another case that this PR resolves. Sometimes Doris developers have to build specific third parties in a specific order when some dependencies are updated (one may rely on another, e.g. brpc relies on protobuf). It seems hard to solve this problem by updating nothing but the source file tarball.
Hi, @adonis0147 Thanks for your feedback, using MD5 for the incremental build is a generic idea, however, there is another problem to resolve -- how to manage the MD5 list? It seems that we still need to update the MD5 list manually, can you point out how it works in detail?
We already have the MD5 list in thirdparty/vars.sh. We update this file when we want to update the third parties. Therefore, we can write the MD5 to a file at the end of each build_xxx function.
And, there is another case that sometimes Doris developers have to build specific third-parties in a specific order when some dependencies are updated and they require specific build order (one may rely on another, e.g. brpc relies on protobuf), it seems hard to resolve this problem by updating nothing but the build-thirdparty.sh?
This problem is inevitable either way (MD5 or version counter) if we want to support incremental installation. We should sort out the dependency tree in our build script first, because it is hard for a developer to find out the dependencies when they want to upgrade only a specific package.
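As a rough illustration of what sorting out the dependency tree could look like, the sketch below feeds "dependency dependent" pairs to the standard tsort utility, which prints a valid build order. Only the brpc/protobuf edge comes from this thread; the other edges and the function names dependency_edges and build_order are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: derive a build order from declared dependencies.
# Each line is "<dependency> <dependent>", i.e. the left side is built first.
# Only "protobuf brpc" is taken from the discussion; the rest is illustrative.
dependency_edges() {
    cat <<'EOF'
protobuf brpc
gflags brpc
leveldb brpc
EOF
}

# tsort performs a topological sort: every dependency is printed
# before the packages that rely on it.
build_order() {
    dependency_edges | tsort
}

build_order
```

The relative order of packages that do not depend on each other is up to the tsort implementation, so a build script should rely only on dependencies preceding their dependents.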
@adonis0147
Thanks for the information.
I may not have stated clearly that this PR solves the second problem I mentioned in my previous comment (partial builds in a specific order), which I haven't figured out how to solve with the existing MD5 list. It may take a long time to find a perfect solution, and things may suddenly get too complicated. The proposed solution is evolvable.
Is it OK with you if we support the MD5 approach in the future, as I commented in the resolved conversation?
I suggest making that improvement a separate PR. Downloading is not actually a problem for now; the build is. And updating version.txt is the obligation of whoever makes changes to the third parties, isn't it?
@adonis0147 AFAIK, the worst case of this proposal is that someone forgets to update version.txt, which is no worse than the current situation, where we all have to rebuild all the third parties.
@gavinchou Even if we don't resolve the issues with the dependency tree, the version counter approach can be improved by the MD5 approach: the differences can be figured out automatically, and we don't need to maintain the extra file.
The workflow for updating the third parties is as follows:
- In thirdparty/vars.sh, we modify the corresponding information of some packages. Taking BRPC as an example, we update the following variables:
  - BRPC_DOWNLOAD
  - BRPC_NAME
  - BRPC_SOURCE
  - BRPC_MD5SUM
- After that, we run build-thirdparty.sh and the updated versions of these packages will be installed.
The workaround for figuring out the differences:
1. At the end of each build_xxx function, we write the corresponding MD5 to a specific file (e.g. INSTALLED_VERSION).
2. Before we build the third parties, we find the packages whose MD5 checksums differ by comparing the MD5 in that file (e.g. INSTALLED_VERSION) with the one in the newest thirdparty/vars.sh.
3. We can then incrementally build the packages found in step 2.
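The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the stamp file format ("package md5" per line), the helper names record_installed and needs_rebuild, and the placeholder checksums are all hypothetical, and a real script would read the checksums from the *_MD5SUM variables in thirdparty/vars.sh.

```shell
#!/bin/sh
# Hypothetical sketch of the MD5-based incremental check; not Doris code.
# Assumed stamp file format: one "<package> <md5>" pair per line.

# Step 1: call at the end of a build_xxx function.
# $1 = stamp file, $2 = package name, $3 = MD5 from vars.sh
record_installed() {
    touch "$1"
    # Drop any stale entry for this package, then append the new one.
    awk -v pkg="$2" '$1 != pkg' "$1" >"$1.tmp"
    echo "$2 $3" >>"$1.tmp"
    mv "$1.tmp" "$1"
}

# Step 2: succeeds (returns 0) when the package must be built again,
# i.e. the recorded MD5 is missing or differs from the one in vars.sh.
needs_rebuild() {
    [ -f "$1" ] || return 0
    ! grep -qxF "$2 $3" "$1"
}

# Step 3 (demo): placeholder checksums stand in for values from vars.sh.
STAMP="$(mktemp)"
record_installed "${STAMP}" brpc 1111
needs_rebuild "${STAMP}" brpc 1111 || echo "brpc is up to date"
needs_rebuild "${STAMP}" brpc 2222 && echo "brpc changed, rebuild it"
rm -f "${STAMP}"
```

Because the stamp is written only after a build_xxx function finishes, a failed or interrupted build leaves the old entry in place and the package is picked up again on the next run.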