Pare down the build time and binary size by opting in to just the passes we need
The MLIR build includes code generation for architectures we don't target (e.g., SPIRV) and passes and dialects we don't use (e.g., gpu, arm*, etc.). We could likely save a lot of build time by opting in to just the dialects and passes we need in heir-opt, which would also likely shrink the final built binary.
My system:
12th Gen Intel(R) Core(TM) i9-12900K @ 3.20 GHz (24 cores), 64.0 GB RAM, Ubuntu clang version 17.0.6 (++20231209124227+6009708b4367-1~exp1~20231209124336.77)
From a clean build (bazel clean --expunge), bazel build ...:all takes 27m 21s.
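For reference, the timing procedure was roughly the following (same target pattern as above):
bazel clean --expunge
time bazel build ...:all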
Notes as I go:
There are some dependencies that seem unnecessary, which I might be able to clean up upstream. For example:
bazel query 'somepath(//tools:heir-opt, @llvm-project//llvm:include/llvm/BinaryFormat/ELFRelocs/Lanai.def)'
//tools:heir-opt
@llvm-project//mlir:LLVMDialect
@llvm-project//llvm:BinaryFormat
@llvm-project//llvm:include/llvm/BinaryFormat/ELFRelocs/Lanai.def
The LLVM dialect pulls in essentially all of LLVM, even though it doesn't seem reasonable to run LLVM's own analyses on the LLVM MLIR dialect.
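As a rough sanity check on that claim, one can count how many llvm (as opposed to mlir) targets come along with the LLVM dialect target alone; the grep pattern here is only illustrative:
bazel query 'deps(@llvm-project//mlir:LLVMDialect)' | grep -c 'llvm-project//llvm[:/]'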
bazel query 'somepath(//tools:heir-opt, @llvm-project//mlir:GPUDialect)'
//tools:heir-opt
@llvm-project//mlir:LinalgTransforms
@llvm-project//mlir:GPUDialect
Lots of upstream dialect dependencies get pulled in through LinalgTransforms, including GPU, Mesh, and SparseTensor, while things like NVGPU get pulled in through MemRefTransforms. It might be possible to split these into smaller upstream build targets to avoid this.
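One way to confirm which heavyweight dialects are direct dependencies of LinalgTransforms is a depth-limited deps query (the depth argument is standard bazel query syntax):
bazel query 'deps(@llvm-project//mlir:LinalgTransforms, 1)'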
After https://github.com/google/heir/pull/895, the build time does not meaningfully change.
One thing I'm trying to do, but haven't quite figured out, is to profile the build to get a sense of what takes up most of the build time.
Trying again, but more specifically this time (requires bazel 7.0+):
bazel clean --expunge
bazel build --profile build.profile //tools:heir-opt
bazel analyze-profile build.profile
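The same profile can also be inspected interactively; as far as I know it is written in Chrome trace format, so it can be loaded at https://ui.perfetto.dev (or chrome://tracing):
bazel build --profile build.profile //tools:heir-opt
# then open build.profile in the Perfetto UI for a timeline view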
Before
=== PHASE SUMMARY INFORMATION ===
Total launch phase time 0.694 s 0.10%
Total init phase time 79.458 s 11.62%
Total target pattern evaluation phase time 0.027 s 0.00%
Total interleaved loading-and-analysis phase time 7.000 s 1.02%
Total preparation phase time 0.014 s 0.00%
Total execution phase time 596.347 s 87.24%
Total finish phase time 0.027 s 0.00%
---------------------------------------------------------------------
Total run time 683.570 s 100.00%
Critical path (105.226 s):
Time Percentage Description
32.7 ms 0.03% action 'Executing genrule @llvm-project//llvm generate_static_extension_registry [for tool]'
0.06 ms 0.00% cc_library-compile for @llvm-project//llvm Support
0.12 ms 0.00% cc_library-compile for @llvm-project//llvm TableGen
6.111 s 5.81% action 'Compiling llvm/lib/TableGen/Record.cpp [for tool]'
84.5 ms 0.08% action 'Linking external/llvm-project/llvm/llvm-min-tblgen [for tool]'
0.51 ms 0.00% runfiles for @llvm-project//llvm llvm-min-tblgen
2.028 s 1.93% action 'Generating code from table lib/Target/RISCV/RISCV.td @llvm-project//llvm RISCVTargetParserDefGen__gen_riscv_target_def_genrule [for tool]'
0.05 ms 0.00% cc_library-compile for @llvm-project//llvm TargetParser
0.05 ms 0.00% cc_library-compile for @llvm-project//mlir TableGen
0.01 ms 0.00% cc_library-compile for @llvm-project//mlir MlirTableGenMain
0.08 ms 0.00% BazelCppSemantics_build_arch_k8-opt-exec-2B5CBBC6 for @llvm-project//mlir mlir-tblgen
16.928 s 16.09% action 'Compiling mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp [for tool]'
99.1 ms 0.09% action 'Linking external/llvm-project/mlir/mlir-tblgen [for tool]'
0.13 ms 0.00% runfiles for @llvm-project//mlir mlir-tblgen
75.2 ms 0.07% action 'TdGenerate external/llvm-project/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.cpp.inc'
0.03 ms 0.00% cc_library-compile for @llvm-project//mlir LLVMIntrinsicOpsIncGen
0.05 ms 0.00% cc_library-compile for @llvm-project//mlir LLVMDialect
74.724 s 71.01% action 'Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp'
5.143 s 4.89% action 'Linking tools/heir-opt'
0.39 ms 0.00% runfiles for //tools heir-opt
After
=== PHASE SUMMARY INFORMATION ===
Total launch phase time 0.785 s 0.19%
Total init phase time 89.419 s 21.97%
Total target pattern evaluation phase time 0.050 s 0.01%
Total interleaved loading, analysis and execution phase time 316.673 s 77.81%
Total finish phase time 0.032 s 0.01%
---------------------------------------------------------------------
Total run time 406.960 s 100.00%
Critical path (83.635 s):
Time Percentage Description
1.80 ms 0.00% action 'Writing script external/llvm-project/mlir/MlirTableGenMain.cppmap [for tool]'
16.740 s 20.02% action 'Compiling mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp [for tool]'
138 ms 0.17% action 'Linking external/llvm-project/mlir/mlir-tblgen [for tool]'
0.12 ms 0.00% runfiles for @@llvm-project//mlir mlir-tblgen
219 ms 0.26% action 'TdGenerate external/llvm-project/mlir/include/mlir/Interfaces/MemorySlotOpInterfaces.h.inc'
64.244 s 76.81% action 'Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp'
2.291 s 2.74% action 'Linking tools/heir-opt'
0.36 ms 0.00% runfiles for //tools heir-opt
So while the critical path was not significantly reduced, there is less total work being done (683s vs. 406s), so I'm not doing nothing! There are probably other targets that still need to be pruned.
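To hunt for those, allpaths is more informative than somepath, since it shows every route by which a heavyweight dependency is still reached (GPUDialect is used here only because it appeared above; an empty result means that edge is already gone):
bazel query 'allpaths(//tools:heir-opt, @llvm-project//mlir:GPUDialect)'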
For bazel build --profile build_test.profile tests/...:all
The critical path is only like 15% of the total runtime... gotta see what else is being built.
=== PHASE SUMMARY INFORMATION ===
Total launch phase time 0.045 s 0.00%
Total init phase time 0.130 s 0.01%
Total target pattern evaluation phase time 0.005 s 0.00%
Total interleaved loading-and-analysis phase time 7.920 s 0.77%
Total preparation phase time 0.092 s 0.01%
Total execution phase time 1019.486 s 99.20%
Total finish phase time 0.031 s 0.00%
---------------------------------------------------------------------
Total run time 1027.711 s 100.00%
Critical path (141.435 s):
Time Percentage Description
0.32 ms 0.00% action 'Writing script external/llvm_zstd/zstd.cppmap [for tool]'
22.6 ms 0.02% action 'Compiling llvm/lib/Support/SlowDynamicAPInt.cpp [for tool]'
0.60 ms 0.00% action 'Linking external/llvm-project/llvm/llvm-min-tblgen [for tool]'
0.10 ms 0.00% runfiles for @llvm-project//llvm llvm-min-tblgen
1.123 s 0.79% action 'Generating code from table include/llvm/IR/Intrinsics.td @llvm-project//llvm intrinsic_S390_gen__gen_intrinsic_enums__intrinsic_prefix_s390_genrule [for tool]'
0.01 ms 0.00% cc_library-compile for @llvm-project//llvm Core
0.01 ms 0.00% cc_library-compile for @llvm-project//llvm BitReader
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm IRReader
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm Object
0.01 ms 0.00% cc_library-compile for @llvm-project//llvm DebugInfo
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm DebugInfoBTF
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm DebugInfoPDB
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm Symbolize
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm ProfileData
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm Analysis
0.00 ms 0.00% cc_library-compile for @llvm-project//llvm BitWriter
0.01 ms 0.00% cc_library-compile for @llvm-project//mlir LLVMDialect
128.406 s 90.79% action 'Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp [for tool]'
1.594 s 1.13% action 'Linking tools/heir-opt [for tool]'
0.31 ms 0.00% runfiles for //tools heir-opt
7.745 s 5.48% action 'Action tests/openfhe/end_to_end/box_blur_64x64_test_heir_opt.mlir'
5.54 ms 0.00% action 'Action tests/openfhe/end_to_end/box_blur_64x64_lib.h'
0.03 ms 0.00% cc_library-compile for //tests/openfhe/end_to_end box_blur_64x64_test_cc_lib
2.437 s 1.72% action 'Compiling tests/openfhe/end_to_end/box_blur_64x64_test_lib.inc.cc'
48.2 ms 0.03% action 'Linking tests/openfhe/end_to_end/libbox_blur_64x64_test_cc_lib.so'
0.28 ms 0.00% action 'SolibSymlink _solib_k8/libtests_Sopenfhe_Send_Uto_Uend_Slibbox_Ublur_U64x64_Utest_Ucc_Ulib.so'
52.3 ms 0.04% action 'Linking tests/openfhe/end_to_end/box_blur_64x64_test'
0.06 ms 0.00% runfiles for //tests/openfhe/end_to_end box_blur_64x64_test
It looks like the main extra things built in the test invocation are test-critical things: openfhe and its dependencies, googletest, absl, rapidjson, and the heir-translate emitters.
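A hedged way to enumerate exactly what the test build adds on top of heir-opt is the set difference of the two dependency closures (except is standard bazel query syntax; drop the wc -l to list the targets rather than count them):
bazel query 'deps(//tests/...) except deps(//tools:heir-opt)' | wc -l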
Ah, here we go! We have a dependency on mlir-opt in our tests!
bazel query 'somepath(//tests:all, @llvm-project//llvm:AArch64CodeGen)'
//tests:test_utilities
@llvm-project//mlir:mlir-opt
@llvm-project//llvm:AllTargetsCodeGens
@llvm-project//llvm:AArch64CodeGen
This incurs a huge amount of bloat from building all codegen targets, etc., when running the tests (but not when building heir-opt alone).
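For a rough sense of scale, one can count how many targets the all-backends bundle drags in (label taken from the query above; the exact count will vary with the LLVM commit):
bazel query 'deps(@llvm-project//llvm:AllTargetsCodeGens)' | wc -l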
We do currently use mlir-opt in these tests, but we shouldn't need to anymore:
$ rg mlir-opt tests
tests/memref_global_raw.mlir
10:// RUN: mlir-opt %s -pass-pipeline="builtin.module( \
27:// RUN: mlir-opt -pass-pipeline="builtin.module( \
tests/memref_global.mlir
8:// RUN: mlir-opt %s -pass-pipeline="builtin.module( \
25:// RUN: mlir-opt -pass-pipeline="builtin.module( \
tests/BUILD
25: "@llvm-project//mlir:mlir-opt",
I was able to remove mlir-opt, which helped (in particular, removing all the code in mlir/test from the dependencies), but it seems the next hurdle is that a lot of LLVM code is pulled in due to our use of mlir-cpu-runner, which is a bit more widespread.
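To gauge how widespread that use is, the same kind of search as above works (command only; output omitted here):
rg mlir-cpu-runner tests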
As of now, https://github.com/google/heir/pull/895 has a wallclock time of 25 minutes, which is... an improvement I guess.
Without removing the use of mlir-cpu-runner, I don't think we'll be able to do much more here. Closing.
Re-opening this and adding "reducing build (disk) size" as a goal, as that's the sticking point for devcontainer.json / default GH Codespaces (see #1001).
Building HEIR+LLVM using bazel requires around 25 GB of disk space (probably in debug mode; see https://github.com/google/heir/pull/1001#issuecomment-2394912055).
Looking through the bazel llvm setup, I noticed this comment. @j2kun any update on removing the NVPTX target?
Also, I think it'd make sense to replace X86 and AArch64 with the special Native target.
https://github.com/google/heir/blob/94fae4f9a0ea1bc73319cf5d5eb8008fb02b97d4/bazel/setup_llvm.bzl#L8-L17
Also, where do we specify which parts of LLVM we want?
I.e., what's the bazel equivalent of -DLLVM_ENABLE_PROJECTS=mlir?
Bazel just defines build targets and figures out all the dependencies you need when you attempt to build a target. I think one of the issues here is that some of our targets (e.g., the test targets that depend on mlir-cpu-runner) pull in a lot of upstream LLVM that is otherwise unused by HEIR. This can be caused by upstream bazel build target definitions pulling in more dependencies than they strictly need, though when I looked into it, I don't think they do that, aside from some tests' dependency on mlir-cpu-runner.
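Since there is no global opt-in switch, the practical equivalent is to inspect what each of our targets actually pulls in; for example, counting the LLVM targets reachable from the test utilities (label taken from the earlier query; filter is standard bazel query syntax):
bazel query 'filter("llvm-project//llvm", deps(//tests:test_utilities))' | wc -l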
When I change a source file, I can observe that file being compiled twice, like this:
$ bazel test //tests/...
INFO: Analyzed 813 targets (0 packages loaded, 0 targets configured).
[2 / 543] 1 / 360 tests; 2 actions running
Compiling lib/Target/Lattigo/LattigoEmitter.cpp; 0s linux-sandbox
Compiling lib/Target/Lattigo/LattigoEmitter.cpp [for tool]; 0s linux-sandbox
My wild guess is that the first one is the dev build while the second one is the -c opt build. From past experience, box_blur_64x64 takes 160s to compile in tests/Transforms/heir_vectorizer but only 30s in tests/Example/openfhe.
The former introduces a dependency on heir-opt via //tests/test_utilities, while the latter uses load("@heir//tools:heir-opt.bzl", "heir_opt"), which runs heir-opt as a tool (that might be the source of the [for tool] suffix).
Compiling everything twice adds test latency, since I have to build MLIR twice whenever the LLVM commit changes. I'm not too annoyed by this, as my dev machine is powerful, but I want to ask whether this is intended, or whether this situation is the best we can do for now.
If we switch to a -c opt build for all tests, the critical path could be much shorter, since the critical path is often tests/Transforms/heir_vectorizer/box_blur.
Definitely we should deduplicate if possible, though I don't think the problem is that one is a dev build and the other is -c opt. All our bazel invocations should run with -c dbg, since that is what is set in the .bazelrc.
But also, switching to -c opt will make compiling MLIR slower (-c opt runs more optimizations) while making the execution of heir-opt (when compiling, e.g., box_blur) faster, whereas -c fastbuild will make compiling heir-opt faster but its execution slower.
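A hedged way to compare the modes on just the expensive test mentioned above (adjust the pattern to the actual target):
time bazel test -c fastbuild //tests/Transforms/heir_vectorizer/...
time bazel test -c opt //tests/Transforms/heir_vectorizer/...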
Testing at 651047045d870879cf12d5fa4827af4f2dc6c322 (including fresh download of all dependencies)
# cold start
bazel clean --expunge
time bazel test -c opt //...:all
43:43.44 total
# incremental build
# make a change to lib/Target/Lattigo/LattigoEmitter.cpp that triggers a meaningful build
time bazel test -c opt //...:all
38.371 total
bazel clean --expunge
time bazel test -c fastbuild //...:all
31:31.72 total
# incremental build
# make a change to lib/Target/Lattigo/LattigoEmitter.cpp that triggers a meaningful build
time bazel test -c opt //...:all
1:49.81 total
(1/2)
After discussing with the bazel maintainers internally: the issue is that when a target is used by another bazel rule (e.g., a compiler that is built in order to compile some other code), it is recommended to set cfg = "exec" on that dependency. I still don't fully understand why this is a separate configuration, but it incurs a second build. The advice I was given was "if you don't set cfg="exec" then cross-compilation won't work," but that seems irrelevant for developer builds.
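A hedged way to confirm that the duplicate compile is the same target in two configurations: configured-target queries print a configuration hash next to each label, so heir-opt showing up with two different hashes corresponds to the target-vs-exec split.
bazel cquery 'deps(//tests/...)' | grep 'tools:heir-opt'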
When I rebuild, it appears the total number of targets built is cut from ~19k to ~12k.
bazel clean --expunge
time bazel test -c fastbuild //...:all
18:02.95 total
# incremental build
# make a change to lib/Target/Lattigo/LattigoEmitter.cpp that triggers a meaningful build
time bazel test -c opt //...:all
1:49.13 total
The incremental build time here is still dominated by the actual tests running (1.5 minutes for box_blur instead of 30s), but the cold start build time is reduced by 42%.
For -c opt, the cold start was 27:03.26 total, an improvement of 38%.
I think https://github.com/google/heir/issues/1504 is the remainder of work for this issue, so closing.