Taichi CMake Overhaul

Open qiao-bo opened this issue 2 years ago • 16 comments

We would like to share our proposal for modernizing Taichi's CMake-based build system. By embracing the target-based approach, we can enforce a good modular design in our code base. This brings benefits such as reduced (re-)compilation time. By making the dependencies in the current code base explicit and untangling them, code (targets) that does not depend on other code can be isolated and built independently (target-level parallelism). A change to one target would then not trigger a rebuild of targets that do not depend on it.

This is ongoing work. Some effort has already been spent on cleaning up the code base:

  • #2196
  • #2203
  • #2195

Related issues

  • #4882

Proposal

The proposed changes consist of two main parts. First, we would like to maintain a list of the targets that Taichi uses and is built upon. Each of these targets should have its own CMakeLists.txt that specifies both the build requirements and the usage requirements of the target. This may require some shuffling in our code base. Second, we would like to replace many of our current CMake functions that have global scope with target-based APIs. For example, include_directories should be replaced with target_include_directories to keep hidden dependencies on header files to a minimum. Here, we share the outcome of some preliminary discussions:
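
As a rough illustration of the first part, a per-target CMakeLists.txt could look like the sketch below. The directory, the target name `common`, and the source file name are placeholders chosen for this example, not the layout Taichi actually uses.

```cmake
# taichi/common/CMakeLists.txt (hypothetical location)
# add_library() without sources requires CMake 3.11 or newer.
add_library(common STATIC)

# Sources are attached to the target explicitly. File name is a placeholder.
target_sources(common
  PRIVATE
    logging.cpp
)

# Target-scoped replacement for a global include_directories() call:
# only `common` and the targets that link against it see this path.
target_include_directories(common
  PUBLIC
    ${PROJECT_SOURCE_DIR}
)
```

With this shape, a header path is visible only to targets that declare a dependency on `common`, which is what makes hidden header dependencies show up as build errors.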

Targets

Current Taichi C++ code base (https://github.com/taichi-dev/taichi/tree/master/taichi) could be split into the following build targets (names TBD):

  1. program: Core module that calls other targets in order. (https://github.com/taichi-dev/taichi/tree/master/taichi/program)
  2. ir: Includes ir, analysis, and transform. This may depend on type, snode, etc.
  3. codegen: Currently located in (https://github.com/taichi-dev/taichi/tree/master/taichi/backends). We can further divide it into LlvmCodegen and SpirvCodegen. codegen should depend on runtime.
  4. runtime: At the moment it shares the same location as codegen. Code shuffling is needed to split it out.
  5. artifact: This information is required by both codegen and runtime. One example in the context of AOT is KernelAttributes: https://github.com/taichi-dev/taichi/blob/5b890f90f9edb90c2f1b4841ea7142f4b13e0bb2/taichi/backends/metal/kernel_utils.h#L105-L156 As a first step, we can move this information from codegen to runtime such that runtime does not depend on codegen.
  6. common: Includes logging, macros, and similar utilities. This target is shared among all targets.
  7. type: Many targets may depend on this.
  8. snode: Ideally we want one snode target that others can depend on. Currently, the code is spread across ir, struct, etc. We can have an snode builder such that an implemented snode no longer contains its constructor information.
  9. gui: Taichi_core's peripheral targets.
  10. python: Pybind related code.
  11. system: Includes platform

This dependency graph provides an overview of the mentioned targets: Dependency graph
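
To make the relationships between targets concrete, the following is a minimal sketch of how a few of the targets listed above might be wired together. The target names follow the (still TBD) list, the subdirectory paths are assumptions, and the choice of PUBLIC versus PRIVATE for each edge is illustrative only.

```cmake
# Top-level CMakeLists.txt fragment. Each subdirectory is assumed to
# define one target with the same name as the directory.
add_subdirectory(taichi/common)
add_subdirectory(taichi/runtime)
add_subdirectory(taichi/codegen)
add_subdirectory(taichi/program)

# These calls would normally live in each target's own CMakeLists.txt.
# They are collected here to show the dependency edges in one place.
target_link_libraries(runtime PRIVATE common)
target_link_libraries(codegen
  PUBLIC  runtime   # "codegen should depend on runtime"
  PRIVATE common
)
target_link_libraries(program PRIVATE codegen runtime common)
```

Because every edge is declared explicitly, an accidental dependency in the wrong direction (for example runtime including a codegen header) tends to fail at configure or compile time instead of going unnoticed.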

API Changes

  1. Minimize the usage of global variables and be explicit by using targets. For example, include_directories -> target_include_directories, link_directories -> target_link_directories, etc. Targets in sub-directories can be added with add_subdirectory().
  2. Differentiate between the build requirements and the usage requirements of targets: use PRIVATE for build requirements and INTERFACE (or PUBLIC, when both apply) for usage requirements. For example, -Wall is a build requirement, not a usage requirement.
  3. Replace these file glob APIs with explicit target_sources calls: https://github.com/taichi-dev/taichi/blob/5b890f90f9edb90c2f1b4841ea7142f4b13e0bb2/cmake/TaichiCore.cmake#L89-L103 File globbing is not recommended in modern CMake. More importantly, avoid using TI_WITH_XX to guard the inclusion of source files. This scatters TI_WITH_XX guards throughout the Taichi core target, for example: https://github.com/taichi-dev/taichi/blob/91d6f60f25abdf92fd1ec1d0de719bdcf9b82b6a/taichi/program/program.cpp#L78-L112
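
The three API changes above can be sketched together in one fragment. This is an illustration under assumptions: the source file names, the `taichi/runtime/vulkan` path, and the `vulkan_runtime` target name are hypothetical. Only the TI_WITH_VULKAN option name is taken from the existing build.

```cmake
add_library(program STATIC)

# (3) Explicit source list instead of file(GLOB ...). Names are placeholders.
target_sources(program
  PRIVATE
    program.cpp
    kernel.cpp
)

# (2) Build requirement: warnings apply only when compiling `program` itself
# and are not forwarded to targets that link against it.
# -Wall assumes a GCC- or Clang-compatible compiler.
target_compile_options(program PRIVATE -Wall)

# (1) + (2) Usage requirement: consumers of `program` need this include path.
target_include_directories(program PUBLIC ${PROJECT_SOURCE_DIR})

# (3) An optional backend becomes an optional target, so the guard wraps
# one add_subdirectory() call rather than individual source files.
option(TI_WITH_VULKAN "Build the Vulkan backend" OFF)
if(TI_WITH_VULKAN)
  add_subdirectory(taichi/runtime/vulkan)  # assumed to define vulkan_runtime
  target_link_libraries(program PRIVATE vulkan_runtime)
endif()
```

This does not by itself remove the TI_WITH_XX preprocessor guards inside C++ files such as program.cpp. It only keeps the CMake side from selecting source files per option.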

Implementation Roadmap

In phase one, we would like to divide the current core target, namely taichi_isolated_core, into a few major build targets including program, codegen, runtime, ir, and python. https://github.com/taichi-dev/taichi/blob/91d6f60f25abdf92fd1ec1d0de719bdcf9b82b6a/cmake/TaichiCore.cmake#L244-L245

  • [x] Split runtime from the backends dir. (Define runtime targets)
    • [x] cc
    • [x] cpu
    • [x] cuda
    • [x] dx
    • [x] interop
    • [x] metal
    • [x] opengl
    • [x] vulkan
    • [x] wasm
  • [x] Split codegen from the backends dir. (Define codegen targets)
  • [ ] Define ir target. Resolve the dependencies on program, backends etc.
  • [x] Isolate Pybind related code.

In phase two, we can further split the shared and peripheral targets out of, e.g., program. This also includes artifact and snode. In addition, we can move header files to a separate taichi/include directory, which allows us to distribute libraries together with their headers. Tasks here will be continuously updated.
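
One possible way to expose a separate taichi/include directory, so that headers can be shipped with the libraries, is sketched below. It assumes a `runtime` target already exists and that public headers live under taichi/include. Neither is settled in this proposal.

```cmake
include(GNUInstallDirs)

# Use the source-tree headers while building, and the installed copy
# when the library is consumed from an install prefix.
target_include_directories(runtime
  PUBLIC
    $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/taichi/include>
    $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)

# Install the library itself.
install(TARGETS runtime
  ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
  LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
)

# Install the public headers next to it.
install(DIRECTORY ${PROJECT_SOURCE_DIR}/taichi/include/
  DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)
```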

qiao-bo avatar Apr 21 '22 06:04 qiao-bo

Cool! Could you also provide a dependency graph. https://excalidraw.com/ could be your friend (@ailzhang recommended)

k-ye avatar Apr 25 '22 13:04 k-ye

Nice graph! IIUC, type and snode are sub-components under ir? Also artifact is probably too generic, better come up with a better naming for it... (Context: this refers to the artifacts/outcome from codegen)

k-ye avatar Apr 25 '22 14:04 k-ye

IIUC, type and snode are sub-components under ir?

At the moment, yes. Nevertheless, type should not be an ir-specific component. Other targets such as codegen can also depend on type (and codegen already does). The same goes for snode. Ideally we want to have a type target as well as an snode target.

Also artifact is probably too generic, better come up with a better naming for it... (Context: this refers to the artifacts/outcome from codegen)

agree, any suggestions?

qiao-bo avatar Apr 26 '22 03:04 qiao-bo

Currently the taichi/backends folder contains, apart from the code for runtime and codegen, the code for our unified device API. Maybe we can make a backend target for this part? The dependencies would then be something like:

  • runtime -> (depends on) backend
  • codegen -> backend
  • codegen -> runtime
  • program -> backend

WDYT? @k-ye @ailzhang

qiao-bo avatar Apr 29 '22 09:04 qiao-bo

Would be great to separate out the unified device API! I'd call it rhi, though. @bobcao3

k-ye avatar Apr 29 '22 10:04 k-ye

Discussion: We should later distinguish between public headers and private headers (currently we don't). I think the recommended way is: for public headers, we always include by absolute path (meaning the path relative to the Taichi project's root). For private headers, we can use a path relative to the target's include folder. Either works as long as each header file has a unique path.
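
A minimal sketch of how this split could be expressed per target, assuming a target named `rhi` and a private `include` folder inside its source directory (both are placeholders, as are the header names in the comments):

```cmake
target_include_directories(rhi
  PUBLIC
    # Public headers are included from the project root,
    # e.g. #include "taichi/rhi/device.h" (hypothetical header).
    ${PROJECT_SOURCE_DIR}
  PRIVATE
    # Private headers are found relative to the target's own include
    # folder, e.g. #include "detail.h", and are invisible to consumers.
    ${CMAKE_CURRENT_SOURCE_DIR}/include
)
```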

qiao-bo avatar Apr 29 '22 10:04 qiao-bo

Discussion: We should later distinguish between public headers and private headers (currently we don't). I think the recommended way is: for public headers, we always include by absolute path (meaning the path relative to the Taichi project's root). For private headers, we can use a path relative to the target's include folder. Either works as long as each header file has a unique path.

Agree. AOT glue code often refers to unexported functions by mistake, which leads to linking problems.

PENGUINLIONG avatar May 17 '22 06:05 PENGUINLIONG

Note that right now everything is also grouped in a two-level namespace: ::taichi::lang. We should consider using just taichi, so that only the components that are truly language-related go into taichi::lang, e.g. CHI IR. For backend code, it could be something like taichi::codegen, taichi::runtime, etc.

k-ye avatar May 20 '22 05:05 k-ye

Update:

  • API changes implemented as proposed. This covers file globs, directory APIs, scope specifiers, etc.
  • Newly defined targets include rhi, codegen, program_impls, runtime. The current dependency relationship is:

(Flowchart)

Next steps:

  • [x] Clean up header dependencies among rhi, codegen, and runtime.
  • [x] Define common utilities targets to break from core source files.
  • [x] Isolate Pybind source files
  • [ ] Split IR from core files

qiao-bo avatar Jul 04 '22 09:07 qiao-bo

Current dependency graph taichi

qiao-bo avatar Jul 14 '22 02:07 qiao-bo

Ideally TaichiCore.cmake should only deal with the taichi_core related build, which means it should be decoupled from language frontends such as Python. @AmesingFlank What is the status of TI_EMSCRIPTENED? WDYT if we rename this to taichi_javascript, and maybe move this part into a separate TaichiJavascript.cmake?

https://github.com/taichi-dev/taichi/blob/4bc6f0cb4f6d8078ebc6badb29d98da6617d4436/cmake/TaichiCore.cmake#L530

qiao-bo avatar Jul 19 '22 09:07 qiao-bo

@qiao-bo For taichi.js, I have decided to move away from using emscripten to compile taichi, because of binary size and performance issues. Instead, I have re-implemented the functionalities that I require in Javascript. So I think for now it would be a good idea to remove all TI_EMSCRIPTENED related code from the C++ codebase.

AmesingFlank avatar Jul 19 '22 09:07 AmesingFlank

@qiao-bo For taichi.js, I have decided to move away from using emscripten to compile taichi, because of binary size and performance issues. Instead, I have re-implemented the functionalities that I require in Javascript. So I think for now it would be a good idea to remove all TI_EMSCRIPTENED related code from the C++ codebase.

OK, that makes it cleaner, thanks for the info.

qiao-bo avatar Jul 19 '22 10:07 qiao-bo

Update: Current dependency graph among targets, extracted from CMake. (Flowchart)
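
The thread does not say how the graph was extracted. One way CMake itself supports is its Graphviz output, which can be tuned with a CMakeGraphVizOptions.cmake file as sketched below. The ignore pattern and the output file name are placeholders.

```cmake
# CMakeGraphVizOptions.cmake, placed in the top-level source or build directory.
# Generate the graph from the build directory with:
#   cmake --graphviz=deps.dot .
# and render it with Graphviz, e.g.:  dot -Tpng deps.dot -o deps.png

set(GRAPHVIZ_EXTERNAL_LIBS FALSE)         # hide system and third-party libraries
set(GRAPHVIZ_GENERATE_PER_TARGET FALSE)   # emit only the combined graph
set(GRAPHVIZ_GENERATE_DEPENDERS FALSE)    # skip the reverse "depender" graphs
set(GRAPHVIZ_IGNORE_TARGETS "gtest.*")    # regex list; placeholder pattern
```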

qiao-bo avatar Jul 26 '22 07:07 qiao-bo

Update together with @ailzhang: In addition to taichi_core (which can be further split into ir, analysis, etc.), we can split an artifact target from current runtime targets. This information is required by both codegen and runtime. One example in the context of AOT is KernelAttributes: https://github.com/taichi-dev/taichi/blob/5b890f90f9edb90c2f1b4841ea7142f4b13e0bb2/taichi/backends/metal/kernel_utils.h#L105-L156

  • [x] move this information from codegen to runtime such that runtime does not depend on codegen.
  • [ ] split artifact target!

qiao-bo avatar Aug 18 '22 03:08 qiao-bo

FYI @PGZXB #5889 introduced an extra dependency from gfx_runtime to spirv_codegen, which is unexpected. You can reproduce this with TAICHI_CMAKE_ARGS="-DTI_WITH_VULKAN:BOOL=ON -DTI_WITH_OPENGL:BOOL=OFF " python setup.py develop and then import taichi. Specifically, it was introduced by https://github.com/taichi-dev/taichi/pull/5889/files#diff-f99a0ddc44f29052309b4ae1983406e7c442e8f44b683937edd532a1e70f2269R3. It can be worked around by linking spirv_codegen to the gfx_runtime target, but that contradicts what we want. Ideally, if cache_manager has to depend on both gfx_runtime and spirv_codegen, it can be a separate target. WDYT?
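
A separate target along those lines could be sketched as follows. The target name `gfx_cache_manager` and the source file name are hypothetical. gfx_runtime and spirv_codegen are assumed to be defined elsewhere in the build.

```cmake
# Hypothetical standalone target for the cache manager.
add_library(gfx_cache_manager STATIC)

# Placeholder file name for the cache manager implementation.
target_sources(gfx_cache_manager PRIVATE cache_manager.cpp)

# The new target depends on both libraries, so gfx_runtime itself
# no longer needs to link against spirv_codegen.
target_link_libraries(gfx_cache_manager
  PUBLIC
    gfx_runtime
    spirv_codegen
)
```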

ailzhang avatar Aug 31 '22 01:08 ailzhang