Modular Code Generator: Design Document

Open tbennun opened this issue 6 months ago • 1 comments

We are interested in refactoring code generation into a series of modular passes.

Code generation is already built as a series of passes, but is a complex monolithic subpackage of DaCe. The goal is to turn the final code generation into a simpler traversal process, so that it is more modular, extensible, and verifiable.

The current code generation passes (in the monolithic structure) are:

Special validation passes before code generation
Metadata collection (free symbols, sub-SDFG argument lists, etc.)
Allocation scope determination (i.e., where a data container's memory will actually be allocated/deallocated based on lifetime and scope rules).
Creation of the State struct for the SDFG program
Copy-to-Map pass (only in certain backends)
GPU stream assignment pass (only in the CUDA backend)

These passes are followed by a traversal that emits code for memory copies, allocation/deallocation, scopes, tasklets, functions for certain scopes and nested SDFGs (where FPGA backends are even more complex), and every node. See docs/codegen/codegen.rst for more information.

We would like to use the Pass and Pipeline classes that DaCe provides to simplify the process. The goal is for passes to gradually add metadata to the SDFG elements and to the pipeline_results dictionary that pass pipelines provide. The passes would lower the SDFG in stages: first to a more explicit SDFG (e.g., where copies become tasklets at the right scope, memory allocations/deallocations become tasklets, and Python or other language tasklets become tasklets in their target language, i.e., C++/CUDA/HIP/OpenCL/RTL...), then to a list of SDFGs (one per generated code file), and finally to a simple GenerateCode traversal pass that emits the given code.
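To make the intended flow concrete, here is a minimal, self-contained sketch of passes that accumulate results in a shared dictionary and lower a toy program step by step. It does not use DaCe: the `Pass` and `Pipeline` classes below are simplified stand-ins, the dictionary-based "program" is a placeholder for an SDFG, and the pass names `CollectMetadata` and `LowerCopies` are hypothetical. Storing each result under the pass's class name is also an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class Pass:
    """Simplified stand-in for a pipeline pass (not DaCe's actual class)."""

    def apply_pass(self, program: Dict[str, Any],
                   pipeline_results: Dict[str, Any]) -> Optional[Any]:
        raise NotImplementedError


class CollectMetadata(Pass):
    """Hypothetical analysis pass: records the free symbols of the program."""

    def apply_pass(self, program, pipeline_results):
        return {"free_symbols": sorted(program["symbols"])}


class LowerCopies(Pass):
    """Hypothetical lowering pass: rewrites copy nodes into explicit tasklets."""

    def apply_pass(self, program, pipeline_results):
        lowered = 0
        for node in program["nodes"]:
            if node["kind"] == "copy":
                node["code"] = f"memcpy({node['dst']}, {node['src']}, {node['size']});"
                node["kind"] = "tasklet"
                lowered += 1
        return lowered


class GenerateCode(Pass):
    """Final traversal: only emits code, reusing results of earlier passes."""

    def apply_pass(self, program, pipeline_results):
        symbols = pipeline_results["CollectMetadata"]["free_symbols"]
        lines = [f"// free symbols: {', '.join(symbols)}"]
        lines += [n["code"] for n in program["nodes"] if n["kind"] == "tasklet"]
        return "\n".join(lines)


class Pipeline:
    """Runs passes in order and stores each result under the pass class name."""

    def __init__(self, passes: List[Pass]):
        self.passes = passes

    def apply_pass(self, program, pipeline_results):
        for p in self.passes:
            result = p.apply_pass(program, pipeline_results)
            if result is not None:
                pipeline_results[type(p).__name__] = result
        return pipeline_results


def run_example() -> Dict[str, Any]:
    program = {
        "symbols": {"N", "M"},
        "nodes": [
            {"kind": "copy", "src": "A", "dst": "B", "size": "N"},
            {"kind": "tasklet", "code": "B[0] += 1;"},
        ],
    }
    pipeline = Pipeline([CollectMetadata(), LowerCopies(), GenerateCode()])
    return pipeline.apply_pass(program, {})
```

Calling `run_example()["GenerateCode"]` returns the emitted text. The point of the sketch is that the final pass contains no lowering logic of its own: it reads what earlier passes left in the program and in the results dictionary.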

Lastly, the code generation pipeline is over-specialized right now and not well factored. The "CPU" code generator should actually be the "OpenMP" code generator, and the non-OpenMP code should move to a "C++" code generator instead. The same goes for the CUDA code generator, which should become the GPU code generator.
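One possible shape for this factoring, sketched with hypothetical class names that do not exist in DaCe today: a plain C++ generator holds the sequential logic, an OpenMP generator only adds the parallel pragma on top of it, and a single GPU generator is parameterized by the runtime prefix so that CUDA and HIP share one code path.

```python
from typing import List


class CppCodeGenerator:
    """Hypothetical base generator: plain sequential C++."""

    def emit_loop(self, var: str, bound: str, body: str) -> List[str]:
        return [
            f"for (int {var} = 0; {var} < {bound}; ++{var}) {{",
            f"    {body}",
            "}",
        ]


class OpenMPCodeGenerator(CppCodeGenerator):
    """Hypothetical OpenMP generator: reuses the C++ loop, adds a pragma."""

    def emit_loop(self, var: str, bound: str, body: str) -> List[str]:
        return ["#pragma omp parallel for"] + super().emit_loop(var, bound, body)


class GPUCodeGenerator(CppCodeGenerator):
    """Hypothetical GPU generator shared by CUDA and HIP via a runtime prefix."""

    def __init__(self, runtime: str = "cuda"):
        if runtime not in ("cuda", "hip"):
            raise ValueError(f"unsupported GPU runtime: {runtime}")
        self.runtime = runtime

    def emit_synchronize(self) -> str:
        # cudaDeviceSynchronize / hipDeviceSynchronize differ only by prefix
        return f"{self.runtime}DeviceSynchronize();"
```

With this layout, `OpenMPCodeGenerator().emit_loop("i", "N", "a[i] = 0;")` yields the same loop as the C++ generator with one extra pragma line in front, which is the kind of separation the paragraph above asks for.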

To do that, the task list is:

Generate a design document by scouring the entire code generation subpackage and creating a list of candidate passes that covers all possible behaviors.
Construct an abstract pipeline in which all the passes connect to each other with maximal information reuse to improve performance.
Split the codegen subfolder into codegen/compiler for compiler (CMake, etc.) interaction and codegen/passes for code generation-related passes. This should also allow the CMake backend to be replaced with direct compiler calls, which can be faster, and enable generation of output languages other than C++.
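As a rough illustration of the abstract pipeline in the second task, the sketch below orders passes by their declared dependencies using the standard-library `graphlib` module (Python 3.9+), runs each pass exactly once, and hands its result to every consumer. The pass names are taken from the list of current passes, but the dependency edges between them are assumptions made for this example, not a statement about how DaCe's passes actually relate.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Assumed dependencies (pass -> passes whose results it consumes).
DEPENDS_ON: Dict[str, Set[str]] = {
    "validation": set(),
    "metadata_collection": {"validation"},
    "allocation_scopes": {"metadata_collection"},
    "state_struct": {"metadata_collection", "allocation_scopes"},
    "generate_code": {"allocation_scopes", "state_struct"},
}


def schedule(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an execution order in which every pass follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def run(depends_on: Dict[str, Set[str]]) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Run each pass once; consumers read cached results instead of recomputing."""
    results: Dict[str, str] = {}
    run_counts: Dict[str, int] = {}
    for name in schedule(depends_on):
        inputs = sorted(dep for dep in depends_on[name] if dep in results)
        run_counts[name] = run_counts.get(name, 0) + 1
        # Placeholder "result": records which cached inputs the pass consumed.
        results[name] = f"{name}({', '.join(inputs)})"
    return results, run_counts
```

Here `metadata_collection` is consumed by two later passes but executed only once, which is the information reuse the task list refers to. `TopologicalSorter` also raises `CycleError` if the declared dependencies are circular, which gives a cheap sanity check on the pipeline design.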

This issue only relates to the creation of the design document and plan, not the implementation thereof.

cc @acalotoiu @ThrudPrimrose @alexnick83 @phschaad

Jun 09 '25 16:06 tbennun

I will provide feedback by the 23rd.

Jun 12 '25 14:06 acalotoiu