Modular Code Generator: Complete Design Document
This PR provides a comprehensive design document for refactoring DaCe's code generation system from a monolithic structure into a modular, pass-based pipeline architecture using DaCe's existing Pass and Pipeline infrastructure.
Overview
The current code generation system is a complex monolithic subpackage that handles everything from analysis to code emission in a single traversal. This design document proposes breaking it down into discrete, composable passes that can be tested, verified, and extended independently.
Key Deliverables
1. Main Design Document (doc/codegen/modular_codegen_design.md)
- Current System Analysis: Comprehensive survey of 48+ files in the codegen subpackage
- 17 Candidate Passes: Complete decomposition of monolithic behaviors into discrete passes:
- Phase 1 (Analysis): TypeInference, LibraryExpansion, MetadataCollection, AllocationAnalysis, ControlFlowAnalysis, TargetAnalysis
- Phase 2 (Transformation): CopyToMap, StreamAssignment, TaskletLanguageLowering
- Phase 3 (CodeGeneration): StateStructCreation, AllocationCode, MemletLowering, FrameCodeGeneration, TargetCodeGeneration, HeaderGeneration
- Phase 4 (FileGeneration): SDFGSplitting, CodeObjectCreation
- Information Flow Schema: Structured `pipeline_results` dictionary for maximal information reuse (a sketch follows this list)
- Target Refactoring Strategy: Split CPU → (C++ base + OpenMP extension), generalize CUDA → (GPU base + CUDA specifics)
- New Organization: Separate `codegen/compiler` (build tools) from `codegen/passes` (generation passes)
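To make the schema concrete, here is a minimal sketch of how a pass could publish into and read from `pipeline_results`, assuming DaCe's existing `Pass` API from `dace.transformation.pass_pipeline` (the pass names and the per-array metadata below are illustrative only):

```python
from typing import Any, Dict, Optional

from dace import SDFG
from dace.transformation import pass_pipeline as ppl


class AllocationAnalysisPass(ppl.Pass):
    """Sketch: decide allocation lifetimes, reusing earlier analysis results."""

    def modifies(self) -> ppl.Modifies:
        return ppl.Modifies.Nothing  # pure analysis, does not change the SDFG

    def should_reapply(self, modified: ppl.Modifies) -> bool:
        # Re-run if data descriptors changed since the last application
        return bool(modified & ppl.Modifies.Descriptors)

    def apply_pass(self, sdfg: SDFG, pipeline_results: Dict[str, Any]) -> Optional[Dict[str, str]]:
        # Results of earlier passes are keyed by pass name in pipeline_results
        metadata = pipeline_results.get('MetadataCollectionPass', {})
        lifetimes: Dict[str, str] = {}
        for name, desc in sdfg.arrays.items():
            if desc.transient:
                # Illustrative decision based on the collected metadata
                lifetimes[name] = metadata.get(name, 'scope')
        # The return value becomes pipeline_results['AllocationAnalysisPass']
        return lifetimes
```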
2. Implementation Examples (doc/codegen/pass_implementation_examples.md)
- Concrete Pass Implementations: Python code for key passes like `MetadataCollectionPass`, `AllocationAnalysisPass`, `FrameCodeGenerationPass`
- Pipeline Configurations: Complete pipeline setups with conditional target-specific passes
- Backward Compatibility: Wrappers preserving the existing `generate_code()` API
- Performance Strategies: Caching, incremental updates, lazy evaluation
- Testing Framework: Unit test examples for individual passes and full pipelines (a sketch follows this list)
Benefits
- Modularity: Each pass has a single responsibility and clear interfaces
- Extensibility: Easy to add new passes or modify existing ones
- Testability: Individual passes can be unit tested in isolation
- Verifiability: Smaller, focused components are easier to verify
- Performance: Information reuse between passes, incremental compilation
- Maintainability: Clear separation of concerns and dependencies
Proposed Architecture
```python
from dace.transformation.pass_pipeline import Pipeline  # existing DaCe infrastructure


class CodeGenerationPipeline(Pipeline):
    def __init__(self):
        super().__init__([
            # Phase 1: Analysis
            TypeInferencePass(),
            MetadataCollectionPass(),
            AllocationAnalysisPass(),
            TargetAnalysisPass(),
            # Phase 2: Transformations
            CopyToMapPass(),
            # 'pass' is a reserved keyword in Python, so the wrapped pass is
            # given as 'wrapped' (see the ConditionalPass sketch below)
            ConditionalPass(condition=is_gpu, wrapped=StreamAssignmentPass()),
            # Phase 3: Code Generation
            FrameCodeGenerationPass(),
            TargetCodeGenerationPass(),
            # Phase 4: File Generation
            CodeObjectCreationPass(),
        ])
```
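`ConditionalPass` does not exist in DaCe today; a minimal sketch of such a wrapper, assuming the existing `Pass` API:

```python
from typing import Any, Callable, Dict, Optional

from dace import SDFG
from dace.transformation import pass_pipeline as ppl


class ConditionalPass(ppl.Pass):
    """Hypothetical wrapper that applies another pass only if a predicate holds."""

    def __init__(self, condition: Callable[[SDFG], bool], wrapped: ppl.Pass):
        super().__init__()
        self.condition = condition
        self.wrapped = wrapped

    def modifies(self) -> ppl.Modifies:
        return self.wrapped.modifies()

    def should_reapply(self, modified: ppl.Modifies) -> bool:
        return self.wrapped.should_reapply(modified)

    def apply_pass(self, sdfg: SDFG, pipeline_results: Dict[str, Any]) -> Optional[Any]:
        if self.condition(sdfg):
            return self.wrapped.apply_pass(sdfg, pipeline_results)
        return None  # Skipped; nothing to report
```

With this, `is_gpu` can be any predicate over the SDFG (e.g., checking whether any map is scheduled on a GPU device).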
Target Refactoring
Current issues addressed:
- "CPU" backend actually does OpenMP → Split into C++ base + OpenMP extension
- "CUDA" backend is GPU-specific → Generalize to GPU base + CUDA/HIP specializations
- Poor factoring between generic and specialized code
Proposed hierarchy:
```
TargetCodeGenerator
├── CppCodeGen → OpenMPCodeGen, MPICodeGen
├── GPUCodeGen → CUDACodeGen, HIPCodeGen, OpenCLCodeGen
├── FPGACodeGen → XilinxCodeGen, IntelFPGACodeGen
└── SpecializedCodeGen → SVECodeGen, MLIRCodeGen
```
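To illustrate the intended factoring, a specialization would override only what differs from its base. A rough sketch follows; the hook and method names are assumptions for illustration, not DaCe's current target API:

```python
from dace.codegen.targets.target import TargetCodeGenerator


class GPUCodeGen(TargetCodeGenerator):
    """Sketch: behavior shared by all GPU backends lives in the base class."""

    def kernel_qualifier(self) -> str:  # hypothetical backend-specific hook
        raise NotImplementedError

    def generate_kernel_signature(self, name: str, args: str) -> str:
        # Generic GPU logic, parameterized by the backend-specific hook
        return f'{self.kernel_qualifier()} void {name}({args})'


class CUDACodeGen(GPUCodeGen):
    def kernel_qualifier(self) -> str:
        return '__global__'


class HIPCodeGen(GPUCodeGen):
    def kernel_qualifier(self) -> str:
        return '__global__'  # HIP mirrors the CUDA kernel launch model
```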
Implementation Roadmap
14-week timeline with incremental phases:
- Weeks 1-2: Foundation and directory restructuring
- Weeks 3-4: Analysis passes extraction
- Weeks 5-6: Transformation passes implementation
- Weeks 7-8: Code generation passes development
- Weeks 9-10: Target refactoring and specialization
- Weeks 11-12: Integration testing and performance validation
- Weeks 13-14: Build system modernization and documentation
Backward Compatibility
The design maintains full backward compatibility:
- Existing `generate_code(sdfg)` API unchanged (a wrapper sketch follows this list)
- Legacy implementation preserved as fallback
- Incremental migration path for custom backends
- All current functionality preserved
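A minimal sketch of such a wrapper, assuming the `CodeGenerationPipeline` proposed above (the result key and fallback logic are illustrative):

```python
from typing import Any, Dict, List

import dace
from dace.codegen.codeobject import CodeObject


def generate_code(sdfg: dace.SDFG) -> List[CodeObject]:
    """Drop-in replacement for the existing entry point (sketch)."""
    try:
        results: Dict[str, Any] = {}
        CodeGenerationPipeline().apply_pass(sdfg, results)
        return results['CodeObjectCreationPass']
    except NotImplementedError:
        # Fall back to the legacy monolithic implementation
        from dace.codegen.codegen import generate_code as legacy_generate_code
        return legacy_generate_code(sdfg)
```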
This design document provides the complete foundation for transforming DaCe's code generation into a modern, modular, and extensible system while preserving all existing capabilities.
Fixes #2034.
Please do not merge until everyone has had a chance to comment, modify, and approve.
@copilot additionally, how would I create a new backend implementation (e.g., a new platform, or a different language like LLVM IR)? What would change in the design? Could you please add another document that explains how the modular code generation can be extended w.r.t. those two cases?
I've added a comprehensive extension guide (doc/codegen/extending_modular_codegen.md) that explains how to create new backend implementations for both new platforms and different languages. The document includes:
For new platforms (e.g., neuromorphic processors):
- Complete example showing target code generator implementation
- Platform-specific transformation passes
- Target registration and discovery
- Integration with the pipeline
For new languages (e.g., LLVM IR):
- Language-specific code generation
- Code object handling for new languages
- Analysis passes for language-specific optimizations
- Multi-target orchestration
The guide demonstrates how the modular design makes extensions straightforward through clear interfaces and automatic registration, with comprehensive testing examples and best practices.
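For reference, a skeleton of the registration pattern might look like this (a sketch assuming DaCe's decorator-based target registry; a real backend would still have to implement the full `TargetCodeGenerator` interface for allocation, copies, and scope generation):

```python
from dace import registry
from dace.codegen.targets.target import TargetCodeGenerator


@registry.autoregister_params(name='opencl')  # discovered by name at pipeline setup
class OpenCLCodeGen(TargetCodeGenerator):
    """Skeleton of a new backend; the TargetCodeGenerator methods for
    allocation, copies, and scope generation still need to be implemented."""

    target_name = 'opencl'
    title = 'OpenCL'
    language = 'cl'
```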
@copilot please address my latest review
Addressed all the latest review comments including fixing pass headers, updating target hierarchy, replacing the neuromorphic example with a comprehensive OpenCL example, and removing the LLVM IR example. See commit 4f14e86.
Made my pass on the proposal (and removed the example, which does not make sense in the new construction of the codegen. We will have to discuss that separately). I'll wait for others to comment before manually editing the doc again.
@mcopik please comment on the design document before next Monday if you can (based on your recent experiences), so that we can ensure the design is sound and enables you to work in a simpler and more verified manner. Thanks!
I think Phase 2 should involve a pass that decides on allocation lifetimes of the objects directly on the SDFG level, if I design the sketch on time I want to discuss some ideas about it on the DaCe meeting on 12.06.
Completely agreed, that's proposed as `AllocationPass` right now. Any amendments you wish to add there, @ThrudPrimrose?
Not right now, but I want to discuss some ideas in the DaCe meeting.
@tbennun A few simple comments (might be naive):
- This sentence makes no sense to me: "CUDA backend is GPU-specific, not general GPU"
- I'd add to the issue the incredible entanglement of the CPU and CPP backend. It is very difficult to say where one ends and the other begins.
- I initially wrote: "If we do not want to be restricted to a CUDA model, and we want to support SYCL (Intel GPUs), then maybe it would make sense to have a GPU backend + a CUDA backend that inherits from it? Alternatively, let's simply refer to the GPU backend as "CUDA" (since HIP ultimately implements almost the same API as CUDA)." This was based on the proposed refactoring, and I only learned that we want to provide such a split after reading "Target Hierarchy". It feels like the proposed refactoring includes only a subset of the changes we want to introduce.
- The timeline includes the step "2. Rename CUDA backend to GPU" - shouldn't it be a full refactoring?
Overall, it makes sense to me. I would add as an explicit goal to have a template/blueprint for adding a new target.
@mcopik I think it makes sense and I also like the SYCL example. Thanks!
I consider GPU stream assignment to be a scheduling transformation, just like the GPUOffloading / ToGPU transformation. I think these transformations should be performed before the codegen-preprocessing passes are executed. Is the document only about code generation, or do we combine it with the transformations that should have been applied before GPU offloading?
Like the scheduling passes, I think the allocation pass should also be performed before codegen (and the codegen-preprocessing passes), as these transformations are not codegen-specific. Many accelerator environments support offloading and stream assignment (SYCL for Intel GPUs, for example, also supports streams).
I think the following transformations can be the new Phase 1:
- AllocationPass
- CopyToMapPass
- StreamAssignmentPass (should not necessarily be hardcoded for GPUs; many programming environments support streams)
Phase 0 would involve transformations that make the SDFG more fit for the target accelerator. Consider the pattern Map[Range(1..N)] -> Tasklet -> Map[Range(1..N)]: if possible, duplicating the tasklet across the maps and fusing them would make sense for GPUs but not for CPUs, yet the transformation itself is not GPU-specific; one could apply it to an SDFG that has not been specialized for GPUs yet.
Then we would have Phase 2 (old Phase 1) for the analysis and set-up for codegen,
and the new Phase 3 (old Phase 2) for the accelerator-specific lowering passes that depend on the language (SYCL, CUDA, etc.).
Btw., I have a design document for the offloading pass (not restricted to GPUs) that sketches what the order of preparation transformations could look like before offloading. Would it make sense to integrate it into this document? Link to the Google doc: https://docs.google.com/document/d/13Pl4A8u5YJgTkZvhKCoJp8t693tvHeEYXfUvDtyonDI/edit?usp=sharing
Addressed your feedback including: clarifying CUDA/GPU backend language and scope of refactoring, adding mention of CPU/C++ backend entanglement issue, changing timeline to indicate full refactoring rather than just renaming, and adding explicit goal for template/blueprint for new targets. See commit 66be6e8.
Excellent points about the scope of transformations vs. codegen-specific passes. You're right that AllocationPass, CopyToMapPass, and StreamAssignmentPass could be viewed as pre-codegen transformations rather than codegen passes. This touches on a fundamental architectural question about where the boundary lies between SDFG transformation and code generation. Your proposed phase structure makes sense and aligns with having clear separation of concerns. I'd suggest this becomes part of the broader discussion about the design scope in the upcoming DaCe meeting.
All comments addressed in #2071. The work on this PR is superseded by and cherry-picked into that PR.