Allow for deferred codegen in !toplevel
Fixes https://github.com/EnzymeAD/Enzyme.jl/issues/1173
Need to come up with a standalone test-case.
Could this get merged?
@vchuravy gentle bump
Codecov Report
All modified and coverable lines are covered by tests.
Project coverage is 73.45%. Comparing base (288b613) to head (0e00885).
Note: the current head 0e00885 differs from the pull request's most recent head da0e767. Consider uploading reports for commit da0e767 to get more accurate results.
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
- Coverage   82.86%   73.45%    -9.41%
==========================================
  Files          24       24
  Lines        3361     3330      -31
==========================================
- Hits         2785     2446     -339
- Misses        576      884     +308
I tried reproducing the Enzyme MWE with examples/jit.jl, but couldn't trigger the failure.
@maleadt do you recall why you originally limited this to toplevel? I assume because it wouldn't make sense for device-side kernel launches.
Hm... Doesn't the inner call resolve its own deferred codegen first and then both of them get linked into the outer one?
Yes, but that doesn't correctly populate the jobs variable that's used at top level: https://github.com/JuliaGPU/GPUCompiler.jl/blob/288b613bc1b7e1c1d0ebd4274b5784d5d2ece86d/src/driver.jl#L423-L435
We could work around this by passing jobs to the inner deferred codegen generators, but that's a hack. Essentially the current system relies on a compilation job having a single point of entry, even into deferred jobs.
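The single-entry-point assumption can be sketched like this (all names here are illustrative, not GPUCompiler.jl's actual API): only the top-level invocation drains the deferred-job worklist, so jobs discovered while compiling an inner !toplevel job are silently dropped.

```julia
struct Job
    name::Symbol
    deferred::Vector{Job}   # jobs discovered during this job's codegen
end

const compiled = Symbol[]   # record of everything that actually got compiled

function compile!(job::Job; toplevel::Bool=true, jobs::Vector{Job}=Job[])
    push!(compiled, job.name)
    append!(jobs, job.deferred)   # codegen reports newly deferred jobs
    if toplevel
        # only the top-level call drains the worklist; the inner call below
        # gets a fresh `jobs`, so anything it discovers is lost
        while !isempty(jobs)
            compile!(popfirst!(jobs); toplevel=false, jobs=Job[])
        end
    end
    return nothing
end

inner  = Job(:inner, Job[])
middle = Job(:middle, [inner])
outer  = Job(:outer, [middle])
compile!(outer)
compiled   # == [:outer, :middle] -- :inner was never compiled
```

Passing the outer `jobs` down to the inner generators (the workaround above) fixes this particular drop, at the cost of threading more state through.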
Hm, I am a bit confused about how
https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002
works. I just pushed a commit that returns the jobs variable from the !toplevel compilation.
> Hm I am a bit confused how
> https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002
We probably don't really rely on the finish_ir phase right now.
I'm not particularly happy with threading yet more state through, but if you need this... The compiler driver should be reworked to support compiling multiple translation units without relying on a single entry-point.
Yeah, I can try to improve this while working on #582.
CUDA.jl CI failure looks related.
Ah, I see: for https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L975
we have an infinite loop, since we don't pass through the list of already-codegen'd functions. So we recurse back into the original top-level job and carry on from there.
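That recursion can be made terminating with the usual seen-set fix; a minimal sketch, assuming hypothetical names rather than GPUCompiler.jl's real internals:

```julia
# Thread a set of already-compiled functions through the recursion so that
# mutually-deferred jobs are emitted exactly once instead of looping forever.
function codegen(f::Symbol, deferred::Dict{Symbol,Vector{Symbol}};
                 seen::Set{Symbol}=Set{Symbol}())
    f in seen && return Symbol[]   # already emitted: stop recursing
    push!(seen, f)
    emitted = [f]
    for g in get(deferred, f, Symbol[])
        append!(emitted, codegen(g, deferred; seen=seen))
    end
    return emitted
end

# :kernel defers :helper, which defers back into :kernel (the shape of the
# CUDA.jl test linked above); without `seen` this would never terminate.
deferred = Dict(:kernel => [:helper], :helper => [:kernel])
codegen(:kernel, deferred)   # → [:kernel, :helper]
```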