Allow for deferred codegen in !toplevel
Fixes https://github.com/EnzymeAD/Enzyme.jl/issues/1173
Need to come up with a standalone test-case.
Could this get merged?
@vchuravy gentle bump
Codecov Report
All modified and coverable lines are covered by tests.
Project coverage is 73.45%. Comparing base (288b613) to head (0e00885).
Note: the current head 0e00885 differs from the pull request's most recent head da0e767. Consider uploading reports for commit da0e767 to get more accurate results.
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
- Coverage   82.86%   73.45%    -9.41%
==========================================
  Files          24       24
  Lines        3361     3330      -31
==========================================
- Hits         2785     2446     -339
- Misses        576      884     +308
I tried reproducing the Enzyme MWE with examples/jit.jl, but couldn't trigger the failure.
@maleadt do you recall why you originally limited this to toplevel? I assume because it wouldn't make sense for device-side kernel launches.
Hm... Doesn't the inner call resolve its own deferred codegen first and then both of them get linked into the outer one?
Yes, but that doesn't correctly populate the jobs variable that's used at top level: https://github.com/JuliaGPU/GPUCompiler.jl/blob/288b613bc1b7e1c1d0ebd4274b5784d5d2ece86d/src/driver.jl#L423-L435
We could work around this by passing jobs to the inner deferred codegen generators, but that's a hack. Essentially the current system relies on a compilation job having a single point of entry, even into deferred jobs.
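The single-entry-point assumption can be sketched like this (all names here are illustrative, not GPUCompiler.jl's actual API): only the top-level invocation drains the deferred-job worklist, so jobs discovered while compiling an inner !toplevel job are silently dropped.

```julia
struct Job
    name::Symbol
    deferred::Vector{Job}   # jobs discovered during this job's codegen
end

const compiled = Symbol[]   # record of everything that actually got compiled

function compile!(job::Job; toplevel::Bool=true, jobs::Vector{Job}=Job[])
    push!(compiled, job.name)
    append!(jobs, job.deferred)   # codegen reports newly deferred jobs
    if toplevel
        # only the top-level call drains the worklist; the inner call below
        # gets a fresh `jobs`, so anything it discovers is lost
        while !isempty(jobs)
            compile!(popfirst!(jobs); toplevel=false, jobs=Job[])
        end
    end
    return nothing
end

inner  = Job(:inner, Job[])
middle = Job(:middle, [inner])
outer  = Job(:outer, [middle])
compile!(outer)
compiled   # == [:outer, :middle] -- :inner was never compiled
```

Passing the outer `jobs` down to the inner generators (the workaround above) fixes this particular drop, at the cost of threading more state through.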
Hm, I am a bit confused about how
https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002
works. I just pushed a commit that returns the jobs variable from the !toplevel compilation.
> Hm I am a bit confused how
> https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002
We probably don't really rely on the finish_ir phase right now.
I'm not particularly happy with threading yet more state through, but if you need this... The compiler driver should be reworked to support compiling multiple translation units without relying on a single entry-point.
Yeah, I can try to improve this while working on #582.
CUDA.jl CI failure looks related.
Ah, I see: for https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L975
we have an infinite loop, since we don't pass through the list of already-codegen'd functions. So we recurse back into the original top-level job and carry on from there.
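That recursion can be made terminating with the usual seen-set fix; a minimal sketch, assuming hypothetical names rather than GPUCompiler.jl's real internals:

```julia
# Thread a set of already-compiled functions through the recursion so that
# mutually-deferred jobs are emitted exactly once instead of looping forever.
function codegen(f::Symbol, deferred::Dict{Symbol,Vector{Symbol}};
                 seen::Set{Symbol}=Set{Symbol}())
    f in seen && return Symbol[]   # already emitted: stop recursing
    push!(seen, f)
    emitted = [f]
    for g in get(deferred, f, Symbol[])
        append!(emitted, codegen(g, deferred; seen=seen))
    end
    return emitted
end

# :kernel defers :helper, which defers back into :kernel (the shape of the
# CUDA.jl test linked above); without `seen` this would never terminate.
deferred = Dict(:kernel => [:helper], :helper => [:kernel])
codegen(:kernel, deferred)   # → [:kernel, :helper]
```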