
Allow for deferred codegen in !toplevel

Open · vchuravy opened this pull request 1 year ago • 11 comments

Fixes https://github.com/EnzymeAD/Enzyme.jl/issues/1173

Need to come up with a standalone test case.

vchuravy avatar Mar 26 '24 22:03 vchuravy

Could this get merged?

jgreener64 avatar Apr 18 '24 09:04 jgreener64

@vchuravy gentle bump

wsmoses avatar May 13 '24 20:05 wsmoses

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.45%. Comparing base (288b613) to head (0e00885).

❗ Current head 0e00885 differs from the pull request's most recent head da0e767. Consider uploading reports for commit da0e767 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
- Coverage   82.86%   73.45%   -9.41%     
==========================================
  Files          24       24              
  Lines        3361     3330      -31     
==========================================
- Hits         2785     2446     -339     
- Misses        576      884     +308     

☂️ View full report in Codecov by Sentry.

codecov[bot] avatar May 16 '24 01:05 codecov[bot]

I tried reproducing the Enzyme MWE with `examples/jit.jl`, but the stars didn't align.

@maleadt, do you recall why you originally limited this to toplevel? I assume it's because it wouldn't make sense for device-side launched kernels.

vchuravy avatar May 16 '24 01:05 vchuravy

Hm... Doesn't the inner call resolve its own deferred codegen first and then both of them get linked into the outer one?

vchuravy avatar May 16 '24 12:05 vchuravy

Yes, but that doesn't correctly populate the `jobs` variable that's used at top level:

https://github.com/JuliaGPU/GPUCompiler.jl/blob/288b613bc1b7e1c1d0ebd4274b5784d5d2ece86d/src/driver.jl#L423-L435

We could work around this by passing `jobs` to the inner deferred-codegen generators, but that's a hack. Essentially, the current system relies on a compilation job having a single point of entry, even into deferred jobs.

maleadt avatar May 16 '24 12:05 maleadt
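To make the single-entry assumption concrete, here is a minimal sketch of the worklist pattern described above. `CompilerJob`, `emit_llvm`, and `compile` are illustrative stand-ins, not the actual GPUCompiler.jl internals; the point is that deferred jobs discovered by a `!toplevel` compilation currently have nowhere to go:

```julia
# Stand-in for GPUCompiler's job type; purely illustrative.
struct CompilerJob
    source::Any
end

# Pretend codegen: returns a "module" plus any deferred jobs discovered
# while emitting it (e.g. from deferred-codegen markers).
function emit_llvm(job::CompilerJob)
    mod = "llvm module for $(job.source)"
    deferred = CompilerJob[]
    return mod, deferred
end

function compile(job::CompilerJob; toplevel::Bool=true)
    jobs = CompilerJob[job]        # worklist with a single entry point
    modules = String[]
    while !isempty(jobs)
        mod, deferred = emit_llvm(popfirst!(jobs))
        push!(modules, mod)
        # Only the toplevel invocation drives the worklist; a !toplevel
        # compilation drops its deferred jobs on the floor, which is the
        # limitation this PR is about.
        toplevel && append!(jobs, deferred)
    end
    return modules                 # linked into a single module later
end
```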

Hm, I am a bit confused about how

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002

works. I just pushed a commit that returns the `job` variable from the `!toplevel` compilation.

vchuravy avatar May 16 '24 14:05 vchuravy
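The test in question exercises device-side kernel launches (dynamic parallelism), where compiling the outer kernel triggers a deferred, non-toplevel compilation of the inner one. A hedged reconstruction of that kind of test, not the verbatim CUDA.jl code:

```julia
using CUDA

function inner_kernel(a)
    a[1] += 1
    return
end

function outer_kernel(a)
    # Device-side launch: this is what goes through deferred codegen.
    @cuda dynamic=true inner_kernel(a)
    return
end

a = CuArray([0])
@cuda outer_kernel(a)
synchronize()
@assert Array(a) == [1]
```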

> Hm, I am a bit confused about how
> https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L1002
> works.

We probably don't really rely on the `finish_ir` phase right now.

I'm not particularly happy with threading yet more state through, but if you need this... The compiler driver should be reworked to support compiling multiple translation units without relying on a single entry point.

maleadt avatar May 16 '24 18:05 maleadt
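A minimal sketch of the rework being suggested, reusing the `CompilerJob`/`emit_llvm` stand-ins from the earlier sketch (again hypothetical, not a real GPUCompiler.jl API): the driver accepts multiple entry jobs directly, so deferred jobs are simply additional translation units rather than state threaded through a single entry point.

```julia
# Hypothetical driver over multiple translation units.
function compile_many(entry_jobs::Vector{CompilerJob})
    jobs = copy(entry_jobs)        # every entry is its own translation unit
    modules = String[]
    while !isempty(jobs)
        mod, deferred = emit_llvm(popfirst!(jobs))
        push!(modules, mod)
        append!(jobs, deferred)    # deferred jobs are just more entries
    end
    return modules                 # linked together afterwards
end
```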

Yeah, I can try to improve this while working on #582.

vchuravy avatar May 16 '24 19:05 vchuravy

The CUDA.jl CI failure looks related.

maleadt avatar May 16 '24 20:05 maleadt

Ah, I see. For

https://github.com/JuliaGPU/CUDA.jl/blob/c2d444b0f5a76f92c5ba6bc1534a53319218b563/test/core/execution.jl#L974-L975

we have an infinite loop, since we don't pass through the list of already-codegen'd functions. So we recurse into the original toplevel job and carry on from there.

vchuravy avatar May 20 '24 20:05 vchuravy
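A hedged sketch of the fix implied here, again reusing the stand-ins from the first sketch: thread the set of already-compiled jobs through recursive compilations, so re-encountering the original toplevel job terminates instead of looping forever.

```julia
function compile_recursive!(job::CompilerJob,
                            seen::Set{CompilerJob}=Set{CompilerJob}())
    job in seen && return String[]   # already codegen'd: break the cycle
    push!(seen, job)
    mod, deferred = emit_llvm(job)
    modules = [mod]
    for d in deferred
        append!(modules, compile_recursive!(d, seen))
    end
    return modules
end
```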