GPUCompiler.jl

Opaque pointers support

pxl-th opened this issue 2 years ago · 5 comments

ROCm 5.5+ uses LLVM 16 and opaque pointers, which leads to issues like:

julia> x = ROCArray{Float32}(undef, 16);

julia> fill!(x, 0f0)
error: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM16.0.0git' Reader: 'LLVM 15.0.7jl')

And launching Julia (1.10 in this case) with JULIA_LLVM_ARGS="--opaque-pointers" results in:

julia> using AMDGPU

julia> x = ROCArray{Float32}(undef, 16);

julia> fill!(x, 0f0)
ERROR: Taking the type of an opaque pointer is illegal
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] eltype(typ::LLVM.PointerType)
    @ LLVM ~/.julia/packages/LLVM/lq6lJ/src/core/type.jl:167
  [3] classify_arguments(job::GPUCompiler.CompilerJob, codegen_ft::LLVM.FunctionType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/irgen.jl:384
  [4] macro expansion
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/irgen.jl:86 [inlined]
  [5] macro expansion
    @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [6] irgen(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/irgen.jl:82
  [7] macro expansion
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:202 [inlined]
  [8] macro expansion
    @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [9] macro expansion
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:201 [inlined]
 [10] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/utils.jl:89
 [11] emit_llvm
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/utils.jl:83 [inlined]
 [12] 
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:129
 [13] codegen
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:110 [inlined]
 [14] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:106
 [15] compile
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:98 [inlined]
 [16] #37
    @ GPUCompiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:122 [inlined]
 [17] JuliaContext(f::AMDGPU.Compiler.var"#37#38"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/driver.jl:47
 [18] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:121
 [19] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/execution.jl:125
 [20] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwKB0/src/execution.jl:103
 [21] macro expansion
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:91 [inlined]
 [22] macro expansion
    @ AMDGPU.Compiler ./lock.jl:267 [inlined]
 [23] hipfunction(f::GPUArrays.var"#6#7", tt::Type{Tuple{AMDGPU.ROCKernelContext, AMDGPU.Device.ROCDeviceVector{Float32, 1}, Float32}}; kwargs::@Kwargs{name::Nothing})
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:85
 [24] hipfunction
    @ GPUArrays ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:84 [inlined]
 [25] macro expansion
    @ GPUArrays ~/.julia/dev/AMDGPU/src/highlevel.jl:159 [inlined]
 [26] #gpu_call#58
    @ GPUArrays ~/.julia/dev/AMDGPU/src/gpuarrays.jl:9 [inlined]
 [27] gpu_call
    @ GPUArrays ~/.julia/dev/AMDGPU/src/gpuarrays.jl:5 [inlined]
 [28] gpu_call(::GPUArrays.var"#6#7", ::ROCArray{…}, ::Float32; target::ROCArray{…}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/EZkix/src/device/execution.jl:65
 [29] gpu_call
    @ GPUArrays ~/.julia/packages/GPUArrays/EZkix/src/device/execution.jl:34 [inlined]
 [30] fill!(A::ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}, x::Float32)
    @ GPUArrays ~/.julia/packages/GPUArrays/EZkix/src/host/construction.jl:14
 [31] top-level scope
    @ REPL[3]:1

CC @jpsamaroo

pxl-th · Sep 11 '23 19:09

Generally we won't be able to read LLVM IR being produced by a newer version of LLVM. This is why I said earlier this summer that I believe for a long-term solution you will need to generate the bitcode archive with multiple LLVM versions.
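
As a rough illustration of that idea (not something from this thread), one could ship one device-library bitcode directory per LLVM major version and select the one matching the LLVM that the running Julia was built against; the directory layout and helper below are hypothetical:

function device_libs_dir(root::AbstractString)
    # Base.libllvm_version is the LLVM version this Julia links against,
    # e.g. v"15.0.7" on Julia 1.10 (cf. the "Reader: 'LLVM 15.0.7jl'" above).
    llvm_major = Base.libllvm_version.major
    dir = joinpath(root, "llvm-$(llvm_major)")
    isdir(dir) || error("no device-library bitcode built for LLVM $llvm_major")
    return dir
end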

You can check LLVM.supports_typed_pointers(ctx) == false to see whether a context uses opaque pointers, and run with JULIA_LLVM_ARGS="--opaque-pointers", but YMMV. We haven't turned on opaque pointers on Julia master yet, since we found performance regressions and no one has had time to investigate them.
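
As a minimal sketch (not from this thread) of that check, assuming a recent LLVM.jl where Context and dispose behave as shown:

using LLVM

ctx = LLVM.Context()
try
    if LLVM.supports_typed_pointers(ctx)
        println("typed pointers: eltype(::LLVM.PointerType) is legal")
    else
        println("opaque pointers: avoid eltype on LLVM pointer types")
    end
finally
    dispose(ctx)
end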

I think the first step here would be to add a CI job that runs with --opaque-pointers, then go through the codebase and use eltype only conditionally.
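
For illustration, a hedged sketch of what "use eltype only conditionally" could look like in argument-classification code; julia_pointee_type is a hypothetical helper standing in for whatever the Julia-side signature provides, not an existing GPUCompiler function:

using LLVM

function pointee_type(ptr_typ::LLVM.PointerType, julia_type::Type)
    if LLVM.supports_typed_pointers(context(ptr_typ))
        # With typed pointers, asking LLVM for the pointee type is still legal.
        return eltype(ptr_typ)
    else
        # With opaque pointers there is no pointee type to query; derive it
        # from the Julia signature instead (hypothetical helper).
        return julia_pointee_type(julia_type)
    end
end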

vchuravy · Sep 11 '23 20:09

LLVM.jl already tests with --opaque-pointers and has the opaque-pointer infrastructure in place, though I'm not sure we have fixed everything downstream.

gbaraldi · Sep 13 '23 21:09

Generally we won't be able to read LLVM IR being produced by a newer version of LLVM.

I've tried a simpler case with a kernel that takes no arguments, to avoid calling eltype:

f() = return
@roc f()

And it fails with error: Unknown attribute kind (86) (Producer: 'LLVM16.0.0git' Reader: 'LLVM 15.0.7jl') during linking of device libraries. I take it this confirms it?

I was using Julia 1.9 (LLVM 14) with ROCm 5.4 (LLVM 15) without any issues, so I was hoping LLVM 15 & LLVM 16 would also work :/

This is why I said earlier this summer that I believe for a long-term solution you will need to generate the bitcode archive with multiple LLVM versions.

That might indeed help; I just had little motivation to build ROCm libraries with BinaryBuilder again. The only question is whether it will work for devices whose support was added in newer LLVM versions, since we would be building the device libraries with older LLVM versions (and likewise for other features such as FP16 atomics).


Will Julia 1.11 use LLVM 16? Currently the lack of ROCm 5.5+ support prevents us from supporting Navi 3 and Windows.

pxl-th · Sep 15 '23 11:09

Will Julia 1.11 use LLVM 16?

Maybe? It heavily depends on the bandwidth people have (I currently can't work on it), and @gbaraldi is busy with a lot of things.

vchuravy · Sep 15 '23 13:09

I would like that to be the case. To me, the most annoying thing is getting the BinaryBuilder build working, and as always 32-bit and Windows are the holdups. In the case of Windows I'm concerned, because we've hit the symbol cap and there doesn't seem to be a clear solution.

gbaraldi · Sep 15 '23 13:09