InvalidIRError in a constructor
This is very similar to my bug report in https://github.com/JuliaGPU/AMDGPU.jl/issues/846: it looks like something is not being constant-folded. Again, it works with CUDA.jl but fails with AMDGPU.jl.
MWE:
using QEDbase
using QEDbase.Mocks
using KernelAbstractions
using Random
@kernel function mwe_kernel(dest::AbstractVector)
    id = @index(Global)
    dest[id] = zero(eltype(dest))
end
RNG = MersenneTwister(137137)
# works ->
using CUDA
moms = CuVector([Mocks._rand_momenta(RNG, 1, MockMomentum{Float32})[1] for _ in 1:128])
mwe_kernel(get_backend(moms))(moms; ndrange = length(moms))
KernelAbstractions.synchronize(get_backend(moms))
@info "CUDA Success"
# crashes ->
using AMDGPU
moms = ROCVector([Mocks._rand_momenta(RNG, 1, MockMomentum{Float32})[1] for _ in 1:128])
mwe_kernel(get_backend(moms))(moms; ndrange = length(moms))
KernelAbstractions.synchronize(get_backend(moms))
@info "AMDGPU Success"
What I'm getting for AMDGPU is this:
ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mwe_kernel(::KernelAbstractions.CompilerMetadata{…}, ::AMDGPU.Device.ROCDeviceVector{…}) resulted in invalid LLVM IR
Reason: unsupported call to an external C function (call to jl_string_to_genericmemory)
Reason: unsupported call to an external C function (call to jl_genericmemory_to_string)
Reason: unsupported call to an external C function (call to ijl_pchar_to_string)
Reason: unsupported call to an external C function (call to ijl_rethrow)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erroneous code with Cthulhu.jl
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/validation.jl:167
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:417 [inlined]
[3] macro expansion
@ ~/.julia/packages/Tracy/tYwAE/src/tracepoint.jl:163 [inlined]
[4] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:416
[5] emit_llvm
@ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:182 [inlined]
[6] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:95
[7] compile_unhooked
@ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:80 [inlined]
[8] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:67
[9] compile
@ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:55 [inlined]
[10] #hipcompile##0
@ ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:211 [inlined]
[11] JuliaContext(f::AMDGPU.Compiler.var"#hipcompile##0#hipcompile##1"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:34
[12] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:25
[13] hipcompile(job::GPUCompiler.CompilerJob)
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:210
[14] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/execution.jl:245
[15] cached_compilation(cache::Dict{Any, AMDGPU.HIP.HIPFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/execution.jl:159
[16] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:166 [inlined]
[17] macro expansion
@ ./lock.jl:376 [inlined]
[18] hipfunction(f::typeof(gpu_mwe_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, AMDGPU.Device.ROCDeviceVector{…}}}; kwargs::@Kwargs{})
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:160
[19] hipfunction(f::typeof(gpu_mwe_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, AMDGPU.Device.ROCDeviceVector{…}}})
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:159
[20] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/highlevel.jl:155 [inlined]
[21] (::KernelAbstractions.Kernel{…})(args::ROCArray{…}; ndrange::Int64, workgroupsize::Nothing)
@ AMDGPU.ROCKernels ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:96
[22] top-level scope
@ ~/repos/QEDbase.jl/temp.jl:24
[23] include(mapexpr::Function, mod::Module, _path::String)
@ Base ./Base.jl:307
[24] top-level scope
@ REPL[3]:1
in expression starting at /home/reinha57/repos/QEDbase.jl/temp.jl:24
Some type information was truncated. Use `show(err)` to see complete types.
The implementation of the zero(::Type) function is the following:
Base.zero(mom_type::Type{<:AbstractMockMomentum}) = mom_type(zeros(eltype(mom_type), 4))
A workaround is to change this to
function Base.zero(mom_type::Type{T}) where {EL_T, T <: AbstractMockMomentum{EL_T}}
    return mom_type(zero(EL_T), zero(EL_T), zero(EL_T), zero(EL_T))
end
which works with both backends.
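For readers without QEDbase at hand, the difference between the two definitions can be sketched on the CPU with a stand-in type. Everything here is hypothetical and only for illustration: `ToyMomentum`, its vector-taking constructor, and the two `zero_via_*` helpers are assumptions, not the actual `MockMomentum` implementation. The sketch shows only that both forms produce the same value; it does not reproduce the GPU compilation failure.

```julia
# Hypothetical stand-in for a four-component momentum type (NOT QEDbase's MockMomentum).
struct ToyMomentum{T<:Real}
    E::T
    px::T
    py::T
    pz::T
end

# Assumed vector-taking constructor: it needs a runtime length check,
# and the failing branch constructs and throws an exception.
function ToyMomentum{T}(v::AbstractVector) where {T}
    length(v) == 4 || throw(ArgumentError("expected 4 components"))
    return ToyMomentum{T}(v[1], v[2], v[3], v[4])
end

# Pattern of the original definition: build a temporary Vector, then convert it.
zero_via_vector(::Type{ToyMomentum{T}}) where {T} = ToyMomentum{T}(zeros(T, 4))

# Pattern of the workaround: pass scalars directly, so there is no array and no length check.
zero_via_scalars(::Type{ToyMomentum{T}}) where {T} =
    ToyMomentum{T}(zero(T), zero(T), zero(T), zero(T))

a = zero_via_vector(ToyMomentum{Float32})
b = zero_via_scalars(ToyMomentum{Float32})
```

Both calls return the same all-zero value, but only the scalar form avoids the intermediate array and the error path attached to it, which is presumably why it compiles on both backends.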
Wait, doesn't zeros(eltype(mom_type), 4) create a dynamically allocated Array? I am surprised this works in CUDA; perhaps LLVM is able to elide the allocation entirely?
Yes, it is a dynamic Vector, but I'm guessing that with CUDA it's elided directly into the SVector constructor. Is CUDA.jl not using GPUCompiler.jl like AMDGPU.jl is?
It is, though the CUDA backend is more mature and has features we don't support (yet). It could just be that we're lacking some quirks around bounds errors, causing the array to get captured. I'll look into it.
What version of Julia is this on? On 1.12, zeros(Float32, 4) is lowered to an alloca (i.e. allocated on the stack) and works on AMDGPU.
Originally I think I used 1.12.1, but I also just reproduced it on 1.12.2.
The error doesn't seem to be the zeros(Float32, 4) call itself anyway, but rather some sort of bounds(?) check it's trying to do that it shouldn't need to because all the sizes are static. The error isn't an unsupported call to dynamic allocation, it's some rethrow and string stuff.
Edit: The MWE no longer reproduces the error (it now runs fine), because we have since added the workaround and released the package.