AMDGPU.jl InvalidIRError in a constructor

This is very similar to my bug report in https://github.com/JuliaGPU/AMDGPU.jl/issues/846: it looks like something is not being constant-folded. Again, it works with CUDA.jl but fails with AMDGPU.jl.

MWE:

```julia
using QEDbase
using QEDbase.Mocks
using KernelAbstractions
using Random

@kernel function mwe_kernel(dest::AbstractVector)
    id = @index(Global)
    dest[id] = zero(eltype(dest))
end

RNG = MersenneTwister(137137)

# works ->
using CUDA
moms = CuVector([Mocks._rand_momenta(RNG, 1, MockMomentum{Float32})[1] for _ in 1:128])
mwe_kernel(get_backend(moms))(moms; ndrange = length(moms))
KernelAbstractions.synchronize(get_backend(moms))

@info "CUDA Success"

# crashes ->
using AMDGPU
moms = ROCVector([Mocks._rand_momenta(RNG, 1, MockMomentum{Float32})[1] for _ in 1:128])
mwe_kernel(get_backend(moms))(moms; ndrange = length(moms))
KernelAbstractions.synchronize(get_backend(moms))

@info "AMDGPU Success"
```

What I'm getting for AMDGPU is this:

```
ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mwe_kernel(::KernelAbstractions.CompilerMetadata{…}, ::AMDGPU.Device.ROCDeviceVector{…}) resulted in invalid LLVM IR
Reason: unsupported call to an external C function (call to jl_string_to_genericmemory)
Reason: unsupported call to an external C function (call to jl_genericmemory_to_string)
Reason: unsupported call to an external C function (call to ijl_pchar_to_string)
Reason: unsupported call to an external C function (call to ijl_rethrow)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erroneous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/validation.jl:167
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:417 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/Tracy/tYwAE/src/tracepoint.jl:163 [inlined]
  [4] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:416
  [5] emit_llvm
    @ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:182 [inlined]
  [6] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:95
  [7] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:80 [inlined]
  [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:67
  [9] compile
    @ ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:55 [inlined]
 [10] #hipcompile##0
    @ ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:211 [inlined]
 [11] JuliaContext(f::AMDGPU.Compiler.var"#hipcompile##0#hipcompile##1"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:34
 [12] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/driver.jl:25
 [13] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:210
 [14] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/execution.jl:245
 [15] cached_compilation(cache::Dict{Any, AMDGPU.HIP.HIPFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wvn1Y/src/execution.jl:159
 [16] macro expansion
    @ ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:166 [inlined]
 [17] macro expansion
    @ ./lock.jl:376 [inlined]
 [18] hipfunction(f::typeof(gpu_mwe_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, AMDGPU.Device.ROCDeviceVector{…}}}; kwargs::@Kwargs{})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:160
 [19] hipfunction(f::typeof(gpu_mwe_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, AMDGPU.Device.ROCDeviceVector{…}}})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/TqRG0/src/compiler/codegen.jl:159
 [20] macro expansion
    @ ~/.julia/packages/AMDGPU/TqRG0/src/highlevel.jl:155 [inlined]
 [21] (::KernelAbstractions.Kernel{…})(args::ROCArray{…}; ndrange::Int64, workgroupsize::Nothing)
    @ AMDGPU.ROCKernels ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:96
 [22] top-level scope
    @ ~/repos/QEDbase.jl/temp.jl:24
 [23] include(mapexpr::Function, mod::Module, _path::String)
    @ Base ./Base.jl:307
 [24] top-level scope
    @ REPL[3]:1
in expression starting at /home/reinha57/repos/QEDbase.jl/temp.jl:24
Some type information was truncated. Use `show(err)` to see complete types.
```

The implementation of the `zero(::Type)` method is the following:

```julia
Base.zero(mom_type::Type{<:AbstractMockMomentum}) = mom_type(zeros(eltype(mom_type), 4))
```

A workaround is to change this to

```julia
function Base.zero(mom_type::Type{T}) where {EL_T, T <: AbstractMockMomentum{EL_T}}
    return mom_type(zero(EL_T), zero(EL_T), zero(EL_T), zero(EL_T))
end
```

which works with both backends.
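A CPU-only sketch of the two `zero` patterns may help show what actually differs. It assumes a hypothetical `ToyMomentum` type standing in for QEDbase's `MockMomentum` (the real type's fields and constructors may differ), and the helper names `zero_via_vector` and `zero_via_scalars` are made up for illustration. Both return the same value on the CPU; the difference is that the first goes through a temporary heap-allocated `Vector` whose element reads are each bounds-checked, which, as discussed below, the GPU compiler presumably has to eliminate completely for the kernel to compile.

```julia
# Hypothetical stand-ins for QEDbase's AbstractMockMomentum / MockMomentum;
# the real field names and constructors may differ.
abstract type AbstractToyMomentum{T<:Real} end

struct ToyMomentum{T<:Real} <: AbstractToyMomentum{T}
    E::T
    px::T
    py::T
    pz::T
end

# Assumed constructor from a vector, mirroring the `mom_type(::Vector)` call in the
# original `zero` method. Each `v[i]` read carries its own bounds check.
ToyMomentum{T}(v::AbstractVector) where {T<:Real} = ToyMomentum{T}(v[1], v[2], v[3], v[4])

# Element type of a momentum *type*, as the original `eltype(mom_type)` call assumes.
Base.eltype(::Type{<:AbstractToyMomentum{T}}) where {T} = T

# Original pattern: allocates a temporary Vector, then unpacks it.
zero_via_vector(::Type{M}) where {M<:AbstractToyMomentum} = M(zeros(eltype(M), 4))

# Workaround pattern: four scalar zeros, no intermediate array at all.
zero_via_scalars(::Type{M}) where {T, M<:AbstractToyMomentum{T}} =
    M(zero(T), zero(T), zero(T), zero(T))

m1 = zero_via_vector(ToyMomentum{Float32})
m2 = zero_via_scalars(ToyMomentum{Float32})
```

On the CPU, `m1` and `m2` are identical isbits values; only the path taken to build them differs.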

Nov 15 '25 14:11 AntonReinhard

Wait, doesn't `zeros(eltype(mom_type), 4)` create a dynamically allocated `Array`? I'm surprised this works on CUDA; perhaps LLVM is able to elide the allocation entirely?

Nov 15 '25 15:11 simeonschaub

Yes, it is a dynamic `Vector`, but I'm guessing that with CUDA it's elided directly into the `SVector` constructor. Is CUDA.jl not using GPUCompiler.jl like AMDGPU.jl is?

Nov 15 '25 15:11 AntonReinhard

It is, though the CUDA backend is more mature and has features we don't support (yet). It could just be that we're missing some quirks around bounds errors that cause the array to get captured. I'll look into it.
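If a vector-based constructor is kept, one option worth trying is to concentrate the error handling in a single up-front length check with a constant message and then read the elements with `@inbounds`, so there is one throw site instead of a bounds check per element. This is only a sketch: `ToyMomentum` and its `Vector` constructor are hypothetical, and whether this changes the IR that AMDGPU.jl emits has not been verified here.

```julia
# Hypothetical 4-component momentum, standing in for QEDbase's MockMomentum.
struct ToyMomentum{T<:Real}
    E::T
    px::T
    py::T
    pz::T
end

# Hypothetical vector constructor: one explicit length check with a constant
# message (no runtime string building), then element reads without bounds checks.
function ToyMomentum{T}(v::Vector) where {T<:Real}
    length(v) == 4 || throw(DimensionMismatch("expected exactly 4 components"))
    @inbounds return ToyMomentum{T}(v[1], v[2], v[3], v[4])
end
```

A wrong-length input still fails loudly, but through a single `DimensionMismatch` branch rather than four separate `BoundsError` paths.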

Nov 15 '25 15:11 simeonschaub

Which version of Julia is this on? On 1.12, `zeros(Float32, 4)` is lowered to an `alloca` (i.e. a stack allocation) and works on AMDGPU.

Dec 05 '25 13:12 pxl-th

I think I originally used 1.12.1, but I just reproduced it on 1.12.2 as well. The problem doesn't seem to be the `zeros(Float32, 4)` call itself, but rather some sort of bounds(?) check it tries to do, which shouldn't be needed because all the sizes are static. The error isn't an unsupported call to a dynamic allocation; it's the rethrow and string-conversion calls.

Edit: The MWE no longer reproduces the error (i.e. it runs fine), because we have since added the workaround and released a new version of the package.
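Since all the sizes are static, the workaround can also be written with the component count fixed at compile time. The sketch below assumes a hypothetical `ToyMomentum` stand-in for QEDbase's momentum types, and `zero_static` is an illustrative name, not part of QEDbase. `ntuple(f, Val(4))` builds an `NTuple{4,T}`, an immutable value with no heap array behind it, which is then splatted into the four-argument constructor.

```julia
# Hypothetical stand-ins for QEDbase's momentum types; the real ones may differ.
abstract type AbstractToyMomentum{T<:Real} end

struct ToyMomentum{T<:Real} <: AbstractToyMomentum{T}
    E::T
    px::T
    py::T
    pz::T
end

# Component count fixed at compile time via Val(4): ntuple returns an NTuple{4,T}
# (no intermediate Vector), which is splatted into the 4-argument constructor.
zero_static(::Type{M}) where {T, M<:AbstractToyMomentum{T}} =
    M(ntuple(_ -> zero(T), Val(4))...)

z = zero_static(ToyMomentum{Float64})
```

For a type with a different number of components, only the `Val(4)` argument would need to change.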

Dec 05 '25 15:12 AntonReinhard