CI failure on Julia 1.12 with GPUArrays linalg

Open luraess opened this issue 7 months ago • 1 comments

In kron, e.g. https://buildkite.com/julialang/amdgpu-dot-jl/builds/3108#0196e781-05c0-480e-bd4f-81404b2e49e5/510-941

and for ComplexF16, as in https://buildkite.com/julialang/amdgpu-dot-jl/builds/3108#0196e781-05c0-480e-bd4f-81404b2e49e5/510-1000

And possibly a few more.

luraess avatar May 19 '25 09:05 luraess

The kron one is really weird. It only happens for ComplexF64, and only if the first argument is transposed (adjoint is actually fine, and running it with adjoint first fixes the issue for the subsequent transpose call):

julia> using AMDGPU, Adapt, LinearAlgebra

julia> a, b = transpose(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 + 0.0im; 2.0 + 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im

julia> a, b = adjoint(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 - 0.0im; 2.0 - 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 3.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im

julia> a, b = transpose(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 + 0.0im; 2.0 + 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 3.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im
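
For comparison, here is a minimal CPU-only sketch of what the result should be. It uses only the LinearAlgebra standard library and the same inputs as the session above; the names `x`, `ref_t` and `ref_a` are introduced here for illustration. The commented-out GPU line assumes an AMD GPU and that `a` and `b` are defined as in the session.

```julia
using LinearAlgebra

x = ComplexF64[1;; 2]   # 1×2 matrix, as in the session above
b = ComplexF64[3;; 4]   # 1×2 matrix

# CPU reference results; no GPU is involved here.
ref_t = kron(transpose(x), b)   # first argument is a lazy Transpose wrapper
ref_a = kron(adjoint(x), b)     # first argument is a lazy Adjoint wrapper

# All imaginary parts are zero, so conjugation changes nothing and both
# wrappers should give the same 2×2 matrix.
@assert ref_t == ref_a == ComplexF64[3 4; 6 8]

# With an AMD GPU available, the device result could be checked against the
# reference like this (untested assumption, not run here):
# @assert Array(kron(adapt(ROCArray, a), adapt(ROCArray, b))) == ref_t
```

In the session above, only the first transposed call deviates from this reference, with 0.0+0.0im in place of 3.0+0.0im.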

That is despite the fact that the output of @device_code_gcn kron(adapt(ROCArray, a), adapt(ROCArray, b)) is identical except for function names. The implementation of kron in https://github.com/JuliaGPU/GPUArrays.jl/blob/602976ff7c06b8c26d2a672cbc269df15a1d3b5c/src/host/linalg.jl#L774 is kind of weird though: all the checks for transpose and adjoint happen at runtime, and there is some boxing going on due to self-recursive closures. Could that boxing be causing the issue?
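
To illustrate what boxing from a self-recursive closure means in general, here is a minimal sketch. It is not taken from GPUArrays.jl: `count_down` and `step` are hypothetical names, and whether a `Core.Box` appears can depend on the Julia version.

```julia
# Hypothetical helper for illustration only; not the GPUArrays.jl code.
function count_down(n::Int)
    # The closure on the right-hand side refers to `step` before the
    # assignment to `step` has completed, so lowering typically has to
    # store `step` in a mutable Core.Box.
    step = k -> k == 0 ? 0 : step(k - 1)
    return step(n)
end

result = count_down(3)

# Print the lowered code; look for `Core.Box` around `step`.
println(only(code_lowered(count_down, (Int,))))
```

A boxed variable is untyped from the compiler's point of view, so calls through it are resolved at runtime, which would fit the observation that the wrapper checks happen at runtime.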

simeonschaub avatar Jun 16 '25 16:06 simeonschaub

Should be fixed by JuliaLang/julia#58837

simeonschaub avatar Jul 01 '25 15:07 simeonschaub

> Should be fixed by JuliaLang/julia#58837

Will this make it to 1.12 release?

luraess avatar Jul 02 '25 08:07 luraess

Yes, it's marked for backporting

simeonschaub avatar Jul 02 '25 08:07 simeonschaub

Can this issue be closed? CI is now passing on 1.12 rc1.

simeonschaub avatar Jul 15 '25 13:07 simeonschaub

Yeah, we can close it

pxl-th avatar Jul 15 '25 14:07 pxl-th