CI failure on Julia 1.12 with GPUArrays linalg
In `kron`, e.g. https://buildkite.com/julialang/amdgpu-dot-jl/builds/3108#0196e781-05c0-480e-bd4f-81404b2e49e5/510-941
and with `ComplexF16`, as in https://buildkite.com/julialang/amdgpu-dot-jl/builds/3108#0196e781-05c0-480e-bd4f-81404b2e49e5/510-1000
And possibly a few more.
The `kron` one is really weird: it only happens for `ComplexF64`, and only when the first argument is a `transpose` (an `adjoint` is actually fine, and running the `adjoint` version first makes the subsequent `transpose` call produce the correct result):
```julia
julia> using AMDGPU, Adapt, LinearAlgebra

julia> a, b = transpose(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 + 0.0im; 2.0 + 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im

julia> a, b = adjoint(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 - 0.0im; 2.0 - 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 3.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im

julia> a, b = transpose(ComplexF64[1;; 2]), ComplexF64[3;; 4]
(ComplexF64[1.0 + 0.0im; 2.0 + 0.0im;;], ComplexF64[3.0 + 0.0im 4.0 + 0.0im])

julia> kron(adapt(ROCArray, a), adapt(ROCArray, b))
2×2 ROCArray{ComplexF64, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 3.0+0.0im  4.0+0.0im
 6.0+0.0im  8.0+0.0im
```
That is despite the fact that `@device_code_gcn kron(adapt(ROCArray, a), adapt(ROCArray, b))` emits identical code except for function names. The implementation of `kron` in https://github.com/JuliaGPU/GPUArrays.jl/blob/602976ff7c06b8c26d2a672cbc269df15a1d3b5c/src/host/linalg.jl#L774 is kind of weird though: all the checks for `transpose` and `adjoint` happen at runtime, and there is some boxing going on due to self-recursive closures. Could that boxing be causing the issue?
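To illustrate the boxing suspicion with a minimal, self-contained sketch (the names here are made up; this is not the GPUArrays code): a closure that calls itself must capture a variable that is reassigned after the closure is created, so Julia wraps that variable in a `Core.Box` with an untyped field, which defeats type inference on the capture.

```julia
# Minimal illustration of boxing caused by a self-recursive closure.
# `countdown` is captured by its own body before its assignment completes,
# so the compiler stores it in a Core.Box; `@code_warntype boxed_sum(4)`
# shows `countdown::Core.Box` rather than a concrete closure type.
function boxed_sum(n)
    countdown = nothing
    countdown = k -> k == 0 ? 0 : k + countdown(k - 1)
    return countdown(n)
end

boxed_sum(4)  # 4 + 3 + 2 + 1 + 0 == 10
```

Whether this kind of boxing can actually change numeric results in a GPU kernel, rather than just cost performance, is exactly the open question in this issue.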
Should be fixed by JuliaLang/julia#58837
Yes, it's marked for backporting
Can this issue be closed? CI is now passing on 1.12 rc1.
Yeah, we can close it