GPUArrays.jl
GPUArrays.jl copied to clipboard
`copyto!` of triangular of adjoint fails
julia> using CUDA, LinearAlgebra
julia> CUDA.allowscalar(false)
julia> U = cu(UpperTriangular(
[1.0 0.0;
0.0 1.0]
))
2×2 UpperTriangular{Float32, CuArray{Float32, 2}}:
1.0 0.0
⋅ 1.0
julia> copyto!(Σ, U')
ERROR: scalar getindex is disallowed
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] assertscalar(op::String)
@ GPUArrays ~/.julia/packages/GPUArrays/4n0iS/src/host/indexing.jl:62
[3] getindex(::CuArray{Float32, 2}, ::Int64, ::Int64)
@ GPUArrays ~/.julia/packages/GPUArrays/4n0iS/src/host/indexing.jl:104
[4] getindex
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/adjtrans.jl:203 [inlined]
[5] getindex
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/triangular.jl:222 [inlined]
[6] _getindex
@ ./abstractarray.jl:1214 [inlined]
[7] getindex
@ ./abstractarray.jl:1170 [inlined]
[8] iterate
@ ./abstractarray.jl:1096 [inlined]
[9] iterate
@ ./abstractarray.jl:1094 [inlined]
[10] copyto_unaliased!(deststyle::IndexLinear, dest::CuArray{Float32, 2}, srcstyle::IndexCartesian, src::LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}})
@ Base ./abstractarray.jl:975
[11] copyto!(dest::CuArray{Float32, 2}, src::LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}})
@ Base ./abstractarray.jl:950
[12] top-level scope
@ REPL[11]:1
[13] top-level scope
@ ~/.julia/packages/CUDA/LTbUr/src/initialization.jl:81
julia> @which copyto!(Σ, U')
copyto!(dest::AbstractArray, src::AbstractArray) in Base at abstractarray.jl:947
Type is
julia> typeof(U')
LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}}
which means that it won't hit https://github.com/JuliaGPU/GPUArrays.jl/blob/8e1bde0ee96b3da889ba99af4caddbecedcfd0cc/src/host/linalg.jl#L70 because Adjoint is not a subtype of AbstractGPUArray, nor will it hit https://github.com/JuliaGPU/GPUArrays.jl/blob/8e1bde0ee96b3da889ba99af4caddbecedcfd0cc/src/host/linalg.jl#L70 since this is a LowerTriangular.
Similar issue of course also exists for UpperTriangular.
Hotfix would be to just add additional impl for UpperTriangular{T, <:Adjoint{T, <:AbstractGPUArray{T,N}}}, etc. but unclear to me if this is the best approach, e.g. maybe there's a more "generic" solution to these wrappers of wrappers arrays?
Oh and if the hotfix is wanted, I can make a PR, np :+1:
As tril! is implemented using a kernel, it's compatible with any GPU array type so we could use <:AnyGPUArray here. However, that's a costly union which might degrade the load time of this package, see https://github.com/JuliaGPU/CUDA.jl/pull/498. I guess this boils down to yet another instance of the need for https://github.com/JuliaLang/julia/pull/31563.
Using <:AnyGPUArray leads to method ambiguity due to https://github.com/JuliaGPU/GPUArrays.jl/blob/5c7e76a6676310905cd6df8e3b994116df57b0fb/src/host/abstractarray.jl#L95 and since AnyGPUArray already includes UpperTriangular :confused:
Compiler tells me that this can be fixed by defining for UpperTriangular{T, <:AbstractGPUArray{T, N}}, but this will also be an issue for the other types present in AnyGPUArray, right?
Maybe best to just go for the simple fix for this particular issue with Adjoint, given that L * L' is quite a common operation for triangular matrices, and then hopefullly https://github.com/JuliaLang/julia/pull/31563 gets some love in the near future?
Hmm, maybe we should remove the support for copyto! to/from AnyGPUArray in GPUArrays, or at least when doing a CPU<->GPU transfer, which requires materializing the wrapper anyway (which is costly). For GPU<->GPU it still makes sense as we can use a kernel that uses the wrapper.
Hmm, maybe we should remove the support for copyto! to/from AnyGPUArray in GPUArrays, or at least when doing a CPU<->GPU transfer
I think that decision is best left up to you and the other GPU-peeps; I don't feel like I have enough knowledge/experience to be able to tell if that's a good or bad idea :upside_down_face:
Btw this is also relevant: https://github.com/JuliaArrays/ArrayInterface.jl/issues/136
Hotfix would be to just add additional impl for
UpperTriangular{T, <:Adjoint{T, <:AbstractGPUArray{T,N}}}, etc. but unclear to me if this is the best approach, e.g. maybe there's a more "generic" solution to these wrappers of wrappers arrays?
That seems the best approach right now. I could get rid of some of the AnyGPUArray copyto! implementations in GPUArrays, but that's a breaking change (e.g. calling Array on a wrapped GPU array triggers that copyto!) so I'd rather not touch that right now.