GPUArrays.jl `copyto!` of triangular of adjoint fails

julia> using CUDA, LinearAlgebra

julia> CUDA.allowscalar(false)

julia> U = cu(UpperTriangular(
           [1.0 0.0;
            0.0 1.0]
       ))
2×2 UpperTriangular{Float32, CuArray{Float32, 2}}:
 1.0  0.0
  ⋅   1.0

julia> copyto!(Σ, U')
ERROR: scalar getindex is disallowed
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/4n0iS/src/host/indexing.jl:62
  [3] getindex(::CuArray{Float32, 2}, ::Int64, ::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/4n0iS/src/host/indexing.jl:104
  [4] getindex
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/adjtrans.jl:203 [inlined]
  [5] getindex
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/triangular.jl:222 [inlined]
  [6] _getindex
    @ ./abstractarray.jl:1214 [inlined]
  [7] getindex
    @ ./abstractarray.jl:1170 [inlined]
  [8] iterate
    @ ./abstractarray.jl:1096 [inlined]
  [9] iterate
    @ ./abstractarray.jl:1094 [inlined]
 [10] copyto_unaliased!(deststyle::IndexLinear, dest::CuArray{Float32, 2}, srcstyle::IndexCartesian, src::LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}})
    @ Base ./abstractarray.jl:975
 [11] copyto!(dest::CuArray{Float32, 2}, src::LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}})
    @ Base ./abstractarray.jl:950
 [12] top-level scope
    @ REPL[11]:1
 [13] top-level scope
    @ ~/.julia/packages/CUDA/LTbUr/src/initialization.jl:81
    
julia> @which copyto!(Σ, U')
copyto!(dest::AbstractArray, src::AbstractArray) in Base at abstractarray.jl:947

Type is

julia> typeof(U')
LowerTriangular{Float32, Adjoint{Float32, CuArray{Float32, 2}}}

which means that it won't hit https://github.com/JuliaGPU/GPUArrays.jl/blob/8e1bde0ee96b3da889ba99af4caddbecedcfd0cc/src/host/linalg.jl#L70 because Adjoint is not a subtype of AbstractGPUArray, nor will it hit https://github.com/JuliaGPU/GPUArrays.jl/blob/8e1bde0ee96b3da889ba99af4caddbecedcfd0cc/src/host/linalg.jl#L70 since this is a LowerTriangular.

Similar issue of course also exists for UpperTriangular.

Hotfix would be to just add additional impl for UpperTriangular{T, <:Adjoint{T, <:AbstractGPUArray{T,N}}}, etc. but unclear to me if this is the best approach, e.g. maybe there's a more "generic" solution to these wrappers of wrappers arrays?

May 11 '21 10:05 torfjelde

Oh and if the hotfix is wanted, I can make a PR, np :+1:

May 11 '21 10:05 torfjelde

As tril! is implemented using a kernel, it's compatible with any GPU array type so we could use <:AnyGPUArray here. However, that's a costly union which might degrade the load time of this package, see https://github.com/JuliaGPU/CUDA.jl/pull/498. I guess this boils down to yet another instance of the need for https://github.com/JuliaLang/julia/pull/31563.

May 11 '21 14:05 maleadt

Using <:AnyGPUArray leads to method ambiguity due to https://github.com/JuliaGPU/GPUArrays.jl/blob/5c7e76a6676310905cd6df8e3b994116df57b0fb/src/host/abstractarray.jl#L95 and since AnyGPUArray already includes UpperTriangular :confused:

Compiler tells me that this can be fixed by defining for UpperTriangular{T, <:AbstractGPUArray{T, N}}, but this will also be an issue for the other types present in AnyGPUArray, right?

Maybe best to just go for the simple fix for this particular issue with Adjoint, given that L * L' is quite a common operation for triangular matrices, and then hopefullly https://github.com/JuliaLang/julia/pull/31563 gets some love in the near future?

May 16 '21 03:05 torfjelde

Hmm, maybe we should remove the support for copyto! to/from AnyGPUArray in GPUArrays, or at least when doing a CPU<->GPU transfer, which requires materializing the wrapper anyway (which is costly). For GPU<->GPU it still makes sense as we can use a kernel that uses the wrapper.

May 17 '21 06:05 maleadt

Hmm, maybe we should remove the support for copyto! to/from AnyGPUArray in GPUArrays, or at least when doing a CPU<->GPU transfer

I think that decision is best left up to you and the other GPU-peeps; I don't feel like I have enough knowledge/experience to be able to tell if that's a good or bad idea :upside_down_face:

Btw this is also relevant: https://github.com/JuliaArrays/ArrayInterface.jl/issues/136

May 18 '21 12:05 torfjelde

Hotfix would be to just add additional impl for UpperTriangular{T, <:Adjoint{T, <:AbstractGPUArray{T,N}}}, etc. but unclear to me if this is the best approach, e.g. maybe there's a more "generic" solution to these wrappers of wrappers arrays?

That seems the best approach right now. I could get rid of some of the AnyGPUArray copyto! implementations in GPUArrays, but that's a breaking change (e.g. calling Array on a wrapped GPU array triggers that copyto!) so I'd rather not touch that right now.

May 27 '21 13:05 maleadt

GPUArrays.jl GPUArrays.jl copied to clipboard

`copyto!` of triangular of adjoint fails

GPUArrays.jl
GPUArrays.jl copied to clipboard