AMDGPU.jl
2-`norm` for views of ROCArray falls back to scalar indexing
This is related to https://github.com/JuliaGPU/CUDA.jl/issues/2280
When taking the 2-norm (or any p-norm other than the 1-norm and Inf-norm) of a view of a ROCArray, the call errors because it falls back to scalar indexing. Interestingly, the 1-norm and Inf-norm of the same view do not error.
The Minimal Working Example (MWE) for this bug:
julia> AMDGPU.allowscalar(false)
julia> F = AMDGPU.rand(Float64, 10, 10);
julia> using LinearAlgebra
julia> norm(F, 1)
48.031221724616245
julia> norm(F, 1)
48.031221724616245
julia> norm(F, Inf)
0.9953361044936007
julia> norm(F, 2)
5.651939527772477
julia> F_v = @view F[2:end-1,2:end-1];
julia> norm(F_v, 1)
31.142444509460844
julia> norm(F_v, Inf)
0.9953361044936007
julia> norm(F_v, 2)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:48 [inlined]
[6] scalar_getindex(::ROCArray{Float64, 2, AMDGPU.Runtime.Mem.HIPBuffer}, ::Int64, ::Vararg{Int64})
@ GPUArrays ~/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:34
[7] _getindex
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:17 [inlined]
[8] getindex
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:15 [inlined]
[9] getindex
@ ./subarray.jl:290 [inlined]
[10] _getindex
@ ./abstractarray.jl:1341 [inlined]
[11] getindex
@ ./abstractarray.jl:1291 [inlined]
[12] iterate
@ ./abstractarray.jl:1217 [inlined]
[13] iterate
@ ./abstractarray.jl:1215 [inlined]
[14] generic_norm2(x::SubArray{Float64, 2, ROCArray{…}, Tuple{…}, false})
@ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:465
[15] norm2
@ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:529 [inlined]
[16] norm(itr::SubArray{Float64, 2, ROCArray{…}, Tuple{…}, false}, p::Int64)
@ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:598
[17] top-level scope
@ REPL[25]:1
[18] top-level scope
@ ~/.julia/packages/AMDGPU/gtxsf/src/tls.jl:200
Some type information was truncated. Use `show(err)` to see complete types.
Expected behavior: the 2-norm of a ROCArray view should behave like the other norms and run on the GPU instead of hitting the generic iterating fallback (`generic_norm2`).
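A possible workaround in the meantime, as a rough sketch only: the stack trace shows `generic_norm2` calling `iterate` on the view, so computing the norm through a single reduction should sidestep the scalar indexing. `norm_reduce` is a hypothetical helper name, not part of AMDGPU.jl or LinearAlgebra. The snippet uses a CPU array so it runs without a GPU; that `sum` over a ROCArray view is executed as a GPU reduction is an assumption here (it is consistent with `norm(F_v, 1)` working above), and I have not checked it against every AMDGPU.jl version.

```julia
using LinearAlgebra

# Hypothetical helper (not part of AMDGPU.jl or LinearAlgebra): p-norm computed
# with one reduction instead of element-by-element iteration.
# Caveat: unlike LinearAlgebra.norm, this does no rescaling, so it can
# overflow or underflow for extreme values.
norm_reduce(A, p::Real=2) =
    p == 2 ? sqrt(sum(abs2, A)) : sum(x -> abs(x)^p, A)^(1 / p)

# CPU stand-in for the MWE; with AMDGPU loaded one would instead use
# F = AMDGPU.rand(Float64, 10, 10) (assumed to reduce on the device).
F = rand(Float64, 10, 10)
F_v = @view F[2:end-1, 2:end-1]

# Sanity check against the reference implementation on the CPU.
@assert norm_reduce(F_v) ≈ norm(F_v, 2)
@assert norm_reduce(F_v, 3) ≈ norm(F_v, 3)
```

This trades the overflow protection of `norm` for a form that only needs `sum`, so it is meant as a stopgap rather than a replacement for a proper method on views.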
Version info
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 5800X 8-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
and
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬──────────────────────────────────────────────────────────────────────────────
│ Available │ Name │ Version │ Path ⋯
├───────────┼──────────────────┼───────────┼──────────────────────────────────────────────────────────────────────────────
│ + │ LLD │ - │ /opt/rocm/llvm/bin/ld.lld ⋯
│ + │ Device Libraries │ - │ /home/lraess/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdg ⋯
│ + │ HIP │ 6.0.32830 │ /opt/rocm-6.0.0/lib/libamdhip64.so ⋯
│ + │ rocBLAS │ 4.0.0 │ /opt/rocm-6.0.0/lib/librocblas.so ⋯
│ + │ rocSOLVER │ 3.24.0 │ /opt/rocm-6.0.0/lib/librocsolver.so ⋯
│ + │ rocALUTION │ - │ /opt/rocm-6.0.0/lib/librocalution.so ⋯
│ + │ rocSPARSE │ - │ /opt/rocm-6.0.0/lib/librocsparse.so ⋯
│ + │ rocRAND │ 2.10.5 │ /opt/rocm-6.0.0/lib/librocrand.so ⋯
│ + │ rocFFT │ 1.0.21 │ /opt/rocm-6.0.0/lib/librocfft.so ⋯
│ + │ MIOpen │ 3.0.0 │ /opt/rocm-6.0.0/lib/libMIOpen.so ⋯
└───────────┴──────────────────┴───────────┴──────────────────────────────────────────────────────────────────────────────
1 column omitted
[ Info: AMDGPU devices
┌────┬───────────────────────┬──────────┬───────────┬────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │
├────┼───────────────────────┼──────────┼───────────┼────────────┤
│ 1 │ AMD Radeon RX 7800 XT │ gfx1101 │ 32 │ 15.984 GiB │
└────┴───────────────────────┴──────────┴───────────┴────────────┘