Tullio.jl icon indicating copy to clipboard operation
Tullio.jl copied to clipboard

Scalar indexing error when using Metal GPU backend

Open alonsoC1s opened this issue 2 months ago β€’ 3 comments

I've been porting a simulation to use the GPU. There is just one piece that I'm missing to keep everyting on the GPU side, a reduction using Tullio

# For dimensions >= 2
@inline function reduce_forces!(Fij, Dijd, Pij, ::Val{D}) where {D}
    @tullio Fij[d, i] = Dijd[i, j, d] * Pij[i, j]
    return nothing
end

My local machine can't use CUDA, so I'm testing with Metal.jl and I haven't been able to get Tullio to work with Metal arrays. Here is a MWE

using Tullio, KernelAbstractions, Metal
mul(A, B, C) = @tullio C[k] = A[k] * B[k];  # run macro after loading packages

n = 2^9
A = rand(n, n)
B = rand(n, n)
C = similar(A)

# Works on CPU
mul(A, B, C)

# Throws "scalar indexing" error on Metal backend, not on CUDA though
mul(mtl(A), mtl(B), mtl(C))

Here's the full error:

julia> mul(mtl(A), mtl(B), mtl(C))
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
  [5] getindex
    @ ~/.julia/packages/GPUArrays/ZRk7Q/src/host/indexing.jl:50 [inlined]
  [6] π’œπ’Έπ“‰!
    @ ~/.julia/packages/Tullio/2zyFP/src/macro.jl:1036 [inlined]
  [7] tile_halves(fun!::var"#π’œπ’Έπ“‰!#1", ::Type{…}, As::Tuple{…}, Is::Tuple{…}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/2zyFP/src/threads.jl:139
  [8] tile_halves(fun!::var"#π’œπ’Έπ“‰!#1", ::Type{…}, As::Tuple{…}, Is::Tuple{…}, Js::Tuple{}, keep::Nothing, final::Bool) (repeats 12 times)
    @ Tullio ~/.julia/packages/Tullio/2zyFP/src/threads.jl:142
  [9] tile_halves
    @ ~/.julia/packages/Tullio/2zyFP/src/threads.jl:136 [inlined]
 [10] threader
    @ ~/.julia/packages/Tullio/2zyFP/src/threads.jl:65 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/Tullio/2zyFP/src/macro.jl:1004 [inlined]
 [12] mul(A::MtlMatrix{…}, B::MtlMatrix{…}, C::MtlMatrix{…})
    @ Main ./REPL[2]:1
 [13] top-level scope
    @ REPL[8]:1
 [14] top-level scope
    @ ~/.julia/packages/Metal/UahNC/src/initialization.jl:80
Some type information was truncated. Use `show(err)` to see complete types.

Thanks in advance for the help, and thanks for the work on the package in general

alonsoC1s avatar Oct 03 '25 12:10 alonsoC1s

I can reproduce this, but don't know why it's failing.

The next thing to check is probably whether the same error occurs with KernelAbstractions alone. Expanding the macro, or reading what it says with verbose=2, it should be possible to reconstruct the functions. This readme section has some explanation of what act! is meant to do.

julia> mul(mtl(A), mtl(B), mtl(C))
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
...
  [6] π’œπ’Έπ“‰!
    @ ~/.julia/packages/Tullio/2zyFP/src/macro.jl:1036 [inlined]
  [7] tile_halves(fun!::var"#π’œπ’Έπ“‰!#5", ::Type{MtlMatrix{Float32, Metal.PrivateStorage}}, As::Tuple{MtlMatrix{Float32, Metal.PrivateStorage}, MtlMatrix{Float32, Metal.PrivateStorage}, MtlMatrix{Float32, Metal.PrivateStorage}}, Is::Tuple{UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/2zyFP/src/threads.jl:139
...
 [12] mul(A::MtlMatrix{…}, B::MtlMatrix{…}, C::MtlMatrix{…})
    @ Main ./REPL[77]:1
 [13] top-level scope
    @ REPL[91]:1
 [14] top-level scope
    @ ~/.julia/packages/Metal/UahNC/src/initialization.jl:80
Some type information was truncated. Use `show(err)` to see complete types.

(jl_SJaIcd) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_SJaIcd/Project.toml`
  [63c18a36] KernelAbstractions v0.9.38
  [dde4c033] Metal v1.8.1
  [bc48ee85] Tullio v0.3.8

julia> @tullio C[k] = A[k] * B[k] verbose=2 grad=false;
β”Œ Info: >>>>> Preliminary expressions
β”‚   verbosetidy(ex_pre) =
β”‚    quote
β””    end
β”Œ Info: ===== Base actor 
β”‚   verbosetidy(ex_act) =
β”‚    quote
β”‚        local @inline(function π’œπ’Έπ“‰!(::Type, β„›::AbstractArray{𝒯}, A, B, 𝒢𝓍k, ♻️ = nothing, πŸ’€ = true) where 𝒯
β”‚                    @inbounds @fastmath(begin
β”‚                                for k = 𝒢𝓍k
β”‚                                    β„›[k] = A[k] * B[k]
β”‚                                end
β”‚                            end)
β”‚                end)
β””    end
β”Œ Info: =====KA===== KernelAbstractions kernel 
β”‚   verbosetidy(kex1) =
β”‚    quote
β”‚        KernelAbstractions.@kernel function var"##πŸ‡¨πŸ‡Ί#248"(β„›::AbstractArray{𝒯}, A, B, 𝒢𝓍k, ♻️, πŸ’€) where 𝒯
β”‚                (k,) = @index(Global, NTuple)
β”‚                begin
β”‚                    β„›[k] = A[k] * B[k]
β”‚                end
β”‚            end
β””    end
[ Info: success expanding KernelAbstractions.@kernel
β”Œ Info: threading threshold (from cost = 1)
β””   block = 262144
β”Œ Info: In-place body
β”‚   verbosetidy(ex_body) =
β”‚    quote
β”‚        local 𝒢𝓍k = (Tullio.linearindex)(C)
β”‚        (Tullio.linearindex)(A) == (Tullio.linearindex)(C) || (throw)("range of index k must agree")
β”‚        (Tullio.linearindex)(B) == (Tullio.linearindex)(C) || (throw)("range of index k must agree")
β”‚        @info "left index ranges" k = (Tullio.linearindex)(C)
β”‚        begin
β”‚            (Tullio.threader)(π’œπ’Έπ“‰!, (Tullio.storage_type)(C, A, B), C, tuple(A, B), tuple(𝒢𝓍k), tuple(), +, 262144, nothing)
β”‚            C
β”‚        end
β””    end

mcabbott avatar Oct 03 '25 21:10 mcabbott

Ah, a possible culprit is this:

https://github.com/mcabbott/Tullio.jl/blob/master/ext/TullioCUDAExt.jl

The CPU path has recursive tiling before calling act!, but the GPU path is meant to skip that. But tile_halves in stacktrace above hints it's not. And perhaps the problem is that this is done only for T<:CUDA.CuArray.

It's possible that there are other such mistakes. In fact it's possible that nothing ever worked with Metal.jl.

Sorry I don't have much time to fix things around here. But a PR making it do for Metal what it does for CUDA might not be too hard.

mcabbott avatar Oct 03 '25 21:10 mcabbott

I'd love to take a closer look, I've been meaning to contribute for a while I'll recreate the extension with MtlArray and see what happens

alonsoC1s avatar Oct 04 '25 08:10 alonsoC1s