Tullio.jl icon indicating copy to clipboard operation
Tullio.jl copied to clipboard

HybridArrays

Open prittjam opened this issue 3 years ago • 8 comments

Does Tullio not support HybridArrays?

prittjam avatar Sep 05 '22 15:09 prittjam

What would you like it to do?

It uses axes(A) everywhere, and makes output arrays by calling similar. Thus some things with SOneTo will propagate:

julia> v = HybridArray{Tuple{3}}(ones(3))
3-element HybridVector{3, Float64, 1, Vector{Float64}} with indices SOneTo(3):
 1.0
 1.0
 1.0

julia> @tullio _[i,j] := v[i] * v[j] + (i/j)
3×3 StaticArraysCore.SizedMatrix{3, 3, Float64, 2, Matrix{Float64}} with indices SOneTo(3)×SOneTo(3):
 2.0  1.5  1.33333
 3.0  2.0  1.66667
 4.0  2.5  2.0

mcabbott avatar Sep 05 '22 15:09 mcabbott

This simple example fails but works with einsum, although einsum casts to a dynamic matrix.

d1 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100))
d2 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100))
@tullio d3[i,j] := d1[i,j]*d2[i,j] 

The error is

 MethodError: no method matching asvalbool(::Nothing)
Closest candidates are:
  asvalbool(::T) where T<:Tuple{Vararg{Static.StaticBool}} at ~/.julia/packages/VectorizationBase/gTlqN/src/VectorizationBase.jl:102

Stacktrace:
  [1] val_dense_dims
    @ ~/.julia/packages/VectorizationBase/gTlqN/src/VectorizationBase.jl:110 [inlined]
  [2] densewrapper
    @ ~/.julia/packages/LoopVectorization/nkDac/src/condense_loopset.jl:456 [inlined]
  [3] 𝒜𝒸𝓉!
    @ ~/.julia/packages/Tullio/wAFFh/src/macro.jl:1094 [inlined]
  [4] tile_halves(fun!::var"#𝒜𝒸𝓉!#311", ::Type{Matrix{Float64}}, As::Tuple{Matrix{Float64}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}}, Is::Tuple{UnitRange{Int64}, UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/wAFFh/src/threads.jl:139
  [5] tile_halves(fun!::var"#𝒜𝒸𝓉!#311", ::Type{Matrix{Float64}}, As::Tuple{Matrix{Float64}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}}, Is::Tuple{UnitRange{Int64}, UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/wAFFh/src/threads.jl:142
  [6] tile_halves
    @ ~/.julia/packages/Tullio/wAFFh/src/threads.jl:136 [inlined]
  [7] threader
    @ ~/.julia/packages/Tullio/wAFFh/src/threads.jl:65 [inlined]
  [8] (::var"#ℳ𝒶𝓀ℯ#314"{var"#𝒜𝒸𝓉!#311"})(d1::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, d2::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}})
    @ Main ~/.julia/packages/Tullio/wAFFh/src/macro.jl:807
  [9] (::Tullio.Eval{var"#ℳ𝒶𝓀ℯ#314"{var"#𝒜𝒸𝓉!#311"}, var"#2868#∇ℳ𝒶𝓀ℯ#313"{var"#∇𝒜𝒸𝓉!#312"}})(::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, ::Vararg{HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}})
    @ Tullio ~/.julia/packages/Tullio/wAFFh/src/eval.jl:20
 [10] top-level scope
    @ ~/.julia/packages/Tullio/wAFFh/src/macro.jl:977
 [11] eval
    @ ./boot.jl:373 [inlined]
 [12] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196

prittjam avatar Sep 05 '22 16:09 prittjam

OK. This does run for me, although it makes a Matrix, where I'd expect a HybridMatrix. Not sure if this is intended behaviour, seems bit like https://github.com/JuliaArrays/HybridArrays.jl/issues/36

julia> axes(d1)
(SOneTo(3), Base.OneTo(100))

julia> similar(d1, Float32, axes(d1, 1))
3-element HybridVector{3, Float32, 1, Vector{Float32}} with indices SOneTo(3):
 -1.1286683f-36
  3.0f-45
 -1.14295445f-36

julia> similar(d1, Float32, axes(d1))  # here I'd expect a HybridMatrix
3×100 Matrix{Float32}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> @tullio d3[i,j] := d1[i,j]*d2[i,j] 
3×100 Matrix{Float64}:
 0.123891   0.0125692  0.036221    0.358743     …  0.502666   0.0288498  0.123871  0.00600215
 0.0817757  0.0308924  0.00914702  0.0365981       0.446563   0.668597   0.331555  0.00121783
 0.129901   0.574357   0.0113905   0.000131267     0.0395467  0.310613   0.606306  0.288098

Once I load LoopVectorization I get the error above. Unfortunately Tullio is a very good way of finding edge cases in LV, but fortunately, Chris E is very good at fixing them. @tullio d3[i,j] := d1[i,j]*d2[i,j] verbose=2 avx=false grad=false prints out what it's doing, and the Tullio-free reproducer is:

using HybridArrays, StaticArrays, LoopVectorization
d1 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100));
d2 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100));
out = rand(3,100);
function act155!(ℛ::AbstractArray, d1, d2, 𝒶𝓍i=axes(d1,1), 𝒶𝓍j=axes(d1,2))
    @turbo for j = 𝒶𝓍j
        for i = 𝒶𝓍i
            ℛ[i, j] = d1[i, j] * d2[i, j]
        end
    end
    ℛ
end
act155!(out, d1, d2)  # ERROR: MethodError: no method matching asvalbool(::Nothing)

mcabbott avatar Sep 05 '22 16:09 mcabbott

We want HybridMatrix for speed reasons. I was trying Tullio because matrix multiplication also returns a matrix. I'll get rid of LoopVecotrization for now. Seems like there are some corner cases composing these three packages. Thanks for the response!

prittjam avatar Sep 05 '22 16:09 prittjam

Ok. I guess the short answer could have been that Tullio doesn't know about HybridArrays, so it can't do much that is special. Maybe the compiler will figure something out but my guess is that performance won't be different to Arrays. I'm a little surprised that static dimensions don't propagate; you could try allocating the output array for it with the right axes.

At some point I had a branch for StaticArrays, which unrolled everything, and was sometimes pretty fast. But it ran into https://github.com/JuliaLang/julia/issues/37553 and never got merged.

mcabbott avatar Sep 05 '22 16:09 mcabbott

I'm trying to fix the issue but Tullio appears to specifically unwrap d1 in this case:

┌ Info: >>>>> Maker function
│   verbosetidy(ex_make) =
│    quote
│        local @inline(function ℳ𝒶𝓀ℯ(d1, d2)
│                    local 𝒶𝓍j = (axes)(d1, 2)
│                    (axes)(d2, 2) == (axes)(d1, 2) || (throw)("range of index j must agree")
│                    local 𝒶𝓍i = (axes)(d1, 1)
│                    (axes)(d2, 1) == (axes)(d1, 1) || (throw)("range of index i must agree")
│                    @info "left index ranges" i = (axes)(d1, 1) j = (axes)(d1, 2)
│                    local 𝓇𝒽𝓈(d1, d2, i, j) = begin
│                                identity(d1[i, j] * d2[i, j])
│                            end
│                    begin
│                        local 𝒯1 = Core.Compiler.return_type(𝓇𝒽𝓈, (typeof)((d1, d2, (first)(𝒶𝓍i), (first)(𝒶𝓍j))))
│                        local 𝒯2 = if Base.isconcretetype(𝒯1)
│                                    𝒯1
│                                else
│                                    @warn "unable to infer eltype from RHS"
│                                    (typeof)(𝓇𝒽𝓈(d1, d2, (first)(𝒶𝓍i), (first)(𝒶𝓍j)))
│                                end
│                    end
│                    local 𝒯3 = 𝒯2
│                    local 𝒯 = 𝒯3
│                    local d3 = similar(parent(d1), 𝒯, tuple(𝒶𝓍i, 𝒶𝓍j))
│                    begin
│                        (Tullio.threader)(𝒜𝒸𝓉!, (Tullio.storage_type)(d3, d1, d2), d3, tuple(d1, d2), tuple(𝒶𝓍i, 𝒶𝓍j), tuple(), +, 262144, nothing)
│                        d3
│                    end
│                end)
└    end

Is there a way to tell Tullio to not take parent when allocating d3?

mateuszbaran avatar Sep 06 '22 10:09 mateuszbaran

That parent should probably be removed. I have a vague memory that this was working around a bug in NamedDims.jl, at some point.

mcabbott avatar Sep 11 '22 15:09 mcabbott

If you could remove that parent then I wouldn't have to work out how to combine similar for AbstractArray in StaticArrays.jl and HybridArrays.jl with different combinations of axes. I've tried doing that briefly but it would be complicated.

mateuszbaran avatar Sep 12 '22 09:09 mateuszbaran

Closing this as the above example seems fine now.

mcabbott avatar Apr 28 '23 13:04 mcabbott