HybridArrays
Does Tullio not support HybridArrays?
What would you like it to do?
It uses axes(A) everywhere, and makes output arrays by calling similar. Thus some things with SOneTo will propagate:
julia> v = HybridArray{Tuple{3}}(ones(3))
3-element HybridVector{3, Float64, 1, Vector{Float64}} with indices SOneTo(3):
1.0
1.0
1.0
julia> @tullio _[i,j] := v[i] * v[j] + (i/j)
3×3 StaticArraysCore.SizedMatrix{3, 3, Float64, 2, Matrix{Float64}} with indices SOneTo(3)×SOneTo(3):
2.0 1.5 1.33333
3.0 2.0 1.66667
4.0 2.5 2.0
This simple example fails but works with einsum, although einsum casts to a dynamic matrix.
d1 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100))
d2 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100))
@tullio d3[i,j] := d1[i,j]*d2[i,j]
The error is
MethodError: no method matching asvalbool(::Nothing)
Closest candidates are:
asvalbool(::T) where T<:Tuple{Vararg{Static.StaticBool}} at ~/.julia/packages/VectorizationBase/gTlqN/src/VectorizationBase.jl:102
Stacktrace:
[1] val_dense_dims
@ ~/.julia/packages/VectorizationBase/gTlqN/src/VectorizationBase.jl:110 [inlined]
[2] densewrapper
@ ~/.julia/packages/LoopVectorization/nkDac/src/condense_loopset.jl:456 [inlined]
[3] 𝒜𝒸𝓉!
@ ~/.julia/packages/Tullio/wAFFh/src/macro.jl:1094 [inlined]
[4] tile_halves(fun!::var"#𝒜𝒸𝓉!#311", ::Type{Matrix{Float64}}, As::Tuple{Matrix{Float64}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}}, Is::Tuple{UnitRange{Int64}, UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
@ Tullio ~/.julia/packages/Tullio/wAFFh/src/threads.jl:139
[5] tile_halves(fun!::var"#𝒜𝒸𝓉!#311", ::Type{Matrix{Float64}}, As::Tuple{Matrix{Float64}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}}, Is::Tuple{UnitRange{Int64}, UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
@ Tullio ~/.julia/packages/Tullio/wAFFh/src/threads.jl:142
[6] tile_halves
@ ~/.julia/packages/Tullio/wAFFh/src/threads.jl:136 [inlined]
[7] threader
@ ~/.julia/packages/Tullio/wAFFh/src/threads.jl:65 [inlined]
[8] (::var"#ℳ𝒶𝓀ℯ#314"{var"#𝒜𝒸𝓉!#311"})(d1::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, d2::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}})
@ Main ~/.julia/packages/Tullio/wAFFh/src/macro.jl:807
[9] (::Tullio.Eval{var"#ℳ𝒶𝓀ℯ#314"{var"#𝒜𝒸𝓉!#311"}, var"#2868#∇ℳ𝒶𝓀ℯ#313"{var"#∇𝒜𝒸𝓉!#312"}})(::HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}, ::Vararg{HybridMatrix{3, StaticArrays.Dynamic(), Float64, 2, Matrix{Float64}}})
@ Tullio ~/.julia/packages/Tullio/wAFFh/src/eval.jl:20
[10] top-level scope
@ ~/.julia/packages/Tullio/wAFFh/src/macro.jl:977
[11] eval
@ ./boot.jl:373 [inlined]
[12] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
OK. This does run for me, although it makes a Matrix, where I'd expect a HybridMatrix. Not sure if this is intended behaviour, seems bit like https://github.com/JuliaArrays/HybridArrays.jl/issues/36
julia> axes(d1)
(SOneTo(3), Base.OneTo(100))
julia> similar(d1, Float32, axes(d1, 1))
3-element HybridVector{3, Float32, 1, Vector{Float32}} with indices SOneTo(3):
-1.1286683f-36
3.0f-45
-1.14295445f-36
julia> similar(d1, Float32, axes(d1)) # here I'd expect a HybridMatrix
3×100 Matrix{Float32}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> @tullio d3[i,j] := d1[i,j]*d2[i,j]
3×100 Matrix{Float64}:
0.123891 0.0125692 0.036221 0.358743 … 0.502666 0.0288498 0.123871 0.00600215
0.0817757 0.0308924 0.00914702 0.0365981 0.446563 0.668597 0.331555 0.00121783
0.129901 0.574357 0.0113905 0.000131267 0.0395467 0.310613 0.606306 0.288098
Once I load LoopVectorization I get the error above. Unfortunately Tullio is a very good way of finding edge cases in LV, but fortunately, Chris E is very good at fixing them. @tullio d3[i,j] := d1[i,j]*d2[i,j] verbose=2 avx=false grad=false prints out what it's doing, and the Tullio-free reproducer is:
using HybridArrays, StaticArrays, LoopVectorization
d1 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100));
d2 = HybridMatrix{3,StaticArrays.Dynamic()}(rand(3,100));
out = rand(3,100);
function act155!(ℛ::AbstractArray, d1, d2, 𝒶𝓍i=axes(d1,1), 𝒶𝓍j=axes(d1,2))
@turbo for j = 𝒶𝓍j
for i = 𝒶𝓍i
ℛ[i, j] = d1[i, j] * d2[i, j]
end
end
ℛ
end
act155!(out, d1, d2) # ERROR: MethodError: no method matching asvalbool(::Nothing)
We want HybridMatrix for speed reasons. I was trying Tullio because matrix multiplication also returns a matrix. I'll get rid of LoopVecotrization for now. Seems like there are some corner cases composing these three packages. Thanks for the response!
Ok. I guess the short answer could have been that Tullio doesn't know about HybridArrays, so it can't do much that is special. Maybe the compiler will figure something out but my guess is that performance won't be different to Arrays. I'm a little surprised that static dimensions don't propagate; you could try allocating the output array for it with the right axes.
At some point I had a branch for StaticArrays, which unrolled everything, and was sometimes pretty fast. But it ran into https://github.com/JuliaLang/julia/issues/37553 and never got merged.
I'm trying to fix the issue but Tullio appears to specifically unwrap d1 in this case:
┌ Info: >>>>> Maker function
│ verbosetidy(ex_make) =
│ quote
│ local @inline(function ℳ𝒶𝓀ℯ(d1, d2)
│ local 𝒶𝓍j = (axes)(d1, 2)
│ (axes)(d2, 2) == (axes)(d1, 2) || (throw)("range of index j must agree")
│ local 𝒶𝓍i = (axes)(d1, 1)
│ (axes)(d2, 1) == (axes)(d1, 1) || (throw)("range of index i must agree")
│ @info "left index ranges" i = (axes)(d1, 1) j = (axes)(d1, 2)
│ local 𝓇𝒽𝓈(d1, d2, i, j) = begin
│ identity(d1[i, j] * d2[i, j])
│ end
│ begin
│ local 𝒯1 = Core.Compiler.return_type(𝓇𝒽𝓈, (typeof)((d1, d2, (first)(𝒶𝓍i), (first)(𝒶𝓍j))))
│ local 𝒯2 = if Base.isconcretetype(𝒯1)
│ 𝒯1
│ else
│ @warn "unable to infer eltype from RHS"
│ (typeof)(𝓇𝒽𝓈(d1, d2, (first)(𝒶𝓍i), (first)(𝒶𝓍j)))
│ end
│ end
│ local 𝒯3 = 𝒯2
│ local 𝒯 = 𝒯3
│ local d3 = similar(parent(d1), 𝒯, tuple(𝒶𝓍i, 𝒶𝓍j))
│ begin
│ (Tullio.threader)(𝒜𝒸𝓉!, (Tullio.storage_type)(d3, d1, d2), d3, tuple(d1, d2), tuple(𝒶𝓍i, 𝒶𝓍j), tuple(), +, 262144, nothing)
│ d3
│ end
│ end)
└ end
Is there a way to tell Tullio to not take parent when allocating d3?
That parent should probably be removed. I have a vague memory that this was working around a bug in NamedDims.jl, at some point.
If you could remove that parent then I wouldn't have to work out how to combine similar for AbstractArray in StaticArrays.jl and HybridArrays.jl with different combinations of axes. I've tried doing that briefly but it would be complicated.
Closing this as the above example seems fine now.