StrideArrays.jl
StackOverflowError in broadcasting
Reported by @efaulhaber in https://github.com/JuliaSIMD/StrideArrays.jl/pull/60#issuecomment-1334909937
using StrideArrays: PtrArray
using OrdinaryDiffEq
tspan = (0.0, 0.1)
u0_ode = [0.0]
ode = ODEProblem((du_ode, u_ode, semi, t) -> nothing, u0_ode, tspan)
sol = solve(ode, RDPK3SpFSAL49(thread=OrdinaryDiffEq.True()));
ERROR: StackOverflowError:
Stacktrace:
[1] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
[2] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[3] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[4] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[5] _materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:178
--- the last 5 lines are repeated 19994 more times ---
[99976] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
[99977] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[99978] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[99979] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
Reduced:
julia> using StrideArrays
julia> foo(x, f) = f(x)
foo (generic function with 1 method)
julia> src1 = rand(10); dst = zero(src1);
julia> src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);
julia> @. dst = foo(src1, abs)
10-element Vector{Float64}:
0.4497752365375052
0.234212713779973
0.8718344166425321
0.1169076748948169
0.12774646887625019
0.41850986610044205
0.017042548453313433
0.9246865917682306
0.4249229606273417
0.7560184865926094
julia> @. dst_ptr = foo(src1_ptr, abs)
ERROR: StackOverflowError:
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[2] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[3] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[4] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178
[5] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:185
--- the last 5 lines are repeated 19994 more times ---
[99976] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[99977] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[99978] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[99979] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178
As far as I understand, StrideArrays.jl prepares everything and hands it over to LoopVectorization.jl. In
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/943e35ddb6bb30c4777efc225b589d33213c95ac/src/broadcast.jl#L528-L567
LV emits a Broadcast.materialize! call again, which is the very method overloaded in StrideArrays.jl that we started with. I don't understand the complete code well enough to see where best to interrupt this cycle.
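For readers following along, the cycle described above can be modeled in a few lines of plain Julia. This is only a toy sketch: ToyPtrArray, toy_materialize!, toy_vmaterialize!, and toy_supported are made-up names and not part of StrideArrays.jl or LoopVectorization.jl, and a call counter stands in for the real stack overflow so the snippet terminates quickly.

```julia
# Toy model of the dispatch cycle; every name here is hypothetical.
struct ToyPtrArray
    data::Vector{Float64}
end

# Stand-in for a "can this function take the fast path?" check that says no.
toy_supported(f) = false

const CALLS = Ref(0)

# Plays the role of the specialized `materialize!` overload.
function toy_materialize!(dest::ToyPtrArray, f, src::ToyPtrArray)
    CALLS[] += 1
    # Guard so the demo stops early instead of overflowing the stack.
    CALLS[] > 5 && error("cycle detected after $(CALLS[]) calls")
    return toy_vmaterialize!(dest, f, src)
end

# Plays the role of the vectorized implementation with a generic fallback.
function toy_vmaterialize!(dest, f, src)
    if toy_supported(f)
        dest.data .= f.(src.data)
        return dest
    end
    # The "fallback" dispatches straight back to the specialized method above.
    return toy_materialize!(dest, f, src)
end

# Run it and capture the error raised by the guard.
result = try
    toy_materialize!(ToyPtrArray(zeros(3)), abs, ToyPtrArray(rand(3)))
catch err
    err
end
```

Without the guard, the two functions would keep calling each other until the stack is exhausted, which matches the repeating five-frame pattern in the traces above.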
Is there a temporary workaround for this?
I am not sure but I think this used to work with some version of StrideArrays/StrideArraysCore/LoopVectorization. Do you have the bandwidth to bisect the versions of these packages to find the problematic change, @efaulhaber?
I can try and see how far I get before I run out of bandwidth.
Okay, here is what I found: Your reduced example works fine with StrideArrays v0.1.19, StrideArraysCore v0.3.17, LoopVectorization v0.12.128. When I leave the other packages at these versions and update LoopVectorization to v0.12.129, I get this error:
ERROR: BoundsError: attempt to access Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64} at index [8]
Stacktrace:
[1] indexed_iterate(t::Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64}, i::Int64, state::Int64)
@ Base .\tuple.jl:88
[2] #s191#70
@ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:561 [inlined]
[3] var"#s191#70"(CNFARG::Any, W::Any, RS::Any, AR::Any, CLS::Any, NT::Any, ::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ LoopVectorization .\none:0
[4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:582
[5] avx_config_val
@ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:568 [inlined]
[6] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
@ StrideArrays C:\Users\Erik\.julia\packages\StrideArrays\v8RT3\src\broadcast.jl:183
[7] top-level scope
@ c:\Users\Erik\Documents\Test\test.jl:11
This doesn't change when I update LoopVectorization to the latest version while keeping StrideArrays at v0.1.19.
With StrideArrays v0.1.20, I then get the StackOverflowError as before.
So it was the change https://github.com/JuliaSIMD/LoopVectorization.jl/compare/v0.12.128...v0.12.129
This introduced the additional check LoopVectorization.can_turbo before the turbo version. This check fails for foo in the minimal example. Thus, the fallback Base.Broadcast.materialize!(dest, bc) is called, resulting in the StackOverflowError. The old version of LoopVectorization.jl didn't check LoopVectorization.can_turbo and thus didn't use the Base fallback.
The MWE above works again if we set LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true, e.g.,
julia> begin
using StrideArrays
foo(x, f) = f(x)
StrideArrays.LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true
src1 = rand(10); dst = zero(src1);
src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);
@. dst = foo(src1, abs)
@. dst_ptr = foo(src1_ptr, abs)
end
10-element PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}:
0.21757498850961488
0.08772365981125829
0.9338368308540324
0.47094465360632565
0.6707104502974711
0.20925363260701946
0.628131138456404
0.013161734404078751
0.5928102226448883
0.6575575585083866
@chriselrod What is the recommended way to get LoopVectorization.can_turbo(f, ::Val{N}) = true for something defined in a package? In this case, it is basically
https://github.com/SciML/DiffEqBase.jl/blob/c5154b8be87531a345591db90d3a76b0e8c4d738/src/calculate_residuals.jl#L94-L99
and its related functions. Note that DiffEqBase.jl does not depend on StrideArrays.jl or LoopVectorization.jl directly but uses @.. from FastBroadcast.jl with multi-threading from Polyester.jl.
Does that mean that we will always run into a stack overflow whenever we do something like this without setting can_turbo to true first? That seems unintended.
Yes
Sorry for dragging my feet on this. I'll take a look in a couple hours.
I've implemented a hotfix in #62 by disabling the change (can_turbo) that caused the regression in the first place.
I do think can_turbo is useful with the way LV currently works, and would be nice to get working again if I want to try and get more use out of StrideArrays -- which I do, because 1.9 brings us both optional dependencies and substantial reductions in compile times for LV dependent packages.
If someone wants to re-enable or improve can_turbo for StrideArrays broadcasts, I don't think it'd be that difficult.
It would mostly involve a fair bit of plumbing.
See the relevant code here:
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/35f83103c12992ddd887cd709bf65e345db5ec9e/src/condense_loopset.jl#L924-L965
can_turbo tries to guess whether a function supports @turbo by checking whether Base.promote_op infers Union{} as its return type when it is called with Vec arguments.
julia> using LoopVectorization, VectorizationBase
julia> Base.promote_op(+, Vec{2,Int}, Vec{2,Int}) !== Union{}
true
julia> foo(x, f) = f(x)
foo (generic function with 1 method)
julia> Base.promote_op(foo, Vec{2,Int}, Vec{2,Int}) !== Union{}
false
julia> Base.promote_op(foo, Vec{2,Int}, typeof(abs)) !== Union{}
true
Obviously, calling foo(::Vec{2,Int}, ::Vec{2,Int}) isn't valid!
But that's a pretty naive guess, to put it mildly.
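To make the weakness of that guess concrete without loading any packages, here is a minimal sketch of the same idea. It assumes a hypothetical helper naive_can_call (not the real can_turbo) and uses Int as the placeholder type where the real check uses Vec{2,Int}.

```julia
# Hypothetical stand-in for the check: all N arguments are assumed to share
# one placeholder type (Int here, instead of Vec{2,Int}).
naive_can_call(f, ::Val{N}) where {N} =
    Base.promote_op(f, ntuple(_ -> Int, Val(N))...) !== Union{}

foo(x, f) = f(x)

# `+` accepts two Ints, so the guess accepts it.
plus_ok = naive_can_call(+, Val(2))

# `foo` is rejected: the guess pretends its second argument is an Int,
# but an Int is not callable, so inference yields Union{}.
foo_ok = naive_can_call(foo, Val(2))
```

The function is fine; only the assumed argument types are wrong, which is why feeding in better type estimates is enough to fix the check.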
I think there are at least two approaches that would work.
- Preprocess code, defining anonymous functions that capture constants to essentially remove them. The reason this approach is difficult is that @turbo works just fine inside @generated functions currently, so we'd need something like RuntimeGeneratedFunctions to maintain that property. And we'd need to make sure LV doesn't lose information; it knows what a + b is, so if b happens to be a constant, we need to not forget that. I think this approach will get complicated, so I'd suggest instead:
- Define a can_turbo that takes types as arguments, and feed it actually plausible estimates of what the types are. I think this would be relatively simple to do; you have a
julia> ls.operations # ls is a LoopSet object
4-element Vector{LoopVectorization.Operation}:
var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
var"######arg###11######13###" = 0
destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")
dest[var"###n###1###"] = destination
julia> ls.operations[3]
destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")
julia> ls.operations[3]. # tab to look at completion candidates
children dependencies elementbytes identifier instruction
mangledvariable node_type parents reduced_children reduced_deps
ref rejectcurly rejectinterleave u₁unrolled u₂unrolled
variable vectorized
julia> ls.operations[3].parents
2-element Vector{LoopVectorization.Operation}:
var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
var"######arg###11######13###" = 0
julia> ls.operations[3].parents[2]
var"######arg###11######13###" = 0
The three easiest ways to get a LoopSet object are
1. Using LoopVectorization.@turbo_debug instead of @turbo. This will return the LoopSet object that is updated with the types of all objects involved.
2. LoopVectorization.loopset(q) on a loop expression q. This will be an untyped LoopSet; internally this is used for building a _turbo_! call which then assembles the LoopSet you get in 1.
3. Define _a = Ref{Any}() in a REPL, dev LoopVectorization, and edit code somewhere to add Main._a[] = ls to store the LoopSet from that point in time. Beware that Revise probably won't trigger automatically, because _turbo_! and vmaterialize! are @generated functions that aren't going to be invalidated. You'll need to invalidate them manually, which you can do via adding/removing nonsense lines like 1 + 2 (which you can see are already in the codebase in these functions for convenience).
Anyway, you should be able to just iterate over the arguments to a function call, and then decide what to do. A simple approach would be:
1. If the argument is a load, call vectype(loaded_from_array), where vectype's definition is something like vectype(::AbstractArray{T}) where {T} = Vec{2,T}.
2. If the argument is an operation, just go with Vec{2,Int} as is currently done. Or define this get-type function recursively, and do promotion. But you can save getting fancy for when basics work.
3. If it is a constant, use the type of the constant, i.e. pass typeof(sym).
The third rule (the constant case) will be what fixes the problem here, as you can see above where Base.promote_op(foo, Vec{2,Int}, typeof(abs)) !== Union{} returns true.
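The three rules could be sketched roughly as follows. Everything here is an assumption made for illustration: ToyLoad, ToyOp, ToyConst, guess_type, and toy_can_turbo are invented names rather than LoopVectorization types, and to stay dependency-free the element type T is used where the real code would use Vec{2,T}.

```julia
# Hypothetical argument kinds, loosely mirroring "load", "operation", "constant".
abstract type ToyArg end
struct ToyLoad{A<:AbstractArray} <: ToyArg
    array::A
end
struct ToyOp <: ToyArg end
struct ToyConst{C} <: ToyArg
    value::C
end

# Rule 1: a load takes its type from the array it reads
# (the element type stands in for Vec{2,T} in this sketch).
guess_type(arg::ToyLoad) = eltype(arg.array)
# Rule 2: an operation keeps the current default guess
# (Int stands in for Vec{2,Int}).
guess_type(::ToyOp) = Int
# Rule 3: a constant contributes its own type.
guess_type(arg::ToyConst) = typeof(arg.value)

# Type-aware variant of the check: infer with the guessed argument types.
toy_can_turbo(f, args::ToyArg...) =
    Base.promote_op(f, map(guess_type, args)...) !== Union{}

foo(x, f) = f(x)

# A load plus a constant function, as in the reduced example.
typed_ok = toy_can_turbo(foo, ToyLoad(rand(4)), ToyConst(abs))
# Treating both arguments as generic operations reproduces the old rejection.
untyped_ok = toy_can_turbo(foo, ToyOp(), ToyOp())
```

With the constant's real type passed along, inference sees a call to abs on a number rather than a call to an integer, so the check no longer rejects foo.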
Thanks a lot for the hotfix, @chriselrod! I can confirm that it fixes the original issue reported in https://github.com/JuliaSIMD/StrideArrays.jl/pull/60#issuecomment-1334909937
Feel free to close this, or leave it open as a reference to the better fix described above.