StrideArrays.jl
StackOverflowError in broadcasting
Reported by @efaulhaber in https://github.com/JuliaSIMD/StrideArrays.jl/pull/60#issuecomment-1334909937
using StrideArrays: PtrArray
using OrdinaryDiffEq
tspan = (0.0, 0.1)
u0_ode = [0.0]
ode = ODEProblem((du_ode, u_ode, semi, t) -> nothing, u0_ode, tspan)
sol = solve(ode, RDPK3SpFSAL49(thread=OrdinaryDiffEq.True()));
ERROR: StackOverflowError:
Stacktrace:
[1] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
[2] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[3] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[4] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[5] _materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:178
--- the last 5 lines are repeated 19994 more times ---
[99976] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
@ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
[99977] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[99978] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[99979] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
Reduced:
julia> using StrideArrays
julia> foo(x, f) = f(x)
foo (generic function with 1 method)
julia> src1 = rand(10); dst = zero(src1);
julia> src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);
julia> @. dst = foo(src1, abs)
10-element Vector{Float64}:
0.4497752365375052
0.234212713779973
0.8718344166425321
0.1169076748948169
0.12774646887625019
0.41850986610044205
0.017042548453313433
0.9246865917682306
0.4249229606273417
0.7560184865926094
julia> @. dst_ptr = foo(src1_ptr, abs)
ERROR: StackOverflowError:
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[2] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[3] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[4] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178
[5] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:185
--- the last 5 lines are repeated 19994 more times ---
[99976] macro expansion
@ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
[99977] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
[99978] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
[99979] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
@ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178
As far as I understand, StrideArrays.jl prepares everything and hands it over to LoopVectorization.jl. In
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/943e35ddb6bb30c4777efc225b589d33213c95ac/src/broadcast.jl#L528-L567
LV emits a Broadcast.materialize! call again, which dispatches to the same method overloaded in StrideArrays.jl that we started with. I do not understand the complete code well enough to see where best to break this cycle.
Is there a temporary workaround for this?
I am not sure, but I think this used to work with some version of StrideArrays/StrideArraysCore/LoopVectorization. Do you have the bandwidth to bisect the versions of these packages to find the problematic change, @efaulhaber?
I can try and see how far I get before I run out of bandwidth.
Okay, here is what I found: Your reduced example works fine with StrideArrays v0.1.19, StrideArraysCore v0.3.17, LoopVectorization v0.12.128. When I leave the other packages at these versions and update LoopVectorization to v0.12.129, I get this error:
ERROR: BoundsError: attempt to access Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64} at index [8]
Stacktrace:
[1] indexed_iterate(t::Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64}, i::Int64, state::Int64)
@ Base .\tuple.jl:88
[2] #s191#70
@ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:561 [inlined]
[3] var"#s191#70"(CNFARG::Any, W::Any, RS::Any, AR::Any, CLS::Any, NT::Any, ::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ LoopVectorization .\none:0
[4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core .\boot.jl:582
[5] avx_config_val
@ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:568 [inlined]
[6] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
@ StrideArrays C:\Users\Erik\.julia\packages\StrideArrays\v8RT3\src\broadcast.jl:183
[7] top-level scope
@ c:\Users\Erik\Documents\Test\test.jl:11
This doesn't change when I update LoopVectorization to the latest version while keeping StrideArrays at v0.1.19.
With StrideArrays v0.1.20, I then get the StackOverflowError as before.
So it was the change https://github.com/JuliaSIMD/LoopVectorization.jl/compare/v0.12.128...v0.12.129
This introduced the additional check LoopVectorization.can_turbo before the turbo version. This check fails for foo in the minimal example. Thus, the fallback Base.Broadcast.materialize!(dest, bc) is called, resulting in the StackOverflowError. The old version of LoopVectorization.jl didn't check LoopVectorization.can_turbo and thus didn't use the Base fallback.
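The cycle described above can be sketched with a toy model in plain Julia. This is only an illustration under stated assumptions: ToyPtrArray, toy_can_turbo, toy_vmaterialize!, toy_materialize_cyclic!, toy_plain_loop! and toy_materialize! are hypothetical names invented here, not functions from StrideArrays.jl or LoopVectorization.jl, and the non-recursive fallback shown is one possible way to break such a cycle, not the fix that was applied.

```julia
# Toy model of the dispatch cycle. All names are hypothetical stand-ins.
struct ToyPtrArray              # stand-in for PtrArray
    data::Vector{Float64}
end

toy_can_turbo(f) = false        # stand-in for a can_turbo check that fails

# Stand-in for the LoopVectorization side: take the fast path when the
# check passes, otherwise hand control to the supplied fallback.
function toy_vmaterialize!(dest, f, src, fallback)
    if toy_can_turbo(f)
        dest.data .= f.(src.data)   # pretend this is the vectorized path
    else
        fallback(dest, f, src)
    end
    return dest
end

# Cyclic variant: the fallback is the very method that called us, so a
# failing check means the call never terminates normally. Do not call it.
toy_materialize_cyclic!(dest, f, src) =
    toy_vmaterialize!(dest, f, src, toy_materialize_cyclic!)

# Acyclic variant: the fallback is a plain loop that cannot re-enter
# toy_vmaterialize!.
function toy_plain_loop!(dest, f, src)
    for i in eachindex(dest.data, src.data)
        dest.data[i] = f(src.data[i])
    end
    return dest
end

toy_materialize!(dest, f, src) =
    toy_vmaterialize!(dest, f, src, toy_plain_loop!)

src = ToyPtrArray([-1.0, 2.0, -3.0])
dst = ToyPtrArray(zeros(3))
toy_materialize!(dst, abs, src)     # fills dst.data with abs.(src.data)
```

In this toy, the only difference between the two variants is which function is passed as the fallback; the real packages wire this up through broadcast dispatch rather than an explicit argument.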
The MWE above works again if we set LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true, e.g.,
julia> begin
using StrideArrays
foo(x, f) = f(x)
StrideArrays.LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true
src1 = rand(10); dst = zero(src1);
src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);
@. dst = foo(src1, abs)
@. dst_ptr = foo(src1_ptr, abs)
end
10-element PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}:
0.21757498850961488
0.08772365981125829
0.9338368308540324
0.47094465360632565
0.6707104502974711
0.20925363260701946
0.628131138456404
0.013161734404078751
0.5928102226448883
0.6575575585083866
@chriselrod What is the recommended way to get LoopVectorization.can_turbo(f, ::Val{N}) = true for something defined in a package? In this case, it is basically
https://github.com/SciML/DiffEqBase.jl/blob/c5154b8be87531a345591db90d3a76b0e8c4d738/src/calculate_residuals.jl#L94-L99
and its related functions. Note that DiffEqBase.jl does not depend on StrideArrays.jl or LoopVectorization.jl directly but uses @.. from FastBroadcast.jl with multi-threading from Polyester.jl.
Does that mean that we will always run into a stack overflow whenever we do something like this without setting can_turbo to true first? That seems unintended.
> Does that mean that we will always run into a stack overflow whenever we do something like this without setting can_turbo to true first? That seems unintended.
Yes
Sorry for dragging my feet on this. I'll take a look in a couple hours.
I've implemented a hotfix in #62 by disabling the change (can_turbo) that caused the regression in the first place.
I do think can_turbo is useful with the way LV currently works, and it would be nice to get it working again if I want to try and get more use out of StrideArrays, which I do, because 1.9 brings us both optional dependencies and substantial reductions in compile times for LV-dependent packages.
If someone wants to re-enable or improve can_turbo for StrideArrays broadcasts, I don't think it'd be that difficult.
It would mostly involve a fair bit of plumbing.
See the relevant code here:
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/35f83103c12992ddd887cd709bf65e345db5ec9e/src/condense_loopset.jl#L924-L965
can_turbo tries to guess whether a function supports @turbo by checking whether its inferred return type for Vec arguments is Union{}.
julia> using LoopVectorization, VectorizationBase
julia> Base.promote_op(+, Vec{2,Int}, Vec{2,Int}) !== Union{}
true
julia> foo(x, f) = f(x)
foo (generic function with 1 method)
julia> Base.promote_op(foo, Vec{2,Int}, Vec{2,Int}) !== Union{}
false
julia> Base.promote_op(foo, Vec{2,Int}, typeof(abs)) !== Union{}
true
Obviously, calling foo(::Vec{2,Int}, ::Vec{2,Int}) isn't valid!
But that's a pretty naive guess, to put it mildly.
I think there are at least two approaches that would work.
- Preprocess code, defining anonymous functions that capture constants to essentially remove them. This approach is difficult because @turbo currently works just fine inside @generated functions, so we'd need something like RuntimeGeneratedFunctions to maintain that property. And we'd need to make sure LV doesn't lose information; it knows what a + b is, so if b happens to be a constant, we need to not forget that. I think this approach will get complicated, so I'd suggest instead:
- Define a can_turbo that takes types as arguments, and feed it actually plausible estimates of what the types are. I think this would be relatively simple to do; you have a
julia> ls.operations # ls is a LoopSet object
4-element Vector{LoopVectorization.Operation}:
var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
var"######arg###11######13###" = 0
destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")
dest[var"###n###1###"] = destination
julia> ls.operations[3]
destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")
julia> ls.operations[3]. # tab to look at completion candidates
children dependencies elementbytes identifier instruction
mangledvariable node_type parents reduced_children reduced_deps
ref rejectcurly rejectinterleave u₁unrolled u₂unrolled
variable vectorized
julia> ls.operations[3].parents
2-element Vector{LoopVectorization.Operation}:
var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
var"######arg###11######13###" = 0
julia> ls.operations[3].parents[2]
var"######arg###11######13###" = 0
The three easiest ways to get a LoopSet object are
1. Using LoopVectorization.@turbo_debug instead of @turbo. This will return the LoopSet object that is updated with the types of all objects involved.
2. LoopVectorization.loopset(q) on a loop expression q. This will be an untyped LoopSet; internally this is used for building a _turbo_! call which then assembles the LoopSet you get in 1.
3. Define _a = Ref{Any}() in a REPL, dev LoopVectorization, and edit code somewhere to add Main._a[] = ls to store the LoopSet from that point in time. Beware that Revise probably won't trigger automatically, because _turbo_! and vmaterialize! are @generated functions that aren't going to be invalidated. You'll need to invalidate them manually, which you can do by adding/removing nonsense lines like 1 + 2 (which you can see are already in the codebase in these functions for convenience).
Anyway, you should be able to just iterate over the arguments to a function call, and then decide what to do. A simple approach would be:
1. If the argument is a load, call vectype(loaded_from_array), where vectype's definition is something like vectype(::AbstractArray{T}) where {T} = Vec{2,T}.
2. If the argument is an operation, just go with Vec{2,Int} as is currently done. Or define this get-type function recursively, and do promotion. But you can save getting fancy for when basics work.
3. If it is a constant, use the type of the constant, i.e. pass typeof(sym).
"3." will be what fixes the problem here, as you can see above, where the check Base.promote_op(foo, Vec{2,Int}, typeof(abs)) !== Union{} returns true.
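The three rules above can be sketched in plain Julia without loading LoopVectorization. Everything here is an assumption made for illustration: ToyVec is a minimal stand-in for VectorizationBase's Vec, and LoadArg, OpArg, ConstArg, argtype and typed_can_turbo are hypothetical names that do not exist in LoopVectorization.jl; a real implementation would read this information from the LoopSet operations instead.

```julia
# Minimal stand-in for a SIMD vector type (hypothetical, for illustration).
struct ToyVec{N,T}
    data::NTuple{N,T}
end
Base.abs(v::ToyVec{N,T}) where {N,T} = ToyVec{N,T}(map(abs, v.data))

# Hypothetical descriptions of the three kinds of call arguments.
struct LoadArg{A<:AbstractArray}    # rule 1: value loaded from an array
    array::A
end
struct OpArg end                    # rule 2: result of another operation
struct ConstArg{C}                  # rule 3: loop-invariant constant
    value::C
end

# Rule 1: derive the vector type from the array's element type.
vectype(::AbstractArray{T}) where {T} = ToyVec{2,T}

argtype(a::LoadArg)  = vectype(a.array)
argtype(::OpArg)     = ToyVec{2,Int}      # rule 2: keep the current guess
argtype(a::ConstArg) = typeof(a.value)    # rule 3: use the constant's type

# Type-aware check: ask inference whether f can return anything at all
# when called with the estimated argument types.
typed_can_turbo(f, args...) =
    Base.promote_op(f, map(argtype, args)...) !== Union{}

foo(x, f) = f(x)

# abs is a constant argument, so rule 3 passes typeof(abs).
typed_can_turbo(foo, LoadArg(rand(4)), ConstArg(abs))
# Treating the second argument as an operation result reproduces the
# naive guess, under which foo would try to call a vector.
typed_can_turbo(foo, LoadArg(rand(4)), OpArg())
```

In this toy setup the first call should pass the check and the second should fail it, mirroring the two promote_op results shown earlier. As with the original check, the outcome depends on type inference, so it remains a heuristic.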
Thanks a lot for the hotfix, @chriselrod! I can confirm that it fixes the original issue reported in https://github.com/JuliaSIMD/StrideArrays.jl/pull/60#issuecomment-1334909937
Feel free to close this, or leave it open as a reference to the better fix described above.