StaticArrays.jl
StaticArrays.jl copied to clipboard
Invalidation and load time
On julia nightly
julia> @time using StaticArrays
1.160173 seconds (1.95 M allocations: 142.974 MiB, 34.39% compilation time)
julia> using SnoopCompileCore
julia> invalidations = @snoopr using StaticArrays;
julia> using SnoopCompile
julia> trees = invalidation_trees(invalidations)
13-element Vector{SnoopCompile.MethodInvalidations}:
inserting rdims(out::Val{N}, inds::Tuple{SOneTo, Vararg{SOneTo}}) where N @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/abstractarray.jl:209 invalidated:
mt_backedges: 1: signature Tuple{typeof(Base.rdims), Val{1}, Any} triggered MethodInstance for reshape(::BitArray, ::Val{1}) (0 children)
inserting to_indices(A, I::Tuple{Vararg{Union{Integer, CartesianIndex, StaticArray{<:Tuple, Int64}}}}) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/indexing.jl:218 invalidated:
backedges: 1: superseding to_indices(A, I::Tuple) @ Base indices.jl:344 with MethodInstance for to_indices(::AbstractArray, ::Tuple{Any, CartesianIndex{0}}) (1 children)
36 mt_cache
inserting getproperty(::SOneTo{n}, s::Symbol) where n @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/SOneTo.jl:57 invalidated:
backedges: 1: superseding getproperty(x, f::Symbol) @ Base Base.jl:37 with MethodInstance for getproperty(::AbstractUnitRange, ::Symbol) (1 children)
inserting (::Type{SSC})(a::AbstractVector) where SSC<:SHermitianCompact @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/SHermitianCompact.jl:83 invalidated:
backedges: 1: superseding Union{}(a...) @ Core boot.jl:263 with MethodInstance for Union{}(::BitArray) (1 children)
inserting rdims(out::Tuple{Any}, inds::Tuple{SOneTo, Vararg{SOneTo}}) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/abstractarray.jl:210 invalidated:
backedges: 1: superseding rdims(out::Tuple{Any}, inds::Tuple{Any}) @ Base reshapedarray.jl:156 with MethodInstance for Base.rdims(::Tuple{Base.OneTo{Int64}}, ::Tuple{Any}) (1 children)
2: superseding rdims(out::Tuple{Any}, inds::Tuple{Vararg{Any, M}}) where M @ Base reshapedarray.jl:157 with MethodInstance for Base.rdims(::Tuple{Base.OneTo{Int64}}, ::Tuple) (2 children)
inserting instantiate(B::Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{M}}) where M @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/broadcast.jl:33 invalidated:
backedges: 1: superseding instantiate(bc::Base.Broadcast.Broadcasted{Style}) where Style @ Base.Broadcast broadcast.jl:292 with MethodInstance for Base.Broadcast.instantiate(::Base.Broadcast.Broadcasted{Style, Nothing, typeof(Base.wrap_string)} where Style<:Union{Nothing, Base.Broadcast.BroadcastStyle}) (1 children)
2: superseding instantiate(bc::Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}}) @ Base.Broadcast broadcast.jl:301 with MethodInstance for Base.Broadcast.instantiate(::Base.Broadcast.Broadcasted{Style, Nothing, typeof(Base.wrap_string)} where Style<:Base.Broadcast.AbstractArrayStyle{0}) (4 children)
1 mt_cache
inserting _axes(bc::Base.Broadcast.Broadcasted{<:StaticArrays.StaticArrayStyle}, ::Nothing) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/broadcast.jl:30 invalidated:
backedges: 1: superseding _axes(bc::Base.Broadcast.Broadcasted, ::Nothing) @ Base.Broadcast broadcast.jl:224 with MethodInstance for Base.Broadcast._axes(::Base.Broadcast.Broadcasted, ::Nothing) (1 children)
2: superseding _axes(bc::Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}}, ::Nothing) @ Base.Broadcast broadcast.jl:225 with MethodInstance for Base.Broadcast._axes(::Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}}, ::Nothing) (4 children)
inserting SubArray(A::AbstractArray, indices::Tuple{StaticArrays.StaticIndexing, Vararg{StaticArrays.StaticIndexing}}) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/indexing.jl:379 invalidated:
mt_backedges: 1: signature Tuple{Type{SubArray}, Any, Any} triggered MethodInstance for Base._maybe_reindex(::SubArray, ::Tuple{Base.LogicalIndex{Int64, A} where A<:(BitArray)}, ::Tuple{}) (4 children)
backedges: 1: superseding SubArray(parent::AbstractArray, indices::Tuple) @ Base subarray.jl:26 with MethodInstance for SubArray(::AbstractArray, ::Tuple) (1 children)
inserting similar(::Type{A}, shape::Union{Tuple{SOneTo, Vararg{Union{Integer, Base.OneTo, SOneTo}}}, Tuple{Union{Integer, Base.OneTo}, SOneTo, Vararg{Union{Integer, Base.OneTo, SOneTo}}}, Tuple{Union{Integer, Base.OneTo}, Union{Integer, Base.OneTo}, SOneTo, Vararg{Union{Integer, Base.OneTo, SOneTo}}}}) where A<:AbstractArray @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/abstractarray.jl:178 invalidated:
mt_backedges: 1: signature Tuple{typeof(similar), Type{Array{Union{Int64, Symbol}, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Union{Int64, Symbol}, _A}}, ::Union{Integer, AbstractUnitRange}) where _A (0 children)
2: signature Tuple{typeof(similar), Type{Array{Any, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Any, _A}}, ::Union{Integer, AbstractUnitRange}) where _A (0 children)
3: signature Tuple{typeof(similar), Type{Array{Base.PkgId, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Base.PkgId, _A}}, ::Union{Integer, AbstractUnitRange}) where _A (0 children)
4: signature Tuple{typeof(similar), Type{Array{Union{Int64, Symbol}, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Union{Int64, Symbol}, _A}}, ::Tuple{Union{Integer, Base.OneTo}}) where _A (2 children)
5: signature Tuple{typeof(similar), Type{Array{Any, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Any, _A}}, ::Tuple{Union{Integer, Base.OneTo}}) where _A (2 children)
6: signature Tuple{typeof(similar), Type{Array{Base.PkgId, _A}} where _A, Tuple{Union{Integer, AbstractUnitRange}}} triggered MethodInstance for similar(::Type{Array{Base.PkgId, _A}}, ::Tuple{Union{Integer, Base.OneTo}}) where _A (2 children)
inserting unsafe_view(A::AbstractArray, i1::StaticArrays.StaticIndexing, indices::StaticArrays.StaticIndexing...) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/indexing.jl:370 invalidated:
mt_backedges: 1: signature Tuple{typeof(Base.unsafe_view), Vector{Pair{DataType, Function}}, Vararg{Any}} triggered MethodInstance for view(::Vector{Pair{DataType, Function}}, ::Any) (0 children)
2: signature Tuple{typeof(Base.unsafe_view), Vector{UInt8}, Vararg{Any}} triggered MethodInstance for view(::Vector{UInt8}, ::Any) (0 children)
3: signature Tuple{typeof(Base.unsafe_view), Vector{Base.PkgId}, Vararg{Any}} triggered MethodInstance for view(::Vector{Base.PkgId}, ::Any) (0 children)
4: signature Tuple{typeof(Base.unsafe_view), BitArray, Vararg{Any}} triggered MethodInstance for view(::BitArray, ::BitArray) (9 children)
inserting isassigned(a::StaticArray, i::Int64...) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/abstractarray.jl:31 invalidated:
backedges: 1: superseding isassigned(a::AbstractArray, i::Integer...) @ Base abstractarray.jl:577 with MethodInstance for isassigned(::AbstractMatrix, ::Int64, ::Int64) (4 children)
2: superseding isassigned(a::AbstractArray, i::Integer...) @ Base abstractarray.jl:577 with MethodInstance for isassigned(::AbstractVecOrMat, ::Int64, ::Int64) (12 children)
inserting eachindex(::IndexLinear, a::StaticArray) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/abstractarray.jl:19 invalidated:
backedges: 1: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Union{Missing, Float16}}) (2 children)
2: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Union{Missing, Float32}}) (2 children)
3: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Union{Missing, Float64}}) (2 children)
4: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Missing}) (2 children)
5: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Float16}) (2 children)
6: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Float32}) (2 children)
7: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector{Float64}) (2 children)
8: superseding eachindex(::IndexLinear, A::AbstractVector) @ Base abstractarray.jl:355 with MethodInstance for eachindex(::IndexLinear, ::AbstractVector) (2 children)
inserting getindex(sa::Type{SA}, xs...) @ StaticArrays ~/Dropbox/JuliaPackages/StaticArrays.jl/src/initializers.jl:31 invalidated:
backedges: 1: superseding getindex(::Type{T}, vals...) where T @ Base array.jl:399 with MethodInstance for getindex(::Type, ::Nothing, ::Vararg{String}) (119 children)
I'm not certain how easily these may be addressed, but the SA[1,2,3] constructor, which is the source of the most invalidation, is a bit unfortunate, as it doesn't follow the docstring:
getindex(type[, elements...])
Construct a 1-d array of the specified type. This is usually called with the syntax
`Type[]`. Element values can be specified using `Type[a,b,c,...]`.
I wonder if this may be deprecated, as the macro version @SArray [1,2,3] seems essentially identical?
I've checked this and on the newest master (Version 1.9.0-DEV.1125 (2022-08-13), Commit e1fa6a51e4*) the time is back down to Julia 1.7 levels. Also, restricting that getindex to numbers and arrays (xs::Union{Number,AbstractArray}...), which should cover almost all use cases of SA, removes all of these invalidations -- so that would also be an option.
FWIW, the getindex invalidations are gone on Pkg.jl master after https://github.com/JuliaLang/Pkg.jl/pull/3189.
I revisited this today, trying to pin down where the majority of load time is coming from. Some comments below:
-
It doesn't appear that the remaining invalidations are the main cause of
usingtime. To test this, I commented out every method definition highlighted byinvalidation_trees(invalidations)and ran@time using StaticArraysbefore and after. Whatever change this brings, it is within the noise. -
Next, I commented out various of the
includes in/src/StaticArrays.jl, runningjulia --project -e 'using Pkg; Pkg.precompile(); @time using StaticArrays'for different configurations of
includes. While this cannot account for interactions betweenincludes, it still highlights some interesting things, summarized below:- On my laptop, on v1.9-rc2, it takes ~0.36-0.37 s to load current master of StaticArrays.
- Commenting out
include("precompile.jl"); _precompile(), this drops to ~0.33. I.e., ~0.03 s cost for current precompilation directives.
Next, estimates of savings of commenting out various files (i.e., ~load cost) :
/src/matrix_multiply.jland/src/matrix_multiply_add.jl: 0.11 s./src/abstractarray.jl: 0.07 s/src/triangular.jl: 0.015 s/src/convert.jl: 0.04 s/src/SizedArray.jl: 0.04 s
Roughly, this accounts for about ~80% of the load time without precompilation directives.
The matrix_multiply_*.jl and abstractarray.jl files stand out the most. It might be interesting to try to narrow down where most of the load costs originate from in these files. Both make heavy use of @generated (but on the other hand, so does several other files in StaticArrays.jl that don't contribute as much (e.g., /src/indexing.jl: virtually zero load impact).
Have you @profiled using StaticArrays? Then use C=true when visualizing the flamegraph. My strong suspicion is that it's method-insertion which takes all the time: there are a lot of extensions to Base and LinearAlgebra functions.
The big cost of method insertion is to check for invalidations. It's not just whether you cause invalidations, because we still have to check every potentially-invalidating method (which is any method that you add to an externally-owned function).
We had a discussion about this on Slack and I thought I'd copy it over here to not lose it to the Slack-limit: https://gist.github.com/thchr/e54449e88ec72c24e80217f33ab16e8b
For me, one of the big take-aways was that a very big chunk of load latency comes from method definitions in StaticArrays that use big union types. Specifically, just reducing the four big unions, StaticVecOrMatLikeForFiveArgMulDest, StaticMatMulLike, SimilarTypeArrayWrapper, and StaticMatrixLike to the first types in each union reduces the load time of StaticArrays by ~60%.
It's not clear to me how this could translate to concrete things we could do to reduce the load time of StaticArrays today, but it does indicate where a big part of the problem lies - and also indicates that it may not be too worthwhile to search for the problem elsewhere (e.g., in invalidations).
At the bottom of the thread vtjnash included various general advice for things not to do. Aside from avoiding big unions (which seems occasionally unavoidable), one potentially relevant thing for StaticArrays.jl could be his point 5:
- Don’t use
@generatedover more lines of code than essential. It is required that this style of function definition restricts the quality of inference, since it generally suffers from incorrect tfunc generalization, leading to excess duplicate work (see 3). Try to move most work into helper functions, outside of the core logic kernel (either inside or outside of it).
What is the status of this? StaticArrays.jl seems to be an anomaly in terms of having a long load time; here's my package's sorted @time_imports below. It is making it hard for me to cut my own package's load time down.
400.6 ms StaticArrays
320.1 ms Zygote 1.80% compilation time
184.2 ms VectorizationBase
136.4 ms DynamicExpressions
112.4 ms LoopVectorization
102.8 ms CategoricalArrays
77.2 ms ForwardDiff
75.7 ms FillArrays
51.7 ms OffsetArrays
39.5 ms JSON3
32.2 ms Tables
28.0 ms StructArrays
26.6 ms ChainRulesCore
26.4 ms Static
21.2 ms Optim
20.9 ms StatsBase
12.7 ms ChainRules
12.3 ms Missings
12.1 ms IRTools
10.1 ms NLSolversBase
9.3 ms MacroTools
8.8 ms HostCPUFeatures
8.6 ms SpecialFunctions
8.4 ms LineSearches
7.7 ms ThreadingUtilities
7.3 ms LossFunctions
6.9 ms StructTypes
6.9 ms PolyesterWeave
6.5 ms IrrationalConstants
5.5 ms StaticArrayInterface
5.4 ms FiniteDiff
5.2 ms AbstractFFTs
5.1 ms CloseOpenIntervals
4.6 ms StaticArraysCore
4.4 ms LayoutPointers
4.2 ms LoopVectorization → ForwardDiffExt
4.1 ms DiffResults
3.8 ms SLEEFPirates
3.7 ms ArrayInterfaceCore
3.3 ms ArrayInterface
3.1 ms PositiveFactorizations
3.0 ms StaticArrayInterface → StaticArrayInterfaceStaticArraysExt
2.9 ms GPUArraysCore
2.5 ms OpenSpecFun_jll
2.4 ms ManualMemory
2.2 ms DocStringExtensions
2.1 ms CategoricalArrays → CategoricalArraysStructTypesExt
2.1 ms CPUSummary
2.0 ms ProgressBars
1.9 ms SpecialFunctions → SpecialFunctionsChainRulesCoreExt
1.9 ms SortingAlgorithms
1.9 ms Parameters
1.9 ms DataAPI
1.8 ms ZygoteRules
1.8 ms LogExpFunctions → LogExpFunctionsChainRulesCoreExt
1.7 ms StatsAPI
1.7 ms StaticArrayInterface → StaticArrayInterfaceOffsetArraysExt
1.7 ms RealDot
1.7 ms ForwardDiff → ForwardDiffStaticArraysExt
1.7 ms CategoricalArrays → CategoricalArraysJSONExt
1.6 ms Zygote → ZygoteColorsExt
1.6 ms DiffRules
1.6 ms AbstractFFTs → AbstractFFTsChainRulesCoreExt
1.5 ms ArrayInterface → ArrayInterfaceGPUArraysCoreExt
1.5 ms Adapt → AdaptStaticArraysExt
1.4 ms CommonSubexpressions
1.4 ms ArrayInterface → ArrayInterfaceStaticArraysCoreExt
1.3 ms TableTraits
1.3 ms LoopVectorization → SpecialFunctionsExt
1.3 ms LogExpFunctions
1.3 ms IteratorInterfaceExtensions
1.3 ms DataValueInterfaces
1.2 ms UnPack
1.2 ms SIMDTypes
1.1 ms SuiteSparse
1.1 ms IfElse
1.0 ms BitTwiddlingConvenienceFunctions
1.0 ms Adapt
It's not just that StaticArrays has a high load time by itself, it may make load times of other packages worse. On v1.9, in fresh sessions:
julia> @time using LowRankApprox
0.532947 seconds (705.99 k allocations: 46.672 MiB, 2.66% gc time, 0.81% compilation time)
julia> @time using StaticArrays
0.498043 seconds (860.22 k allocations: 58.306 MiB, 3.12% gc time, 1.17% compilation time)
julia> @time using LowRankApprox
1.249681 seconds (6.20 M allocations: 277.225 MiB, 3.99% gc time, 0.35% compilation time)
This is improved significantly on v1.10.0-alpha1:
julia> @time using StaticArrays
0.293981 seconds (252.14 k allocations: 11.858 MiB, 2.05% compilation time)
julia> @time using LowRankApprox
0.751403 seconds (3.99 M allocations: 168.389 MiB, 4.78% gc time, 0.62% compilation time)
however, the load time of LowRankApprox is much lower as well:
julia> @time using LowRankApprox
0.293936 seconds (195.40 k allocations: 10.274 MiB, 1.61% compilation time)
I am not sure if this the appropriate place, but we've noticed for Oscar.jl that StaticArrays (which is only an indirect dependency) introduced over 5000 invalidations -- this comes from its Base.cconvert(P::Type{Ptr{T}}, ...) methods, which means that most ccall in our code or in any of the other dependencies is affected.
I am not sure this is the "fault" of StaticArrays though, at least on a quick glance it seems to be providing cconvert and unsafe_convert methods as intended, so this may be more a general issue with Julia?
Yes, probably worth reporting that to Julia
StaticArrays is still quite slow to load, and is the bottleneck for me from some downstream dependencies. Are there any plans to address this @thchr or @mateuszbaran? It has unfortunately made things a bit difficult to improve load time of user-facing libraries where StaticArrays.jl is a dependency-of-a-dependency.
Maybe StaticArrays could be split up into smaller component libraries handling different tasks, and StaticArrays could just re-export them? It seems like often libraries will just use a small subset of all of the features available. (I don't mean an interface like StaticArraysCore, but rather the methods themselves could be moved into component libraries). These could even be stored within the StaticArrays.jl repo in subfolders.
For example, sometimes I've wanted to just use MArray, SArray, or SizedArray, but not all three. Maybe they could go into component libraries?
Other things to explore optimizing include the many @generated functions throughout the codebase: https://github.com/search?q=repo:JuliaArrays/StaticArrays.jl%20/@generated/&type=code.
I can review and merge PRs that improve load times but I don't have time or motivation to work on it myself. One major constraint is that I won't merge anything breaking so it would have to wait for another maintainer. IIRC there is not much that can be done without either breaking changes or runtime performance degradation -- all low hanging fruits have been picked.
Making smaller component libraries out of StaticArrays.jl could be non-breaking but note that this makes maintenance harder, so someone would have to show that it's worth the trouble.