StaticArrays.jl
Info request: performance of == vs map(==)
Is the following performance expected?
# Julia Version 1.6.1 (2021-04-23), official https://julialang.org/ release
julia> using StaticArrays
julia> @time SVector(1:100...) == SVector(1:100...)
0.347808 seconds (749.39 k allocations: 48.644 MiB, 15.53% gc time, 99.78% compilation time)
true
julia> @time map(==, [SVector(1:100...)], [SVector(1:100...)]);
12.972637 seconds (409.54 k allocations: 24.336 MiB, 100.00% compilation time)
Assuming this is expected, would anyone be willing to shed light on what map is doing to cause this?
Note the 100.00% compilation time.
Yep, good eye! I don't understand what additional function needs to be compiled, though. Since map(==, [a], [b]) is (at least conceptually 😜) equal to [a == b], I would not expect this discrepancy.
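The conceptual equivalence can be checked directly. The sketch below uses a length-8 SVector (an arbitrary choice, made only to keep compilation quick; the thread uses length 100) and confirms that both spellings return the same result. The difference discussed in this thread is not the result but the compiled code: map over ordinary Vectors goes through Base's generic Generator/zip/collect machinery, as the inference listing further down shows, instead of simply calling ==(::SVector, ::SVector) once.

```julia
using StaticArrays

# Length 8 instead of 100 keeps compilation cheap for this check.
a = SVector(ntuple(identity, 8))
b = SVector(ntuple(identity, 8))

via_map = map(==, [a], [b])   # generic Base map over two 1-element Vectors
direct  = [a == b]            # a single call to ==(::SVector, ::SVector)

@assert via_map == direct     # same Vector{Bool} result either way
```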
You can try SnoopCompile to see where the difference comes from:
using SnoopCompile
SnoopCompile.@snoopc "/tmp/compiles_a.log" begin
using StaticArrays
map(==, [SVector(1:100...)], [SVector(1:100...)]);
end
SnoopCompile.@snoopc "/tmp/compiles_b.log" begin
using StaticArrays
SVector(1:100...) == SVector(1:100...)
end
Use @snoopi_deep rather than @snoopc.
I tried @snoopi_deep on the map variant:
julia> using StaticArrays, SnoopCompile
julia> v = [SVector(1:100...)]
1-element Vector{SVector{100, Int64}}:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
julia> tinf = @snoopi_deep map(==, v, v);
julia> flatten(tinf)
344-element Vector{SnoopCompileCore.InferenceTiming}:
InferenceTiming: 0.000020/0.000020 on convert(::Type{Int64}, 0::Int64)
InferenceTiming: 0.000025/0.000025 on Base._counttuple(::Type{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}})
InferenceTiming: 0.000026/0.000026 on Base.argtail(::Vector{SVector{100, Int64}}, ::Vector{SVector{100, Int64}})
InferenceTiming: 0.000027/0.000027 on Base.isdone(::Vector{SVector{100, Int64}})
InferenceTiming: 0.000027/0.000027 on ndims(::Vector{Base.HasShape{1}})
InferenceTiming: 0.000028/0.000028 on convert(::Type{Int64}, 1::Int64)
InferenceTiming: 0.000028/0.000028 on Base.argtail(((SOneTo(100), 1),)::Tuple{Tuple{SOneTo{100}, Int64}}, (((SOneTo(100), 1),),)::Tuple{Tuple{SOneTo{100}, Int64}})
InferenceTiming: 0.000029/0.000029 on Base.isdone(::SVector{100, Int64}, ::Tuple{SOneTo{100}, Int64})
InferenceTiming: 0.000029/0.000029 on getproperty(Core.Compiler::Module, return_type::Symbol)
InferenceTiming: 0.000029/0.000029 on Base.Iterators.and_iteratorsize(Base.HasShape{1}()::Base.HasShape{1}, Base.HasShape{1}()::Base.HasShape{1})
InferenceTiming: 0.000030/0.000030 on convert(::Type{Base.HasShape{1}}, Base.HasShape{1}()::Base.HasShape{1})
InferenceTiming: 0.000030/0.000030 on +(2::Int64, 1::Int64)
InferenceTiming: 0.000030/0.000030 on (::Base.Iterators.var"#5#6")(::Vector{SVector{100, Int64}})
InferenceTiming: 0.000030/0.000030 on Base.argtail(2::Int64, (2,)::Int64)
InferenceTiming: 0.000031/0.000031 on (::Base.Iterators.var"#5#6")(::SVector{100, Int64})
InferenceTiming: 0.000031/0.000031 on convert(::Type{Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}}, #180::Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}})
InferenceTiming: 0.000031/0.000031 on getproperty(::UnitRange{Int64}, start::Symbol)
InferenceTiming: 0.000031/0.000031 on Base.isdone(::Vector{SVector{100, Int64}}, ::Int64)
InferenceTiming: 0.000033/0.000033 on getproperty(::Base.Generator{UnitRange{Int64}, Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}}, f::Symbol)
InferenceTiming: 0.000033/0.000033 on getproperty(::Base.Generator{UnitRange{Int64}, Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}}, f::Symbol)
InferenceTiming: 0.000033/0.000033 on Base.argtail(::Vector{SVector{100, Int64}})
InferenceTiming: 0.000033/0.000033 on Base.argtail(::Tuple{Tuple{SOneTo{100}, Int64}})
InferenceTiming: 0.000033/0.000033 on getproperty(::Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, is::Symbol)
⋮
InferenceTiming: 0.000611/0.000972 on Base.Iterators._zip_iterate_some(::Tuple{SVector{100, Int64}, SVector{100, Int64}}, ::Tuple{Tuple{Tuple{SOneTo{100}, Int64}}, Tuple{Tuple{SOneTo{100}, Int64}}}, (missing, missing)::Tuple{Missing, Missing}, missing::Missing)
InferenceTiming: 0.000616/0.002379 on Base.collect_to_with_first!(::Vector{Base.HasShape{1}}, Base.HasShape{1}()::Base.HasShape{1}, ::Base.Generator{UnitRange{Int64}, Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}}, ::Int64)
InferenceTiming: 0.000655/0.001437 on Base.Generator(::Base.var"#4#5", ::Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}})
InferenceTiming: 0.000672/0.002557 on Base.Iterators._zip_iterate_all(::Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}, ::Tuple{Tuple{Int64}, Tuple{Int64}})
InferenceTiming: 0.000683/0.003044 on Base.Iterators._zip_iterate_all(::Tuple{SVector{100, Int64}, SVector{100, Int64}}, ::Tuple{Tuple{Tuple{SOneTo{100}, Int64}}, Tuple{Tuple{SOneTo{100}, Int64}}})
InferenceTiming: 0.000710/0.000781 on (::Type{Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, _A}} where _A)(::Base.var"#4#5", ::Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}})
InferenceTiming: 0.000754/0.005946 on Base.Iterators._zip_iterate_all(::Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}, ((), ())::Tuple{Tuple{}, Tuple{}})
InferenceTiming: 0.000876/0.012894 on Base._ntuple(#7::Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, ::Int64)
InferenceTiming: 0.000877/0.007580 on iterate(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}})
InferenceTiming: 0.000901/0.003896 on iterate(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}, ::Tuple{Int64, Int64})
InferenceTiming: 0.000968/0.000968 on iterate(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, Base.var"#4#5"{typeof(==)}}, ::Tuple{Int64, Int64})
InferenceTiming: 0.000984/0.004880 on Base.collect_to!(::Vector, ::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}, ::Int64, ::Tuple{Int64, Int64})
InferenceTiming: 0.001033/0.019158 on iterate(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, Base.var"#4#5"{typeof(==)}})
InferenceTiming: 0.001042/0.003677 on collect(::Base.Generator{UnitRange{Int64}, Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}})
InferenceTiming: 0.001164/0.001164 on Base.collect_to!(::Vector{Bool}, ::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, Base.var"#4#5"{typeof(==)}}, ::Int64, ::Tuple{Int64, Int64})
InferenceTiming: 0.001167/0.003237 on Base.Generator(::Function, ::Vector{SVector{100, Int64}}, ::Vector{SVector{100, Int64}})
InferenceTiming: 0.001316/0.017500 on ==(::SVector{100, Int64}, ::SVector{100, Int64})
InferenceTiming: 0.001600/0.007299 on collect(::Base.Generator{UnitRange{Int64}, Base.var"#180#181"{Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}}})
InferenceTiming: 0.001611/0.033633 on collect(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}})
InferenceTiming: 0.001777/0.021903 on Base._iterator_upper_bound(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, Base.var"#4#5"{typeof(==)}})
InferenceTiming: 0.001970/0.015039 on ntuple(#7::Base.Iterators.var"#7#8"{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, ::Int64)
InferenceTiming: 0.002186/0.027022 on collect(::Base.Generator{Base.Iterators.Zip{Tuple{Vector{SVector{100, Int64}}, Vector{SVector{100, Int64}}}}, Base.var"#4#5"{typeof(==)}})
InferenceTiming: 13.431963/13.496669 on Core.Compiler.Timings.ROOT()
Most of the time is not spent on inference (ROOT accounts for 99.5% of it), so is it mostly code generation?
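The 99.5% figure follows from the ROOT line of the listing. Reading each pair as exclusive/inclusive time in seconds (the ordering 13.431963/13.496669 only makes sense that way, since exclusive time cannot exceed inclusive time), ROOT's exclusive time is the part of the measurement spent outside type inference. A small worked calculation using only the numbers copied from the listing:

```julia
# Exclusive/inclusive times (seconds) of the ROOT node, copied from the listing above.
root_exclusive = 13.431963   # time attributed to ROOT itself, i.e. outside type inference
root_inclusive = 13.496669   # total time covered by the @snoopi_deep measurement

inference_time = root_inclusive - root_exclusive   # time spent in type inference
outside_share  = root_exclusive / root_inclusive   # fraction spent outside inference

println("inference: ", round(inference_time; digits = 4), " s")
println("outside inference: ", round(100 * outside_share; digits = 2), " %")
```

So only about 0.065 s of the roughly 13.5 s is inference; the rest is spent elsewhere, which is consistent with the later finding that LLVM dominates.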
I'm not sure what could be done to fix this on the StaticArrays side; it is probably more of a Base issue.
Also, this only becomes really bad when the SVector is very large.
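One way to see how the cost depends on size is to time the first call of map(==, v, v) for a few SVector lengths. Each length is a distinct type, so each first call compiles fresh code. The lengths below are arbitrary assumptions, and the absolute numbers will depend on the Julia and LLVM versions and the machine, so no expected timings are given.

```julia
using StaticArrays

# First-call (compilation-dominated) time of map(==, v, v) per SVector length.
times = Dict{Int,Float64}()
for N in (8, 16, 32, 64)
    v = [SVector(ntuple(identity, N))]   # Vector{SVector{N, Int64}}
    times[N] = @elapsed map(==, v, v)    # new type, so this call compiles new code
end

for N in sort!(collect(keys(times)))
    println("N = ", N, ": ", round(times[N]; digits = 3), " s")
end
```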
I agree that this may be an unrealistically large SVector. I encountered it in a codebase that I have since refactored, but it left me curious because it broke my understanding of how the compiled specialization of ==(::SVector, ::SVector) would be reused.
Would anyone recommend that I move this issue to Base?
The time is almost entirely spent in LLVM. On the Julia side, the most expensive method can be precompiled with:
precompile(Tuple{typeof(Base.collect), Base.Generator{Base.Iterators.Zip{Tuple{Array{StaticArrays.SArray{Tuple{100}, Int64, 1, 100}, 1}, Array{StaticArrays.SArray{Tuple{100}, Int64, 1, 100}, 1}}}, Base.var"#4#5"{typeof(Base.:(==))}}})
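The signature above hard-codes the compiler-generated closure name Base.var"#4#5", which can differ between Julia versions. A sketch that derives the same argument type from a value instead, assuming (as the signature above shows for this Julia version) that map(==, v, v) lowers to collect(Base.Generator(==, v, v)):

```julia
using StaticArrays

v = [SVector(ntuple(identity, 100))]   # Vector{SVector{100, Int64}}, as in the thread

# The Generator that map(==, v, v) collects; taking its type avoids spelling out
# the closure name (Base.var"#4#5" above) by hand.
G = typeof(Base.Generator(==, v, v))

# Compile collect(::G) without running it, and time the compilation.
ok = @time precompile(collect, (G,))
```

precompile returns true when compilation succeeded, so the @time output here isolates the compile cost of that single method.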
and profiling that call gives something like this:
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
1╎1 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZL17GroupByComplexityRN4llvm15SmallVectorImplIPKNS_4SCEVEEEPNS_8LoopInfoERNS_13DominatorTreeE
2╎2 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZL21CompareSCEVComplexityRN4llvm18EquivalenceClassesIPKNS_4SCEVEEERNS0_IPKNS_5ValueEEEPKNS_8LoopInfoES3_S3_RNS_13DominatorTreeEj
1╎1 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm16MetadataTracking7untrackEPvRNS_8MetadataE
1╎1 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZNK4llvm11Instruction15getMetadataImplEj
1╎1 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZNK4llvm19FoldingSetNodeIDRef11ComputeHashEv
1╎1 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZNSt8_Rb_treeIN4llvm18EquivalenceClassesIPKNS0_4SCEVEE7ECValueES6_St9_IdentityIS6_ESt4lessIS6_ESaIS6_EE8_M_eraseEPSt13_Rb_tree_nodeIS6_E
2╎2 /lib/x86_64-linux-gnu/libc.so.6:?; __libc_malloc
╎10087 [unknown stackframe]
╎ 10083 [unknown stackframe]
╎ 10083 [unknown stackframe]
8╎ 10083 [unknown stackframe]
╎ 9948 julia:?;
╎ 9948 /lib/x86_64-linux-gnu/libc.so.6:?; __libc_start_main
╎ ╎ 9948 julia:?; main
╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/jlapi.c:701; jl_repl_entrypoint
╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/jlapi.c:559; true_main
╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/julia.h:1788; jl_apply
╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2429; jl_apply_generic
╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2247; _jl_invoke
╎ ╎ ╎ 9948 /home/mateusz/bin/julia-1.7.0/lib/julia/sys.so:?; jfptr__start_43127.clone_1
╎ ╎ ╎ 9948 @Base/client.jl:495; _start()
╎ ╎ ╎ 9948 @Base/client.jl:309; exec_options(opts::Base.JLOptions)
╎ ╎ ╎ 9948 @Base/client.jl:379; run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
╎ ╎ ╎ ╎ 9948 @Base/essentials.jl:714; invokelatest
╎ ╎ ╎ ╎ 9948 @Base/essentials.jl:716; #invokelatest#2
╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/builtins.c:757; jl_f__call_latest
╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/julia.h:1788; jl_apply
╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2429; jl_apply_generic
╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2247; _jl_invoke
╎ ╎ ╎ ╎ ╎ 9948 /home/mateusz/bin/julia-1.7.0/lib/julia/sys.so:?; jfptr_YY.930_32578.clone_1
╎ ╎ ╎ ╎ ╎ 9948 @Base/client.jl:394; (::Base.var"#930#932"{Bool, Bool, Bool})(REPL::Module)
╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2429; jl_apply_generic
╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/gf.c:2247; _jl_invoke
╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349; run_repl(repl::REPL.AbstractREPL, consumer::Any)
╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229; start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244; repl_backend_loop(backend::REPL.REPLBackend)
╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150; eval_user_input(ast::Any, backend::REPL.REPLBackend)
╎ ╎ ╎ ╎ ╎ ╎ ╎ 9948 @Base/boot.jl:373; eval
╎ ╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/toplevel.c:944; jl_toplevel_eval_in
╎ ╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/toplevel.c:830; jl_toplevel_eval_flex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/toplevel.c:885; jl_toplevel_eval_flex
╎ ╎ ╎ ╎ ╎ ╎ ╎ 9948 /buildworker/worker/package_linux64/build/src/interpreter.c:731; jl_interpret_toplevel_thunk
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/interpreter.c:516; eval_body
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/interpreter.c:461; eval_body
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/interpreter.c:215; eval_value
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/interpreter.c:126; do_call
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/julia.h:1788; jl_apply
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/gf.c:2429; jl_apply_generic
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/gf.c:2247; _jl_invoke
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /home/mateusz/bin/julia-1.7.0/lib/julia/sys.so:?; jfptr_precompile_19408.clone_1
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 @Base/loading.jl:1936; precompile(argt::Type)
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/gf.c:2173; jl_compile_hint
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/gf.c:1921; jl_compile_method_internal
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9947 /buildworker/worker/package_linux64/build/src/gf.c:1980; jl_compile_method_internal
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9881 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:350; jl_generate_fptr
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:154; _jl_compile_codeinst
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:1125; jl_add_to_ee
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:1103; jl_add_to_ee
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:1059; jl_add_to_ee
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:779; addModule
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS_9StringRefENS0_11SymbolStateE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession6lookupENS_8ArrayRefIPNS0_8JITDylibEEENS0_15SymbolStringPtrENS0_11SymbolStateE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolStringPtrENS0_11SymbolStateE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EERKNS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapIn...
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEval...
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_1...
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZNSt17_Function_handlerIFvSt10unique_ptrIN4llvm3orc19MaterializationUnitESt14default_deleteIS3_EES0_INS2_29MaterializationResponsibilityES4_IS7_EEEPSA_E9_M_invokeERKSt9_Any_dataOS6_OS9_
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc16ExecutionSession26materializeOnCurrentThreadESt10unique_ptrINS0_19MaterializationUnitESt14default_deleteIS3_EES2_INS0_29MaterializationResponsibilityES4_IS7_EE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9879 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm3orc14IRCompileLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9878 /buildworker/worker/package_linux64/build/src/jitlayers.cpp:612; operator()
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9878 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9544 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN12_GLOBAL__N_113CGPassManager11runOnModuleERN4llvm6ModuleE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9544 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9517 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN12_GLOBAL__N_113JumpThreading13runOnFunctionERN4llvm8FunctionE.part.687
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9517 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass7runImplERNS_8FunctionEPNS_17TargetLibraryInfoEPNS_13LazyValueInfoEPNS_9AAResultsEPNS_14DomTreeUpdaterEbSt10unique_ptrINS_18BlockFrequencyInfoESt14default_deleteISC_EESB_INS_21BranchProbabilityInfoESD_ISG_EE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9461 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass12processBlockEPNS_10BasicBlockE
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9358 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass22processThreadableEdgesEPNS_5ValueEPNS_10BasicBlockENS_13jumpthreading18ConstantPreferenceEPNS_11InstructionE.part.683
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9354 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass13tryThreadEdgeEPNS_10BasicBlockERKNS_15SmallVectorImplIS2_EES2_
╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9352 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass10threadEdgeEPNS_10BasicBlockERKNS_15SmallVectorImplIS2_EES2_
4╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9086 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm17JumpThreadingPass9updateSSAEPNS_10BasicBlockES2_RNS_8DenseMapIPNS_11InstructionEPNS_5ValueENS_12DenseMapInfoIS5_EENS_6detail12DenseMapPairIS5_S7_EEEE
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 9053 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm10SSAUpdater10RewriteUseERNS_3UseE
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1076 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm10SSAUpdater23GetValueInMiddleOfBlockEPNS_10BasicBlockE
39╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1075 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm10SSAUpdater28GetValueAtEndOfBlockInternalEPNS_10BasicBlockE
276╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 7976 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm10SSAUpdater28GetValueAtEndOfBlockInternalEPNS_10BasicBlockE
970╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1349 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm14SSAUpdaterImplINS_10SSAUpdaterEE14BuildBlockListEPNS_10BasicBlockEPNS_15SmallVectorImplIPNS2_6BBInfoEEE
6117╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 6340 /home/mateusz/bin/julia-1.7.0/bin/../lib/julia/libLLVM-12jl.so:?; _ZN4llvm14SSAUpdaterImplINS_10SSAUpdaterEE17FindAvailableValsEPNS_15SmallVectorImplIPNS2_6BBInfoEEE
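A profile like the one above can be collected with the Profile standard library. This is a sketch rather than the exact commands used in the thread; C = true is an assumption based on the C and LLVM frames visible in the output, and the mincount threshold of 100 samples is an arbitrary choice to trim small branches.

```julia
using Profile, StaticArrays

v = [SVector(ntuple(identity, 100))]
G = typeof(Base.Generator(==, v, v))   # the Generator collected by map(==, v, v)

# Run this in a fresh session: once collect(::G) has been compiled,
# precompile returns immediately and there is nothing left to sample.
Profile.clear()
@profile precompile(collect, (G,))

# C = true includes C/C++ frames (libjulia, libLLVM), where the time goes;
# mincount hides frames with fewer than 100 samples.
Profile.print(C = true, mincount = 100)
```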
I have no idea why that particular pass is so expensive for LLVM (most samples land in JumpThreading's SSA update); this would need input from an LLVM expert. I don't know whether there is any interest in fixing this in Base, but maybe a reasonable workaround could be implemented in StaticArrays.jl.
Thank you for these insights. They help my understanding, at least confirming that my intuition was reasonable and that an explanation is definitely beyond me! I'm not sure whether it would be best to close this issue or move it. A few questions for anyone reading: who are examples of people in the LLVM community that may be interested in / capable of taking a closer look? And if StaticArrays.jl had some work-around, would it basically look like unrolling the map operation? I tried to find the code for map but had trouble, as I'm not familiar enough with the Julia codebase (and maybe it's implemented at some lower level).
who are examples of people in the LLVM community that may be interested in / capable of taking a closer look?
I guess you could ask in the internals channel of the Julia Slack; many people there know some LLVM.
And if StaticArrays.jl had some work-around, would it basically look like unrolling the map operation?
map in StaticArrays.jl is already unrolled, but map(==, [SVector(1:100...)], [SVector(1:100...)]) doesn't use the map from StaticArrays.jl; it uses the generic implementation from Base, since you are mapping over ordinary Vectors. We could likely catch this particular case in StaticArrays.jl and emit code that is easier for LLVM to optimize.
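Until something like that exists, a user-side experiment is to stop the unrolled comparison from being inlined into Base's generic collect loop. The helper sv_eq below is hypothetical, and the idea that a smaller loop body avoids the expensive JumpThreading work is an untested assumption; compare the printed first-call time against map(==, v, v) in a fresh session before relying on it.

```julia
using StaticArrays

# Hypothetical helper: a non-inlined wrapper around ==. The assumption is that
# keeping the unrolled 100-element comparison in its own compiled function
# leaves LLVM a much smaller loop body to optimize inside Base's collect.
@noinline sv_eq(a, b) = a == b

v = [SVector(ntuple(identity, 100))]
result = @time map(sv_eq, v, v)   # compare with the first-call time of map(==, v, v)
```

The result is the same Vector{Bool} either way; only the way the work is split into compiled functions changes.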
There are a few tools for inspecting what methods are called. Personally I like to use Cthulhu for quick exploration, for example:
julia> using StaticArrays, Cthulhu
julia> map(==, [SVector(1:100...)], [SVector(1:100...)]);
julia> @descend map(==, [SVector(1:100...)], [SVector(1:100...)])
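For a quick, non-interactive answer to "which map is this?", @which from InteractiveUtils (loaded by default in the REPL) reports the method a call dispatches to. A small sketch with a length-3 SVector, chosen only to keep it fast:

```julia
using InteractiveUtils, StaticArrays

v = [SVector(1, 2, 3)]

m_outer = @which map(==, v, v)         # map over ordinary Vectors
m_inner = @which map(==, v[1], v[1])   # map over the SVectors themselves

println(m_outer)   # a generic method defined in Base
println(m_inner)   # StaticArrays' own (unrolled) method
```

Replacing @which with @less or @edit on the same call shows the method's source, which is a convenient way to locate the code for map.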