DescriptorSystems.jl icon indicating copy to clipboard operation
DescriptorSystems.jl copied to clipboard

Extreme compile times for some functions

Open baggepinnen opened this issue 3 years ago • 7 comments

I've experienced quite extreme compile times for some functions, here's an example

julia> using DescriptorSystems

julia> G = rss(2,2,4);

julia> @time gnugap(G, G)
 59.315304 seconds (185.38 M allocations: 10.129 GiB, 6.14% gc time, 99.97% compilation time)
(6.598356591423948e-16, 2.55648096966907)

This problem is likely to be related to #8, since type piracy is likely to lead to method invalidations and inference problems that in turn cause recompilation of a lot of base methods. To investigate this hypothesis, we can make use of SnoopCompile.jl

using Revise
using SnoopCompileCore
using DescriptorSystems;
G = rss(2,2,4);

tinf = @snoopi_deep gnugap(G, G)

using ProfileView, SnoopCompile
ProfileView.view(flamegraph(tinf))

This displays the following graph Screenshot from 2022-01-13 14-02-07 The colored flames indicates stacks, and their extent in the horizontal direction indicate their run time. Notice how there is a large horizontal gap in the flame graph, followed immediately by the stack for Base.typed_hvcat (where I've held the mouse to display the method name in the screenshot. This is a clear indication that the type piracy of #8 causes huge compile times, likely due to type inference problems.

Running

@code_warntype [G G]

indicates that there are type inference problems present when concatenating systems:

julia> @code_warntype [G G]
MethodInstance for hcat(::DescriptorStateSpace{Float64}, ::DescriptorStateSpace{Float64})
  from hcat(SYS1::DescriptorStateSpace, SYS2::DescriptorStateSpace) in DescriptorSystems at /home/fredrikb/.julia/packages/DescriptorSystems/20HLg/src/connections.jl:71
Arguments
  #self#::Core.Const(hcat)
  SYS1::DescriptorStateSpace{Float64}
  SYS2::DescriptorStateSpace{Float64}
Locals
  D::Matrix{Float64}
  C::Matrix{Float64}
  B::Matrix{Float64}
  blockdims::Vector{Int64}
  E::Any
  A::Matrix{Float64}
  Ts::Float64
  T::Type{Float64}
  ny::Any
  @_13::Union{UniformScaling, Matrix{Float64}}
  @_14::Union{UniformScaling, Matrix{Float64}}

notice variables E::Any and @_13::Union{UniformScaling, Matrix{Float64}}.

The other gaps in the flame graph points to the folowing methods

function Base.hvcat(rows::Tuple{Vararg{Int}}, A::Union{DescriptorStateSpace, AbstractVecOrMat{<:Number}, Number, UniformScaling}...)

as well as https://github.com/andreasvarga/DescriptorSystems.jl/blob/main/src/dss.jl#L52

baggepinnen avatar Jan 13 '22 13:01 baggepinnen

Digging a bit deeper, it turns out that we're also suffering from https://github.com/invenia/Intervals.jl/issues/144

julia> using SnoopCompile

julia> invalidations = @snoopr begin
           using DescriptorSystems;
           G = rss(2,2,4);
           gnugap(G, G)
       end;

julia> trees = SnoopCompile.invalidation_trees(invalidations);

julia> @show length(SnoopCompile.uinvalidated(invalidations)) # show total invalidations
length(SnoopCompile.uinvalidated(invalidations)) = 1900
1900

julia> show(trees[end]) # show the most invalidated method
inserting convert(::Type{T}, interval::Intervals.AnchoredInterval{P, T}) where {P, T} in Intervals at /home/fredrikb/.julia/packages/Intervals/ua9cq/src/anchoredinterval.jl:181 invalidated:

there are almost 2000 invalidations from running the small code snippet above which is huge, and about half of those comes from https://github.com/invenia/Intervals.jl/issues/144

baggepinnen avatar Jan 13 '22 13:01 baggepinnen

Yes, I am aware of this problem. Probably you are right, an it is related to #8. You probably recall that when I tried to alleviate the issue, I faced some bugs in the implementation of hcat\vcat\hvcat, which still manifest in the LTS version 1.6.5:

julia> using LinearAlgebra

julia> [1 2; I]
ERROR: ArgumentError: number of columns of each array must match (got (2, 1))
Stacktrace:
 [1] _typed_vcat(#unused#::Type{Any}, A::Tuple{Matrix{Any}, Matrix{Any}})
   @ Base .\abstractarray.jl:1553
 [2] typed_vcat(::Type{Any}, ::Matrix{Any}, ::Matrix{Any})
   @ Base .\abstractarray.jl:1567
 [3] typed_hvcat(::Type{Any}, ::Tuple{Int64, Int64}, ::Int64, ::Vararg{Any, N} where N)
   @ Base .\abstractarray.jl:1957
 [4] hvcat(::Tuple{Int64, Int64}, ::Int64, ::Vararg{Any, N} where N)
   @ Base .\abstractarray.jl:1932
 [5] top-level scope
   @ REPL[2]:1

julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake) 

So, I stopped investing more time in this direction, until this bug will be corrected in the LTS version, Fortunately, my implementation can overcome this Julia bug:

using DescriptorSystems

julia> [1 2; I]
3×2 Matrix{Int64}:
 1  2
 1  0
 0  1

I hope I will be able to dedicate more time to this issue after finishing the implementation of the FaultDetectionTools.jl (probably in the next 2-3 months).

Regarding the invalidation aspect, I don't understand what is going on. I never used the mentioned conversion funtion directly. It is only a guess, but this function is probably imported via the Polynomials package, which has Interval.jl among its dependencies. But I don't have explanation for these figures.

andreasvarga avatar Jan 13 '22 16:01 andreasvarga

Just to clarify, the invalidation problem is not due to this package. I mentioned it on slack yesterday and got the answer that they might be able to solve it in Intervals.jl where the problem lies.

baggepinnen avatar Jan 14 '22 08:01 baggepinnen

I tried this again in julia v1.8-beta1 with DescriptorSystems v1.3 and the timing has improved by 20 seconds, a great step!

julia> @time gnugap(G, G)
 38.420655 seconds (115.38 M allocations: 5.127 GiB, 4.95% gc time, 99.99% compilation time)

but it still takes quite a long time. Apparently, gnugap now manages to do the bulk of the work fairly quickly, but then there is a huge inference gap (the white space in the image below) followed by a call to dss.

image

I can't really figure out what the remaining problem is, it appears as if execution reaches the call to ghinfnorm without any problems. Running

using Revise
using SnoopCompileCore
using DescriptorSystems;
G = rss(2,2,4);

tinf = @snoopi_deep ghinfnorm(G);

using ProfileView, SnoopCompile
ProfileView.view(flamegraph(tinf))

leads to a similar looking plot with what appears to be a call to == in the file uniformscaling.jl taking a lot of time to infer Screenshot from 2022-03-09 14-01-05

baggepinnen avatar Mar 09 '22 13:03 baggepinnen

The call to == is probably

    disc || E == I || rcond(E) >= n*eps(float(real(T1))) || (return Inf, Inf)

baggepinnen avatar Mar 09 '22 13:03 baggepinnen

It looks like the compilataion time has regressed again on julia v1.8.0-rc1

julia> using DescriptorSystems

julia> G = rss(2,2,4);

julia> @time gnugap(G, G)
103.958712 seconds (161.42 M allocations: 7.936 GiB, 4.80% gc time, 99.99% compilation time)
(6.867873289263948e-16, 1.2311753343767355)

up from 38.42 seconds a few posts up.

The same version of DescriptorSystems (v1.3.2) on julia v1.7 is a bit better, but still very long compile time

julia> @time gnugap(G, G)
 72.154130 seconds (135.50 M allocations: 6.680 GiB, 3.71% gc time, 99.95% compilation time)
(6.21299328221319e-16, Inf)

EDIT: Somthing was off with my initial benchmark, it's actually a slight bit faster on 1.8

63.557759 seconds (161.45 M allocations: 7.938 GiB, 4.64% gc time, 99.99% compilation time)

still, much worse than it was in https://github.com/andreasvarga/DescriptorSystems.jl/issues/9#issuecomment-1062898754

baggepinnen avatar Jun 22 '22 06:06 baggepinnen

I fixed the issues related to possible type instability in

@code_warntype [G G]

In Julia 1.8, there are no red lines anymore in the output. Also [1 2; I] works as expected using only LinearAlgebra calls.

Inspecting with the cursor the result of ProfileView.view(flamegraph(tinf)) for the gnugap function, there are some calls to SparseMatrix.setindex! , which in my opinion should not appear, I have no idea how to get rid of this.

The compilation times are hopefully better, although probably still too long. I should have a look to the precompilation possibilities or producing sysimages.

andreasvarga avatar Oct 21 '22 14:10 andreasvarga