
BAT dependencies, package load time(s) and modularization

Open oschulz opened this issue 3 years ago • 10 comments

This issue is intended to keep track of modularizing (splitting up) BAT to reduce package load time(s) and increase flexibility.

The current high load time of BAT is mainly due to its dependencies. Dependency options and load-time cost, preliminary analysis:

Unavoidable expensive core deps:

  • Distributions (includes StatsBase)
  • StaticArrays (indirect through many packages)

Unavoidable non-negligible deps:

  • ArraysOfArrays
  • ValueShapes
  • RecipesBase
  • Possibly some ArrayInterface packages (ArrayInterfaceCore is now very lightweight)

Hard-to-avoid non-negligible direct/indirect deps:

  • BangBang
  • Transducers (non-negligible cost on top of BangBang)
  • TerminalLoggers
  • PrettyTables (may be avoidable)
  • DataStructures

Cheap deps:

  • StructArrays: Very cheap on top of StaticArrays
  • AbstractDifferentiation: Almost free on top of ChainRulesCore

Autodiff choices:

  • ForwardDiff: Quite cheap on top of StaticArrays and Distributions
  • FiniteDifferences: Basically free on top of StaticArrays and Distributions
  • FiniteDiff: Instead of FiniteDifferences, would pull in ArrayInterface
  • Zygote: Expensive
  • (Future option) Diffractor: Quite cheap on top of StaticArrays

Math deps:

  • LinearMaps: About 100 ms, no deps

Statistics deps:

  • LogarithmicNumbers: Very cheap on top of unavoidable deps
  • MeasureBase: Not that expensive on top of unavoidable deps (uses LogarithmicNumbers)

Optimizer deps:

  • NLSolversBase: Cheap itself, but depends on ArrayInterface, ChainRulesCore, SpecialFunctions and ForwardDiff
  • Optim: Cheap itself, but pulls in ForwardDiff and ArrayInterface via FiniteDiff; BAT would only need Nelder-Mead and L-BFGS for core functionality, though
  • LBFGSB: Cheap, but introduces binary deps
  • Manopt: Would be cheap on top of StaticArrays if it didn't pull in ColorSchemes

Sampler deps:

  • AbstractMCMC: Cheap on top of StaticArrays, Distributions, Transducers, TerminalLoggers
  • AdvancedHMC: Quite cheap on top of AbstractMCMC
  • PSIS: Cheap on top of unavoidable deps

Integrator deps:

  • CUBA: Cheap, but introduces binary deps
  • MonteCarloIntegration: Almost free on top of unavoidable deps
  • HCubature: Almost free on top of unavoidable deps
  • QuadGK: Cheap, but only univariate

Deps to avoid:

  • Folds: Significant load time cost on top of Transducers
  • SciMLBase: Really expensive
  • GalacticOptim: Not cheap even on top of SciMLBase, also load-time interplay with SciMLBase
  • Optimization: Over 1000 ms load time on top of its heavy deps (Optim, GPUArrays, RecursiveArrayTools, SciMLBase)

Expensive deps to get rid of:

  • DoubleFloats
  • Polynomials
  • ... ?

Non-negligible deps to get rid of:

  • KernelDensity
  • PrettyTables (maybe, see above)
  • AdaptiveRejectionSampling (cost due to ForwardDiff though)
  • ... ?

oschulz avatar May 07 '22 04:05 oschulz

GalacticOptim: Really expensive even on top of SciMLBase

Even with v3 splitting out the solvers?

ChrisRackauckas avatar May 07 '22 09:05 ChrisRackauckas

SciMLBase: Really expensive

Interesting, how expensive, and due to what? There's not much in there 😅

ChrisRackauckas avatar May 07 '22 09:05 ChrisRackauckas

SciMLBase: Really expensive

Interesting, how expensive, and due to what? There's not much in there

I was very surprised as well. I had always assumed SciMLBase to be very lightweight; I just never timed it before.

GalacticOptim: Really expensive even on top of SciMLBase

Hm, maybe I had GalacticOptim v2 due to some dep in my tests - though even with v3 it has a not insignificant load time:

Session 1:

pkg> st SciMLBase GalacticOptim
Status `/user/.julia/environments/temp/Project.toml`
  [a75be94c] GalacticOptim v3.1.1
  [0bca4576] SciMLBase v1.31.3

julia> using InverseFunctions # load a super-lightweight package to get some initial Pkg costs out of the way

julia> @time using SciMLBase
  2.356626 seconds (5.58 M allocations: 423.052 MiB, 4.17% gc time, 49.19% compilation time)

Session 2:

julia> using InverseFunctions

julia> @time using SciMLBase, GalacticOptim
  2.990720 seconds (6.56 M allocations: 475.628 MiB, 5.91% gc time, 60.39% compilation time)

Session 3:

julia> using InverseFunctions

julia> @time_imports using SciMLBase, GalacticOptim
     10.9 ms    ┌ MacroTools
     18.5 ms  ┌ ZygoteRules
      0.2 ms    ┌ IteratorInterfaceExtensions
      0.7 ms  ┌ TableTraits
      3.5 ms  ┌ Compat
      0.5 ms  ┌ Requires
    141.0 ms  ┌ FillArrays
      0.2 ms  ┌ DataValueInterfaces
    532.6 ms  ┌ StaticArrays
      3.2 ms    ┌ DocStringExtensions
      0.2 ms    ┌ IfElse
     19.4 ms    ┌ RecipesBase
     37.2 ms      ┌ Static
    718.0 ms    ┌ ArrayInterface
      0.7 ms    ┌ Adapt
     58.6 ms    ┌ ChainRulesCore
    864.2 ms  ┌ RecursiveArrayTools
      1.4 ms    ┌ DataAPI
     14.6 ms  ┌ Tables
      0.2 ms  ┌ CommonSolve
      0.8 ms  ┌ ConstructionBase
      0.2 ms  ┌ TreeViews
   1829.6 ms  SciMLBase
      2.7 ms  ┌ DiffResults
      0.3 ms  ┌ Reexport
      6.3 ms    ┌ AbstractTrees
      2.9 ms    ┌ ProgressLogging
      5.6 ms    ┌ LeftChildRightSiblingTrees
     16.7 ms  ┌ TerminalLoggers
      4.8 ms  ┌ ProgressMeter
      2.1 ms  ┌ LoggingExtras
      0.8 ms  ┌ ConsoleProgressMonitor
    321.1 ms  GalacticOptim

Also, in comparison (though Optim and GalacticOptim are not actually replacements for each other of course) - loading Optim:

julia> using InverseFunctions

julia> @time using Distributions, StaticArrays, ArrayInterface # Pretty much unavoidable for BAT
  1.918963 seconds (4.79 M allocations: 333.435 MiB, 3.12% gc time, 37.83% compilation time)

julia> @time using Optim
  0.181226 seconds (475.28 k allocations: 30.906 MiB, 14.12% gc time, 13.72% compilation time)

Loading GalacticOptim:

julia> using InverseFunctions

julia> @time using Distributions, StaticArrays, ArrayInterface # Unavoidable deps for BAT
  1.921450 seconds (4.79 M allocations: 333.419 MiB, 3.04% gc time, 37.15% compilation time)

julia> @time using GalacticOptim
  0.936281 seconds (2.32 M allocations: 181.437 MiB, 3.88% gc time, 59.31% compilation time)

julia> @time using SciMLBase
  0.595445 seconds (861.23 k allocations: 45.663 MiB, 6.73% gc time, 99.98% compilation time)

Why does loading SciMLBase after GalacticOptim take any time at all? This is really weird:

julia> using InverseFunctions

julia> @time using Distributions, StaticArrays, ArrayInterface # Unavoidable deps for BAT
  1.913109 seconds (4.79 M allocations: 333.435 MiB, 3.06% gc time, 37.17% compilation time)

julia> @time using SciMLBase, GalacticOptim
  1.513281 seconds (3.17 M allocations: 227.054 MiB, 4.97% gc time, 75.43% compilation time)

Why is loading SciMLBase and GalacticOptim more expensive than just loading GalacticOptim, which depends on SciMLBase? Some strange Requires effect?
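Since the numbers above depend on what was loaded earlier in the same session, one way to compare load orders more systematically is to time each combination in its own fresh Julia process. The following is only a minimal sketch under that assumption: `fresh_load_time` is a hypothetical helper (it is not part of BAT, SciMLBase or Julia itself), and it measures wall-clock time around a top-level `using` statement rather than reproducing what `@time` or `@time_imports` report.

```julia
# Hypothetical helper (an assumption for illustration, not an existing API):
# time `using pkgs...` in a fresh Julia process, after first loading the
# `warmup` packages, so that earlier loads in the current session do not
# influence the measurement.
function fresh_load_time(pkgs::Vector{String}; warmup::Vector{String} = String[])
    lines = String[]
    isempty(warmup) || push!(lines, "using " * join(warmup, ", "))
    push!(lines, "t0 = time_ns()")
    push!(lines, "using " * join(pkgs, ", "))
    push!(lines, "print((time_ns() - t0) / 1e9)")
    script = join(lines, "\n")

    # Reuse the active project so the child process sees the same packages.
    proj = Base.active_project()
    projflag = proj === nothing ? String[] : ["--project=$proj"]

    cmd = `$(Base.julia_cmd()) --startup-file=no $projflag -e $script`
    return parse(Float64, read(cmd, String))   # seconds
end

# Example usage (package names taken from the sessions above; they must be
# installed in the active environment):
# base = ["Distributions", "StaticArrays", "ArrayInterface"]
# fresh_load_time(["GalacticOptim"]; warmup = base)
# fresh_load_time(["SciMLBase", "GalacticOptim"]; warmup = base)
```

Running each variant a few times and comparing the results should show whether the difference between the two load orders is reproducible or just noise.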

oschulz avatar May 07 '22 13:05 oschulz

SciMLBase seems to have some strange load time effects depending on order of package loading in general. I don't get why ...

When timing it all in one go it's not so extreme, but around 300 ms still seems to be very high for the actual code of a "...Base" package:

julia> using InverseFunctions

julia> @time_imports using StaticArrays, ArrayInterface, RecursiveArrayTools, SciMLBase
    527.0 ms  StaticArrays
      3.3 ms  ┌ Compat
      0.4 ms  ┌ Requires
      0.1 ms  ┌ IfElse
     36.9 ms  ┌ Static
    721.1 ms  ArrayInterface
     10.4 ms    ┌ MacroTools
     11.0 ms  ┌ ZygoteRules
    149.6 ms  ┌ FillArrays
      3.3 ms  ┌ DocStringExtensions
     19.0 ms  ┌ RecipesBase
      0.6 ms  ┌ Adapt
     59.7 ms  ┌ ChainRulesCore
    309.3 ms  RecursiveArrayTools
      0.2 ms    ┌ IteratorInterfaceExtensions
      0.6 ms  ┌ TableTraits
      0.2 ms  ┌ DataValueInterfaces
      1.3 ms    ┌ DataAPI
     14.5 ms  ┌ Tables
      0.2 ms  ┌ CommonSolve
      0.8 ms  ┌ ConstructionBase
      0.2 ms  ┌ TreeViews
    307.5 ms  SciMLBase

StaticArrays and especially ArrayInterface make up more of the total load time, of course, together with RecursiveArrayTools which is also not exactly lightweight.

oschulz avatar May 07 '22 13:05 oschulz

I wonder how requires is measured. My guess is that it's triggering requires in ArrayInterface and that's measured as part of the SciMLBase time.

ChrisRackauckas avatar May 07 '22 13:05 ChrisRackauckas

I would think so ... probably have to ask Tim or so. :-)

oschulz avatar May 07 '22 13:05 oschulz

that it's triggering requires in ArrayInterface

Oh wow, ArrayInterface has a lot of requires!

oschulz avatar May 07 '22 14:05 oschulz

Hence the idea to make it like GalacticOptim in terms of subpackages.

ChrisRackauckas avatar May 07 '22 14:05 ChrisRackauckas

~~You mean JuliaArrays/ArrayInterface.jl#211? Yes, that would be great - and if there are more lightweight parts, maybe some of the requires can be turned into those packages depending on it? I feel we have quite a few dependencies in the ecosystem right now that should be the other way round, or common interface packages are missing, and lots of requires to compensate for it.~~

Update: We have a very lightweight ArrayInterfaceCore now.
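To illustrate the "other way round" dependency direction in plain Julia: in the sketch below, nested modules stand in for separate packages, and all names (`TinyInterfaceCore`, `HeavyArrays`, `is_cheap_to_index`) are hypothetical and not taken from ArrayInterface or ArrayInterfaceCore. The point is only that a dependency-free interface module defines a generic function with a fallback, and the heavier module depends on it and adds its own method, so the interface side needs no Requires blocks.

```julia
# Hypothetical stand-in for a lightweight "...Core" interface package:
# it only defines a generic function plus a conservative fallback.
module TinyInterfaceCore
    export is_cheap_to_index
    is_cheap_to_index(::Type) = false
    is_cheap_to_index(x) = is_cheap_to_index(typeof(x))
end

# Hypothetical stand-in for a heavier package. It depends on the core
# module and extends the interface for its own type, instead of the
# core module reaching out to it via Requires.
module HeavyArrays
    import ..TinyInterfaceCore

    struct HeavyVector{T} <: AbstractVector{T}
        data::Vector{T}
    end
    Base.size(v::HeavyVector) = size(v.data)
    Base.getindex(v::HeavyVector, i::Int) = v.data[i]

    TinyInterfaceCore.is_cheap_to_index(::Type{<:HeavyVector}) = true
end

using .TinyInterfaceCore

v = HeavyArrays.HeavyVector([1.0, 2.0, 3.0])
is_cheap_to_index(v)          # uses the method added by HeavyArrays
is_cheap_to_index([1, 2, 3])  # falls back to the core default
```

With this layout, code that only needs the interface pays the load time of the small core module alone.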

oschulz avatar May 07 '22 15:05 oschulz

and if there are more lightweight parts, maybe some of the requires can be turned into those packages depending on it?

Indeed, that's the dream.

ChrisRackauckas avatar May 07 '22 15:05 ChrisRackauckas