BAT dependencies, package load time(s) and modularization
This issue is intended to keep track of modularizing (splitting up) BAT to reduce package load time(s) and increase flexibility.
The current high load time of BAT is mainly due to its dependencies. Dependency options and load-time cost, preliminary analysis:
Unavoidable expensive core deps:
- Distributions (includes StatsBase)
- StaticArrays (indirect through many packages)
Unavoidable non-negligible deps:
- ArraysOfArrays
- ValueShapes
- RecipesBase
- Possibly some ArrayInterface packages (ArrayInterfaceCore is now very lightweight)
Hard-to-avoid non-negligible direct/indirect deps:
- BangBang
- Transducers (non-negligible cost on top of BangBang)
- TerminalLoggers
- PrettyTables (may be avoidable)
- DataStructures
Cheap deps:
- StructArrays: Very cheap on top of StaticArrays
- AbstractDifferentiation: Almost free on top of ChainRulesCore
Autodiff choices:
- ForwardDiff: Quite cheap on top of StaticArrays and Distributions
- FiniteDifferences: Basically free on top of StaticArrays and Distributions
- FiniteDiff: Instead of FiniteDifferences, would pull in ArrayInterface
- Zygote: Expensive
- (Future option) Diffractor: Quite cheap on top of StaticArrays
Math deps:
- LinearMaps: About 100 ms, no deps
Statistics deps:
- LogarithmicNumbers: Very cheap on top of unavoidable deps
- MeasureBase: Not that expensive on top of unavoidable deps (uses LogarithmicNumbers)
Optimizer deps:
- NLSolversBase: Cheap itself, but depends on ArrayInterface, ChainRulesCore, SpecialFunctions and ForwardDiff
- Optim: Cheap itself, but pulls in ForwardDiff and ArrayInterface via FiniteDiff, BAT would need only Nelder-Mead and L-BFGS for core functionality though
- LBFGSB: Cheap, but introduces binary deps
- Manopt: Would be cheap on top of StaticArrays if it didn't pull in ColorSchemes
Sampler deps:
- AbstractMCMC: Cheap on top of StaticArrays, Distributions, Transducers, TerminalLoggers
- AdvancedHMC: Quite cheap on top of AbstractMCMC
- PSIS: Cheap on top of unavoidable deps
Integrator deps:
- CUBA: Cheap, but introduces binary deps
- MonteCarloIntegration: Almost free on top of unavoidable deps
- HCubature: Almost free on top of unavoidable deps
- QuadGK: Cheap, but only univariate
Deps to avoid:
- Folds: Significant load time cost on top of Transducers
- SciMLBase: Really expensive
- GalacticOptim: Not cheap even on top of SciMLBase, also load-time interplay with SciMLBase
- Optimization: Over 1000 ms load time on top of its heavy deps (Optim, GPUArrays, RecursiveArrayTools, SciMLBase)
Expensive deps to get rid of:
- DoubleFloats
- Polynomials
- ... ?
Non-negligible deps to get rid of:
- KernelDensity
- PrettyTables (maybe, see above)
- AdaptiveRejectionSampling (cost due to ForwardDiff though)
- ... ?
> GalacticOptim: Really expensive even on top of SciMLBase

Even with v3 splitting out the solvers?

> SciMLBase: Really expensive

Interesting, how expensive, and due to what? There's not much in there 😅

> > SciMLBase: Really expensive
>
> Interesting, how expensive, and due to what? There's not much in there

I was very surprised as well, I had always assumed SciMLBase to be very lightweight, I just never timed it before.

> GalacticOptim: Really expensive even on top of SciMLBase

Hm, maybe I had GalacticOptim v2 due to some dep in my tests - though even with v3 it has a not insignificant load time:
Session 1:
pkg> st SciMLBase GalacticOptim
Status `/user/.julia/environments/temp/Project.toml`
[a75be94c] GalacticOptim v3.1.1
[0bca4576] SciMLBase v1.31.3
julia> using InverseFunctions # load a super-lightweight package to get some initial Pkg costs out of the way
julia> @time using SciMLBase
2.356626 seconds (5.58 M allocations: 423.052 MiB, 4.17% gc time, 49.19% compilation time)
Session 2:
julia> using InverseFunctions
julia> @time using SciMLBase, GalacticOptim
2.990720 seconds (6.56 M allocations: 475.628 MiB, 5.91% gc time, 60.39% compilation time)
Session 3:
julia> using InverseFunctions
julia> @time_imports using SciMLBase, GalacticOptim
10.9 ms ┌ MacroTools
18.5 ms ┌ ZygoteRules
0.2 ms ┌ IteratorInterfaceExtensions
0.7 ms ┌ TableTraits
3.5 ms ┌ Compat
0.5 ms ┌ Requires
141.0 ms ┌ FillArrays
0.2 ms ┌ DataValueInterfaces
532.6 ms ┌ StaticArrays
3.2 ms ┌ DocStringExtensions
0.2 ms ┌ IfElse
19.4 ms ┌ RecipesBase
37.2 ms ┌ Static
718.0 ms ┌ ArrayInterface
0.7 ms ┌ Adapt
58.6 ms ┌ ChainRulesCore
864.2 ms ┌ RecursiveArrayTools
1.4 ms ┌ DataAPI
14.6 ms ┌ Tables
0.2 ms ┌ CommonSolve
0.8 ms ┌ ConstructionBase
0.2 ms ┌ TreeViews
1829.6 ms SciMLBase
2.7 ms ┌ DiffResults
0.3 ms ┌ Reexport
6.3 ms ┌ AbstractTrees
2.9 ms ┌ ProgressLogging
5.6 ms ┌ LeftChildRightSiblingTrees
16.7 ms ┌ TerminalLoggers
4.8 ms ┌ ProgressMeter
2.1 ms ┌ LoggingExtras
0.8 ms ┌ ConsoleProgressMonitor
321.1 ms GalacticOptim
Also, in comparison (though Optim and GalacticOptim are not actually replacements for each other of course) - loading Optim:
julia> using InverseFunctions
julia> @time using Distributions, StaticArrays, ArrayInterface # Pretty much unavoidable for BAT
1.918963 seconds (4.79 M allocations: 333.435 MiB, 3.12% gc time, 37.83% compilation time)
julia> @time using Optim
0.181226 seconds (475.28 k allocations: 30.906 MiB, 14.12% gc time, 13.72% compilation time)
Loading GalacticOptim:
julia> using InverseFunctions
julia> @time using Distributions, StaticArrays, ArrayInterface # Unavoidable deps for BAT
1.921450 seconds (4.79 M allocations: 333.419 MiB, 3.04% gc time, 37.15% compilation time)
julia> @time using GalacticOptim
0.936281 seconds (2.32 M allocations: 181.437 MiB, 3.88% gc time, 59.31% compilation time)
julia> @time using SciMLBase
0.595445 seconds (861.23 k allocations: 45.663 MiB, 6.73% gc time, 99.98% compilation time)
Why does loading SciMLBase after GalacticOptim take any time at all? This is really weird:
julia> using InverseFunctions
julia> @time using Distributions, StaticArrays, ArrayInterface # Unavoidable deps for BAT
1.913109 seconds (4.79 M allocations: 333.435 MiB, 3.06% gc time, 37.17% compilation time)
julia> @time using SciMLBase, GalacticOptim
1.513281 seconds (3.17 M allocations: 227.054 MiB, 4.97% gc time, 75.43% compilation time)
Why is loading SciMLBase and GalacticOptim more expensive than just loading GalacticOptim, which depends on SciMLBase? Some strange Requires effect?
SciMLBase seems to have some strange load time effects depending on order of package loading in general. I don't get why ...
When timing it all in one go it's not so extreme, but around 300 ms still seems to be very high for the actual code of a "...Base" package:
julia> using InverseFunctions
julia> @time_imports using StaticArrays, ArrayInterface, RecursiveArrayTools, SciMLBase
527.0 ms StaticArrays
3.3 ms ┌ Compat
0.4 ms ┌ Requires
0.1 ms ┌ IfElse
36.9 ms ┌ Static
721.1 ms ArrayInterface
10.4 ms ┌ MacroTools
11.0 ms ┌ ZygoteRules
149.6 ms ┌ FillArrays
3.3 ms ┌ DocStringExtensions
19.0 ms ┌ RecipesBase
0.6 ms ┌ Adapt
59.7 ms ┌ ChainRulesCore
309.3 ms RecursiveArrayTools
0.2 ms ┌ IteratorInterfaceExtensions
0.6 ms ┌ TableTraits
0.2 ms ┌ DataValueInterfaces
1.3 ms ┌ DataAPI
14.5 ms ┌ Tables
0.2 ms ┌ CommonSolve
0.8 ms ┌ ConstructionBase
0.2 ms ┌ TreeViews
307.5 ms SciMLBase
StaticArrays and especially ArrayInterface make up more of the total load time, of course, together with RecursiveArrayTools which is also not exactly lightweight.
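To take session state out of the picture when checking the order effects discussed above, each `using` statement can be timed in a fresh Julia process. The following is only a rough sketch: `load_times` is a hypothetical helper (not part of BAT or any package), it assumes the packages are installed in the active project, and it assumes none of them print to stdout while loading.

```julia
# Hypothetical helper: time each `using` separately, in the given order,
# inside a freshly started Julia process.
function load_times(pkgs::Vector{String};
                    project::AbstractString = dirname(Base.active_project()))
    # Generate plain top-level statements, since `using` must stay at top level.
    lines = ["t = time_ns()"]
    for pkg in pkgs
        push!(lines, "using $pkg")
        # Print the seconds spent on this package, then restart the clock.
        push!(lines, "println((time_ns() - t) / 1e9); t = time_ns()")
    end
    script = join(lines, "\n")

    # Run the script in a new process that uses the same project environment.
    cmd = `$(Base.julia_cmd()) --startup-file=no --project=$project -e $script`
    out = read(cmd, String)

    secs = parse.(Float64, split(strip(out), '\n'))
    return collect(zip(pkgs, secs))  # Vector of (package, seconds) pairs
end
```

Calling `load_times(["InverseFunctions", "SciMLBase", "GalacticOptim"])` and then the same call with the last two names swapped would give per-package numbers for both load orders. The leading lightweight package plays the same role as in the sessions above, absorbing the initial Pkg cost. The last entry of `load_times([baseline..., pkg])` can also serve as a rough estimate of what `pkg` costs "on top of" a given baseline.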
I wonder how requires is measured. My guess is that it's triggering requires in ArrayInterface and that's measured as part of the SciMLBase time.
I would think so ... probably have to ask Tim or so. :-)
> that it's triggering requires in ArrayInterface

Oh wow, ArrayInterface has a lot of requires!
Hence the idea to make it like GalacticOptim in terms of subpackages.
~~You mean JuliaArrays/ArrayInterface.jl#211? Yes, that would be great - and if there are more lightweight parts, maybe some of the requires can be turned into those packages depending on it? I feel we have quite a few dependencies in the ecosystem right now that should be the other way round, or common interface packages are missing, and lots of requires to compensate for it.~~
Update: We have a very lightweight ArrayInterfaceCore now.
> and if there are more lightweight parts, maybe some of the requires can be turned into those packages depending on it?

Indeed, that's the dream.