MLUtils seems quite heavy
I am increasingly relying on the getobs/nobs interface in some quite low-level packages I'm working on. It's nice to be able to work generically with tables and arrays, but I only need this basic API and simple things like eachobs, and I'm finding MLUtils.jl rather heavy for that purpose (46 s to precompile and load on Julia 1.9).
Are there any plans to factor out the base functionality, or to move some dependencies out to weak dependencies?
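For a sense of how small that base API is, here is a minimal, dependency-free sketch of an observation interface. The definitions are hypothetical and are not MLUtils' implementation: the names getobs, numobs and eachobs are reused only for readability, observations are assumed to live along the last array dimension, and a NamedTuple of equal-length columns stands in for a Tables.jl column table so that nothing beyond Base is needed.

```julia
# Hypothetical stand-alone sketch; not MLUtils' code.

# Arrays: assume the last dimension indexes observations.
numobs(x::AbstractArray) = size(x, ndims(x))

function getobs(x::AbstractArray, i)
    # Keep every leading dimension and select observation(s) i along the last one.
    leading = ntuple(_ -> Colon(), ndims(x) - 1)
    return x[leading..., i]
end

# Column table stand-in: a NamedTuple of equal-length column vectors.
numobs(t::NamedTuple) = numobs(first(t))
getobs(t::NamedTuple, i) = map(col -> getobs(col, i), t)

# Lazily iterate over single observations.
eachobs(data) = (getobs(data, i) for i in 1:numobs(data))
```

A real lightweight core would presumably dispatch on the Tables.jl interface rather than on NamedTuple; in the timings below Tables loads in about 32 ms, so that alone would not pull in NNlib or its dependencies.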
I see that StaticArrays contributes a lot to the load time. The dependency chain here is NNlib -> KernelAbstractions -> StaticArrays. What in NNlib is actually needed here? (Maybe KernelAbstractions only needs StaticArraysCore?)
```
julia> @time_imports using MLUtils
1.1 ms Statistics
7.3 ms ShowCases
0.3 ms Compat
0.5 ms Compat → CompatLinearAlgebraExt
1.2 ms ConstructionBase
10.7 ms InitialValues
0.4 ms Requires
0.5 ms DataValueInterfaces
1.2 ms DataAPI
0.5 ms IteratorInterfaceExtensions
0.5 ms TableTraits
32.2 ms Tables
10.6 ms MacroTools
27.5 ms ChainRulesCore
0.9 ms ZygoteRules
3.7 ms StaticArraysCore
17.8 ms Setfield
17.0 ms BangBang
0.9 ms ContextVariablesX
0.5 ms FLoopsBase
1.1 ms PrettyPrint
0.5 ms NameResolution
126.0 ms MLStyle
3.0 ms JuliaVariables
0.4 ms Adapt
0.5 ms ArgCheck
14.1 ms Baselet
0.6 ms CompositionsBase
0.5 ms DefineSingletons
9.8 ms MicroCollections
14.6 ms SplittablesBase
34.1 ms Transducers
4.2 ms FLoops
1.1 ms InverseFunctions
18.8 ms Accessors
18.5 ms FunctionWrappers
235.6 ms FoldsThreads 309.83% compilation time
60.5 ms DataStructures
0.6 ms SortingAlgorithms
9.3 ms Missings
1.0 ms DocStringExtensions
4.7 ms IrrationalConstants
0.4 ms LogExpFunctions
0.6 ms LogExpFunctions → LogExpFunctionsChainRulesCoreExt
0.4 ms LogExpFunctions → LogExpFunctionsInverseFunctionsExt
0.4 ms StatsAPI
17.3 ms StatsBase
2.7 ms SimpleTraits
6.0 ms UnsafeAtomics
12.9 ms Atomix
2.2 ms GPUArraysCore
13.8 ms Preferences
0.4 ms PrecompileTools
435.4 ms StaticArrays
1.1 ms ConstructionBase → ConstructionBaseStaticArraysExt
0.5 ms Adapt → AdaptStaticArraysExt
0.5 ms Accessors → AccessorsStaticArraysExt
3.7 ms CEnum
0.4 ms JLLWrappers
242.0 ms LLVMExtra_jll 98.67% compilation time (98% recompilation)
42.7 ms LLVM
4.7 ms UnsafeAtomicsLLVM
27.9 ms KernelAbstractions
30.3 ms NNlib 57.78% compilation time
1.4 ms DelimitedFiles
7.0 ms MLUtils
```
NNlib is used in a couple of places in https://github.com/JuliaML/MLUtils.jl/blob/main/src/utils.jl, but I don't think it would be too difficult to rewrite those or to vendor the functions they use.
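As an illustration of how little is needed for the simplest chunking case, here is a hypothetical Base-only sketch. The name chunk_sketch is made up, and it assumes chunks of cld(len, n) entries along one dimension (so it can return fewer than n chunks when the input is short); MLUtils' own chunk may handle more cases than this.

```julia
# Hypothetical Base-only sketch; not MLUtils' chunk.
function chunk_sketch(x::AbstractArray, n::Integer; dims::Integer = ndims(x))
    len = size(x, dims)
    # Every chunk except possibly the last has cld(len, n) entries along `dims`.
    step = max(1, cld(len, n))
    # selectdim returns views, so no data is copied.
    return [selectdim(x, dims, r) for r in Iterators.partition(1:len, step)]
end
```

Because the chunks are views, mutating one of them mutates x; a copying variant would wrap each selectdim call in copy.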
Yes, it would be nice to excise the NNlib dependency. Its functionality is used in:
- chunk
- rpad_constant

Anything else? If not, we could move those functions to NNlib.
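Likewise, a right-padding helper can be written with Base alone by allocating the padded array and copying the input into its leading corner. The sketch below is hypothetical (the name rpad_constant_sketch is made up) and assumes that n is the target length along each padded dimension, so dimensions that are already long enough keep their size.

```julia
# Hypothetical Base-only sketch; not MLUtils' rpad_constant.
function rpad_constant_sketch(x::AbstractArray{T,N}, n::Integer, val = zero(T);
                              dims = :) where {T,N}
    # Pad every dimension when dims is `:`, otherwise only the listed ones.
    padded(d) = dims isa Colon || d in dims
    target = ntuple(d -> padded(d) ? max(size(x, d), n) : size(x, d), N)
    # Allocate the output, fill it with `val`, then copy `x` into the leading corner.
    out = fill!(similar(x, promote_type(T, typeof(val)), target), val)
    out[axes(x)...] = x
    return out
end
```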
Anyone have some time to revisit this?
The biggest blocker is still what to use in place of NNlib.scatter for https://github.com/JuliaML/MLUtils.jl/blob/09c87f7097536384cea0a132aa0012679df18175/src/utils.jl#L201. Vendoring scatter won't help since it depends on KernelAbstractions.
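To make the trade-off concrete, a plain-loop fallback for the additive case is easy to write, but it is CPU-only. The sketch below is hypothetical (scatter_add_sketch is a made-up name) and only mimics the `+` reduction in the spirit of NNlib.scatter(+, src, idx): column j of src is accumulated into column idx[j] of the output. Being a scalar loop, it gives up the generic GPU kernels that are presumably the point of the KernelAbstractions-based version.

```julia
# Hypothetical CPU-only fallback; not NNlib's implementation.
function scatter_add_sketch(src::AbstractMatrix, idx::AbstractVector{<:Integer},
                            n::Integer = maximum(idx))
    # One output column per target index, initialised to zero.
    dst = zeros(eltype(src), size(src, 1), n)
    for (j, i) in enumerate(idx)
        # Accumulate column j of src into column i of dst.
        @views dst[:, i] .+= src[:, j]
    end
    return dst
end
```

Target columns that no index points at simply stay zero.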
It'd also be worth redoing the import timings, since the JuliaFolds packages have changed ownership and received some bug fixes since this issue was originally opened.
Related (duplicate?): https://github.com/JuliaML/MLUtils.jl/issues/90