latency: fix cat invalidations
This patch removes the invalidation of `cat` and thus reduces the OneHotArrays loading time from about 4.5 s back to about 0.5 s (its normal load time).

Loading OneHotArrays alone in a fresh Julia process takes about 0.57 s. However, loading LinearMaps first pushes the OneHotArrays load time up to about 5 s:
```julia
julia> @time using OneHotArrays # v0.2.4
  0.569807 seconds (539.89 k allocations: 36.725 MiB, 4.48% compilation time)
```
```julia
# in a new Julia process
# LinearMaps master branch
julia> @time using LinearMaps
  0.023610 seconds (6.77 k allocations: 501.328 KiB)

julia> @time using OneHotArrays
  5.038436 seconds (13.72 M allocations: 514.807 MiB, 2.32% gc time, 0.51% compilation time)
```
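For reference, a quick way to reproduce this fresh-process comparison without opening REPLs by hand; this is just a convenience sketch and assumes `julia` is on the PATH and both packages are installed in the active environment:

```julia
# Each run starts a clean process, so earlier loads don't pollute the timings.
run(`julia --startup-file=no -e '@time using OneHotArrays'`)
run(`julia --startup-file=no -e '@time using LinearMaps; @time using OneHotArrays'`)
```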
With SnoopCompile I noticed that loading LinearMaps invalidates `cat` methods, and OneHotArrays extends `cat`:
```julia
julia> using SnoopCompileCore

julia> invalidations = @snoopr begin
           using LinearMaps
       end
174-element Vector{Any}:
 MethodInstance for Core.kwcall(::@NamedTuple{dims::Val{1}}, ::typeof(cat), ::Vector{Base.PkgId}, ::Vararg{Any})
 "invalidate_mt_cache"
 ...
```
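To see which existing methods got invalidated and by which new method, the flat list can be grouped into trees; this is just the standard SnoopCompile follow-up step, not part of this patch:

```julia
using SnoopCompile  # the full package, on top of SnoopCompileCore

# Group the raw invalidation list by the method insertion that triggered it;
# the cat-related entries show up in these trees.
trees = invalidation_trees(invalidations)
```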
This patch tweaks the `cat` method extension to dispatch on `Base._cat` instead of `cat`, and somehow it works:
```julia
# also a new Julia process
# on this PR branch
julia> @time using LinearMaps
  0.012467 seconds (11.94 k allocations: 741.312 KiB)

julia> @time using OneHotArrays # 🎉 back to normal
  0.576323 seconds (539.17 k allocations: 36.551 MiB, 4.37% compilation time)
```
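In case the idea is not clear from the description alone, here is a rough sketch, not the literal diff of this PR, of what "dispatch on `_cat` instead of `cat`" means: attach the block-concatenation method to the internal function that `cat(X...; dims)` forwards to in current Julia versions, instead of adding yet another keyword method to the heavily used public `cat` entry point. The concrete signature and body below are illustrative assumptions only.

```julia
using LinearMaps

# Before (roughly): a new keyword method on the public entry point, which is
# what triggers the Core.kwcall invalidations shown above.
# Base.cat(As::Union{LinearMap,AbstractVecOrMat}...; dims::Val{1}) = vcat(As...)

# After (roughly): dispatch on the internal function that `cat` forwards to.
# Signature and body are illustrative, not the exact code in this PR.
Base._cat(dims::Val{1}, As::Union{LinearMap,AbstractVecOrMat}...) = vcat(As...)
```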
P.S. I figure this is related to https://github.com/JuliaLang/julia/pull/45028 (because of the `Core.kwcall` hint in the invalidations list), but I don't know why this works. Maybe @aviatesk or @timholy knows the magic behind it?
Commenting out the `Base.cat` method linked above in OneHotArrays also reduces the latency, but I figure it's better to fix it here, since OneHotArrays looks more innocent, while this package's code generates many `cat` methods in a seemingly wild way.
FWIW, if we reduce `for k in 1:8` to something like `for k in 1:4`, the latency isn't as severe either. As an experiment in the opposite direction:
```diff
diff --git a/src/blockmap.jl b/src/blockmap.jl
index 72c2bb9..49ba024 100644
--- a/src/blockmap.jl
+++ b/src/blockmap.jl
@@ -497,7 +497,7 @@ BlockDiagonalMap(maps::LinearMap...) =
 # since the below methods are more specific than the Base method,
 # they would redefine Base/SparseArrays behavior
-for k in 1:8 # is 8 sufficient?
+for k in 1:16 # is 8 sufficient?
     Is = ntuple(n->:($(Symbol(:A, n))::AbstractVecOrMatOrQ), Val(k-1))
     # yields (:A1, :A2, :A3, ..., :A(k-1))
     L = :($(Symbol(:A, k))::LinearMap)
```
This makes `using OneHotArrays` take forever.
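To make the explosion concrete, here is a small standalone illustration (not LinearMaps source code) of the expansion pattern in that loop: for each `k`, the first `k-1` arguments are plain arrays and the `k`-th is the first `LinearMap`, so the number of generated signatures, and hence the method competition on the concatenation functions, grows with the upper bound.

```julia
# Print the argument-type tuples the loop would splice into method definitions.
# `AbstractVecOrMat` is used here for simplicity instead of `AbstractVecOrMatOrQ`.
for k in 1:3
    Is = ntuple(n -> :($(Symbol(:A, n))::AbstractVecOrMat), Val(k - 1))
    L = :($(Symbol(:A, k))::LinearMap)
    # prints tuples like (A1::AbstractVecOrMat, A2::LinearMap)
    println(Expr(:tuple, Is..., L))
end
```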
Codecov Report
Attention: Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
Project coverage is 99.43%. Comparing base (c7a3c56) to head (0956320).
:exclamation: Current head 0956320 differs from pull request most recent head da9eded
Please upload reports for the commit da9eded to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| src/blockmap.jl | 50.00% | 4 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #216      +/-   ##
==========================================
+ Coverage   99.30%   99.43%   +0.12%
==========================================
  Files          22       22
  Lines        1591     1595       +4
==========================================
+ Hits         1580     1586       +6
+ Misses         11        9       -2
```
First I was afraid this would bring back #123 (see #124 for the offending method signature), but this works since you keep the method signature as is. Unfortunately, on my machine, this PR increases loading time from 0.05 seconds to about 7 seconds on v1.9!
Looks like this isn't the right fix, then. I'll try to see if there's a better solution.
Interestingly, this seems to work well on Julia v1.10+. Unfortunately, the load time of OneHotArrays.jl doubles from v1.10 to v1.11 and master even without any interference from LinearMaps.jl. But this PR increases the load time of LinearMaps.jl only slightly, and it decreases the load time of OneHotArrays.jl after LinearMaps.jl dramatically (by roughly a factor of 8).
I like this, and I think we should push it even further: ride the generic call chain and hook in somewhere much deeper, where there is almost no method competition. It doesn't work yet, but I've pushed it so I can continue working on it from work tomorrow.