BandedMatrices.jl

Precompile basic algebra operations

Open jishnub opened this issue 2 years ago • 3 comments

This reduces TTFX (time to first X) to some extent, although more seems possible.

On master:

$ julia --project -e '@time using BandedMatrices'
  2.627211 seconds (5.68 M allocations: 537.898 MiB, 14.21% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
  2.427783 seconds (6.22 M allocations: 288.556 MiB, 5.32% gc time, 100.00% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
  2.431238 seconds (6.22 M allocations: 288.609 MiB, 5.19% gc time, 100.00% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
  2.510519 seconds (6.00 M allocations: 277.845 MiB, 5.47% gc time, 99.97% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
  0.872704 seconds (3.74 M allocations: 199.381 MiB, 10.27% gc time, 99.91% compilation time)

This PR (with Julia v1.8.3):

$ julia --project -e '@time using BandedMatrices'
  3.126086 seconds (6.30 M allocations: 609.719 MiB, 3.40% gc time, 11.85% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
  1.704401 seconds (200.38 k allocations: 7.104 MiB, 99.99% compilation time: 100% of which was recompilation)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
  1.700849 seconds (200.35 k allocations: 7.102 MiB, 99.99% compilation time: 100% of which was recompilation)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
  1.697413 seconds (234.84 k allocations: 7.367 MiB, 99.96% compilation time)

$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v' 
  0.172603 seconds (15.92 k allocations: 577.234 KiB, 99.38% compilation time)

The biggest gain comes in the matrix-vector multiplication. The recompilation in the addition and subtraction needs to be looked into, as it may be possible to improve this performance further.
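
The relative gains can be read off the timings above. A minimal sketch that does the division, using only the first-call times reported in this comment (the names `master`, `pr` and `speedup` are illustrative, not part of the package):

```julia
# First-call timings in seconds, copied from the @time outputs above (Julia v1.8.3).
master = (add = 2.427783, sub = 2.431238, matmul = 2.510519, matvec = 0.872704)
pr     = (add = 1.704401, sub = 1.700849, matmul = 1.697413, matvec = 0.172603)

# Speedup factor of the first call for each operation (master time / PR time).
speedup = map(/, master, pr)

for (op, s) in pairs(speedup)
    println(rpad(string(op), 8), round(s; digits = 2), "x")
end
```

These are single runs, so the factors should be read as rough indications and not as careful benchmarks.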

On Julia v1.9.0-alpha1:

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
  1.774516 seconds (162.87 k allocations: 5.040 MiB, 99.99% compilation time)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
  1.867737 seconds (162.88 k allocations: 5.042 MiB, 99.99% compilation time)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
  1.821298 seconds (249.45 k allocations: 7.726 MiB, 99.96% compilation time)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
  0.180030 seconds (17.09 k allocations: 603.562 KiB, 99.52% compilation time)

So, the recompilation goes away, but the performance is similar.

jishnub, Dec 10 '22 13:12

Codecov Report

Base: 80.55% // Head: 80.60% // Increases project coverage by +0.05% :tada:

Coverage data is based on head (e3eb264) compared to base (dfbf44f). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #283      +/-   ##
==========================================
+ Coverage   80.55%   80.60%   +0.05%     
==========================================
  Files          23       24       +1     
  Lines        3270     3279       +9     
==========================================
+ Hits         2634     2643       +9     
  Misses        636      636              
Impacted Files Coverage Δ
src/BandedMatrices.jl 100.00% <ø> (ø)
src/precompile.jl 100.00% <100.00%> (ø)

codecov[bot], Dec 10 '22 13:12

I'm not sure this is worth it: there's added complexity by adding another dependency and the speedups are not that significant.

dlfivefifty, Dec 10 '22 22:12

I think there might be lower-hanging gains elsewhere, but we should consider this in the longer term. I'd say the gains we see here are significant: the time to load the package increases by less than the decrease in TTFX, so even a single operation is faster overall.

On master:

$ time julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); B + B'
julia --project -e   6.35s user 0.97s system 97% cpu 7.543 total

This PR:

$ time julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); B + B'
julia --project -e   5.94s user 1.05s system 97% cpu 7.186 total
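
The same comparison can be made from the `@time` figures reported earlier in this thread. A minimal sketch, using only those numbers (the variable names are illustrative):

```julia
# Timings in seconds, copied from the outputs reported earlier in this thread (Julia v1.8.3).
load_master, load_pr = 2.627211, 3.126086   # `@time using BandedMatrices`
add_master,  add_pr  = 2.427783, 1.704401   # first `B + B`

load_cost = load_pr - load_master   # extra load time introduced by precompilation
ttfx_gain = add_master - add_pr     # time saved on the first `B + B`
net_gain  = ttfx_gain - load_cost   # positive means a single operation is faster overall

# Wall-clock totals from the two `time` runs above.
wall_gain = 7.543 - 7.186

println("load cost: ", round(load_cost; digits = 3), " s")
println("TTFX gain: ", round(ttfx_gain; digits = 3), " s")
println("net gain:  ", round(net_gain; digits = 3), " s")
println("wall gain: ", round(wall_gain; digits = 3), " s")
```

Both estimates come out positive. They differ in size, which is not surprising for single, unrepeated runs.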

jishnub, Dec 11 '22 06:12

Update: With SnoopPrecompile v1.0.3 on Julia v1.9.0-beta2, I obtain:

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
  0.000116 seconds (63 allocations: 3.578 KiB)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
  0.000131 seconds (67 allocations: 3.812 KiB)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
  0.000821 seconds (238 allocations: 13.359 KiB)

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
  0.507999 seconds (1.38 M allocations: 87.291 MiB, 7.02% gc time, 188.76% compilation time)

The first three cases seem completely addressed, while the matrix-vector case seems only slightly improved.

jishnub, Jan 12 '23 07:01

With #293 merged, the matrix-vector multiplication is precompiled as well.

$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
  0.000682 seconds (237 allocations: 13.375 KiB)

jishnub, Jan 26 '23 11:01