BandedMatrices.jl
Precompile basic algebra operations
This reduces TTFX (time to first X) to some extent, although more seems possible.
On master:
$ julia --project -e '@time using BandedMatrices'
2.627211 seconds (5.68 M allocations: 537.898 MiB, 14.21% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
2.427783 seconds (6.22 M allocations: 288.556 MiB, 5.32% gc time, 100.00% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
2.431238 seconds (6.22 M allocations: 288.609 MiB, 5.19% gc time, 100.00% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
2.510519 seconds (6.00 M allocations: 277.845 MiB, 5.47% gc time, 99.97% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
0.872704 seconds (3.74 M allocations: 199.381 MiB, 10.27% gc time, 99.91% compilation time)
This PR (with Julia v1.8.3)
$ julia --project -e '@time using BandedMatrices'
3.126086 seconds (6.30 M allocations: 609.719 MiB, 3.40% gc time, 11.85% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
1.704401 seconds (200.38 k allocations: 7.104 MiB, 99.99% compilation time: 100% of which was recompilation)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
1.700849 seconds (200.35 k allocations: 7.102 MiB, 99.99% compilation time: 100% of which was recompilation)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
1.697413 seconds (234.84 k allocations: 7.367 MiB, 99.96% compilation time)
$ julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
0.172603 seconds (15.92 k allocations: 577.234 KiB, 99.38% compilation time)
The biggest gain is in matrix-vector multiplication, where the first call drops from about 0.87 s to about 0.17 s. The recompilation reported for addition and subtraction needs to be looked into, since eliminating it may improve those timings further.
On Julia v1.9.0-alpha1
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
1.774516 seconds (162.87 k allocations: 5.040 MiB, 99.99% compilation time)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
1.867737 seconds (162.88 k allocations: 5.042 MiB, 99.99% compilation time)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
1.821298 seconds (249.45 k allocations: 7.726 MiB, 99.96% compilation time)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
0.180030 seconds (17.09 k allocations: 603.562 KiB, 99.52% compilation time)
So the recompilation goes away, but the timings are similar.
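For readers unfamiliar with the approach, a precompile workload of this kind might look roughly like the sketch below. It assumes the SnoopPrecompile package that is named later in this thread. The choice of operations simply mirrors the benchmarks above, and the contents are a guess, not the actual `src/precompile.jl` added by this PR.

```julia
# Hypothetical sketch of a precompile workload, NOT the actual src/precompile.jl.
# This fragment is meant to be `include`d from inside `module BandedMatrices`,
# so that `BandedMatrix` refers to the package's own type.
using SnoopPrecompile

@precompile_setup begin
    # Setup code runs during precompilation; only the calls made inside
    # `@precompile_all_calls` are targeted for caching.
    v = rand(10)
    B = BandedMatrix(0 => v)

    @precompile_all_calls begin
        # The same operations that are timed in the benchmarks above.
        B + B
        B - B
        B * B
        B * v
    end
end
```

A workload like this only covers the element types it actually exercises (here `Float64`), so other types would still compile on first use.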
Codecov Report
Base: 80.55% // Head: 80.60% // Increases project coverage by +0.05% :tada:
Coverage data is based on head (e3eb264) compared to base (dfbf44f). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #283      +/-   ##
==========================================
+ Coverage   80.55%   80.60%   +0.05%
==========================================
  Files          23       24       +1
  Lines        3270     3279       +9
==========================================
+ Hits         2634     2643       +9
  Misses        636      636
| Impacted Files | Coverage Δ |
|---|---|
| src/BandedMatrices.jl | 100.00% <ø> (ø) |
| src/precompile.jl | 100.00% <100.00%> (ø) |
I'm not sure this is worth it: there's added complexity by adding another dependency and the speedups are not that significant.
I think there might be lower-hanging gains elsewhere, but we should consider this in the longer term. I'd say the gains we see here are significant: the package load time increases by less than the TTFX decreases, so even a single operation is faster overall.
On master:
$ time julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); B + B'
julia --project -e 6.35s user 0.97s system 97% cpu 7.543 total
This PR
$ time julia --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); B + B'
julia --project -e 5.94s user 1.05s system 97% cpu 7.186 total
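The trade-off can be checked with a little arithmetic. The sketch below only reuses numbers already quoted in this thread (the first set of `@time` measurements and the two `time` totals). The variable names are illustrative.

```julia
# Timings in seconds, copied from the first set of measurements in this thread.
load_master, load_pr = 2.627211, 3.126086   # `@time using BandedMatrices`
add_master,  add_pr  = 2.427783, 1.704401   # `@time B + B`

load_cost = load_pr - load_master   # extra time spent loading the package
ttfx_gain = add_master - add_pr     # time saved on the first `B + B`
net_gain  = ttfx_gain - load_cost   # positive means the PR is faster overall

# Wall-clock totals reported by `time` for the whole process.
wall_gain = 7.543 - 7.186

println("load cost: ", round(load_cost; digits = 3), " s")
println("TTFX gain: ", round(ttfx_gain; digits = 3), " s")
println("net gain:  ", round(net_gain; digits = 3), " s")
println("wall gain: ", round(wall_gain; digits = 3), " s")
```

The load time grows by roughly 0.50 s while the first `B + B` gets roughly 0.72 s faster, a net gain of about 0.22 s. This is the same order of magnitude as the roughly 0.36 s difference in the `time` totals.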
Update: With SnoopPrecompile v1.0.3 and on Julia version v1.9.0-beta2, I obtain
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B + B'
0.000116 seconds (63 allocations: 3.578 KiB)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B - B'
0.000131 seconds (67 allocations: 3.812 KiB)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); @time B * B'
0.000821 seconds (238 allocations: 13.359 KiB)
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
0.507999 seconds (1.38 M allocations: 87.291 MiB, 7.02% gc time, 188.76% compilation time)
The first few cases seem completely addressed, while the matrix-vector case seems only slightly improved.
With #293 merged, the matrix-vector multiplication is precompiled as well.
$ julia-1.9 --project -e 'using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time B * v'
0.000682 seconds (237 allocations: 13.375 KiB)
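The measurements in this thread can be repeated with a small driver script along the following lines. It is only a sketch: `ttfx_code` and `run_ttfx` are hypothetical helper names, and `v` is defined for every case (not only for `B * v`), which does not affect the timed expression.

```julia
# Build the one-liner used throughout this thread for a given expression.
# `ttfx_code` is a hypothetical helper, not part of BandedMatrices.
ttfx_code(ex) =
    "using BandedMatrices; B = BandedMatrix(0=>rand(10)); v = rand(10); @time $ex"

# Time each expression in a fresh Julia process, so that every measurement
# includes first-call compilation and is not affected by earlier runs.
function run_ttfx(exprs = ["B + B", "B - B", "B * B", "B * v"])
    for ex in exprs
        println("# ", ex)
        run(`$(Base.julia_cmd()) --project -e $(ttfx_code(ex))`)
    end
end
```

Calling `run_ttfx()` from a project in which BandedMatrices is available prints one `@time` line per expression.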