Use LinAlg to speed up particle swarm
In my case I saw a 4x speedup for a function of 530 variables.
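For context, here is a minimal sketch of the kind of rewrite involved, with hypothetical function names (an illustration of the Gram-matrix trick, not the exact PR diff): the per-particle sum of squared distances to all other particles can come out of one BLAS-backed product X'X instead of a double loop over particles.

```julia
using LinearAlgebra

# Naive version: a double loop over particles, O(n * m^2) scalar work.
# X is n × m: m particles, each a point in n dimensions.
function sumdists_loop(X)
    m = size(X, 2)
    d = zeros(m)
    for i in 1:m, j in 1:m
        d[i] += sum(abs2, X[:, i] - X[:, j])
    end
    sqrt.(d)
end

# Gram-matrix version: sum_j ||x_i - x_j||^2 =
#   m * G[i, i] + tr(G) - 2 * sum_j G[i, j], with G = X'X done by BLAS.
function sumdists_blas(X)
    m = size(X, 2)
    G = X'X
    s = vec(sum(G, dims=1))
    sqrt.(max.(m .* diag(G) .+ tr(G) .- 2 .* s, 0.0))
end

X = randn(20, 500)
sumdists_loop(X) ≈ sumdists_blas(X)   # true, up to floating-point noise
```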
Codecov Report
Merging #993 (e2f9b9f) into master (d5cb5da) will decrease coverage by 0.02%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##           master     #993      +/-   ##
==========================================
- Coverage   85.34%   85.31%   -0.03%
==========================================
  Files          42       42
  Lines        3187     3181       -6
==========================================
- Hits         2720     2714       -6
  Misses        467      467
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ultivariate/solvers/zeroth_order/particle_swarm.jl | 99.23% <100.00%> (-0.02%) | :arrow_down: |
Note that there is no real regression in coverage; the PR just has 2 fewer lines of code.
@pkofod Gentle ping. :) Could you please review this PR? It should provide a nice performance improvement for Particle Swarm.
I will have a look!
Is it possible to see your example or is it not easy to share?
Unfortunately, the original problem would be a bit cumbersome to share.
But I think it should be easily reproducible with any problem that has >100 parameters.
In my case the objective function was something like `sum(abs2, C - X * Y)`, where C is a fixed N × M matrix and X (N × 1) and Y (1 × M) are the parameter vectors, with N = 20, M = 500.
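If it helps, here is a hypothetical reconstruction of that setup with Optim.jl's ParticleSwarm (the packing of X and Y into one flat vector, and all names, are my own guesses, not the original code):

```julia
using Optim

N, M = 20, 500
C = randn(N, M)                      # fixed data matrix

# Pack X (N × 1) and Y (1 × M) into one parameter vector of length N + M.
function objective(p)
    X = reshape(view(p, 1:N), N, 1)
    Y = reshape(view(p, N+1:N+M), 1, M)
    sum(abs2, C - X * Y)
end

res = optimize(objective, randn(N + M), ParticleSwarm(),
               Optim.Options(iterations = 1_000))
```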
I benchmarked another implementation as well, and yours seems solid. I do think that I might want to do a version where the arrays are preallocated. It doesn't affect the speed for large problems, but the allocations can cause latency on their own. Here is the benchmark, comparing the allocating version (dist1) against a preallocated one (dist2):
```julia
using LinearAlgebra

# Allocating version: builds the Gram matrix and the sums fresh on every call.
function dist1(Tx, n_particles, n, X, d, cacheV2, XtX, i_best)
    XtX = X'X                 # allocates an m×m Gram matrix
    XtX_tr = tr(XtX)
    d = sum(XtX, dims=1)      # allocates a 1×m vector of column sums
    @inbounds for i in eachindex(d)
        d[i] = sqrt(max(n_particles * XtX[i, i] + XtX_tr - 2 * d[i], Tx(0.0)))
    end
    d[i_best], extrema(d)...
end

# Preallocated version: reuses the XtX and d buffers passed in.
function dist2(Tx, n_particles, n, X, d, cacheV2, XtX, i_best)
    mul!(XtX, X', X)          # Gram matrix computed in place
    XtX_tr = tr(XtX)
    sum!(d, XtX)              # sums in place (XtX is symmetric, so rows == columns)
    @inbounds for i in eachindex(d)
        d[i] = sqrt(max(n_particles * XtX[i, i] + XtX_tr - 2 * d[i], Tx(0.0)))
    end
    d[i_best], extrema(d)...
end

N = 1:600:9000
M = N
times = zeros(length(N), length(M))
for n in N
    for m in M
        m > n && continue
        @info "new case, (n,m) = ($n, $m)"
        X = randn(n, m)
        BigM = zeros(m, m)    # preallocated Gram matrix buffer
        BigV = zeros(m)       # preallocated distance buffer
        BigV2 = zeros(m)      # unused cache argument, kept for the shared signature
        i_best = rand(1:m)
        Base.GC.gc()
        @time dd1 = dist1(Float64, m, n, X, BigV, BigV2, BigM, i_best)
        @time dd2 = dist2(Float64, m, n, X, BigV, BigV2, BigM, i_best)
    end
end
```
@pkofod Yes, preallocation should help for bigger problems. For my PR I wanted just the minimal changes that speed up the linear algebra. I can try preallocation too, if Optim.jl has preallocation-enabled algorithms that I could use as a reference. Alternatively, this PR could be merged first, and then you can replace it with the preallocated version.
Yes, just wanted to share my observations.
Sorry it took so long. Thanks though. The old version was very inefficient compared to your contribution.