
Use LinAlg to speed up particle swarm

Open alyst opened this issue 3 years ago • 6 comments

In my case I saw a ~4x speedup for a function of 530 variables.

alyst avatar Jun 24 '22 22:06 alyst

Codecov Report

Merging #993 (e2f9b9f) into master (d5cb5da) will decrease coverage by 0.02%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #993      +/-   ##
==========================================
- Coverage   85.34%   85.31%   -0.03%     
==========================================
  Files          42       42              
  Lines        3187     3181       -6     
==========================================
- Hits         2720     2714       -6     
  Misses        467      467              
Impacted Files                                           Coverage Δ
...ultivariate/solvers/zeroth_order/particle_swarm.jl    99.23% <100.00%> (-0.02%) ↓


codecov[bot] avatar Jun 24 '22 23:06 codecov[bot]

Note that there is no regression in coverage; the PR is just a few lines of code shorter.

alyst avatar Jun 24 '22 23:06 alyst

@pkofod gentle ping. :) could you please review this PR? It should provide a nice performance improvement for Particle Swarm.

alyst avatar Jul 11 '22 17:07 alyst

I will have a look!

pkofod avatar Jul 23 '22 10:07 pkofod

Is it possible to see your example or is it not easy to share?

pkofod avatar Jul 23 '22 10:07 pkofod

Unfortunately, the original problem would be a bit cumbersome to share. But it should be easily reproducible with any problem that has >100 parameters. In my case the objective function was something like sum(abs2, C - X * Y), where C is a fixed N × M matrix, and X (N × 1) and Y (1 × M) are the parameter vectors, with N = 20, M = 500.
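For reference, a minimal self-contained sketch of such an objective (the names and data here are illustrative, not taken from the original problem):

```julia
# Hypothetical reproduction of the objective described above:
# C is a fixed N×M matrix; the parameters are X (N×1) and Y (1×M).
N, M = 20, 500
C = randn(N, M)

X = randn(N, 1)   # N×1 parameter vector
Y = randn(1, M)   # 1×M parameter vector

# Squared Frobenius norm of the rank-1 residual, i.e. abs2 summed elementwise.
loss = sum(abs2, C - X * Y)
```

With X and Y flattened into a single parameter vector this gives N + M = 520 parameters, close to the 530 mentioned above.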

alyst avatar Jul 24 '22 07:07 alyst

I benchmarked another implementation as well, and yours seems solid. I do think I might want to do a version where the arrays are preallocated. It doesn't affect the speed for large problems, but the allocations can cause latency on their own.

using LinearAlgebra

# Allocating version: builds the Gram matrix X'X afresh on each call.
function dist1(Tx, n_particles, n, X, d, cacheV2, XtX, i_best)
    XtX = X'X                # allocates an n_particles × n_particles matrix
    XtX_tr = tr(XtX)
    d = sum(XtX, dims=1)     # column sums (also allocates)
    @inbounds for i in eachindex(d)
        d[i] = sqrt(max(n_particles * XtX[i, i] + XtX_tr - 2 * d[i], Tx(0.0)))
    end
    d[i_best], extrema(d)...
end

# Preallocated version: reuses the XtX and d buffers passed in.
function dist2(Tx, n_particles, n, X, d, cacheV2, XtX, i_best)
    mul!(XtX, X', X)         # in-place Gram matrix
    XtX_tr = tr(XtX)
    sum!(d, XtX)             # in-place row sums (XtX is symmetric)
    @inbounds for i in eachindex(d)
        d[i] = sqrt(max(n_particles * XtX[i, i] + XtX_tr - 2 * d[i], Tx(0.0)))
    end
    d[i_best], extrema(d)...
end

N = 1:600:9000
M = N
times = zeros(length(N), length(M))  # unused in this snippet
for n in N
    for m in M
        m > n && continue
        @info "new case, (n,m) = ($n, $m)"
        X = randn(n, m)
        BigM = zeros(m, m)   # preallocated XtX buffer
        BigV = zeros(m)      # preallocated distance buffer
        BigV2 = zeros(m)     # unused cache argument
        i_best = rand(1:m)

        GC.gc()
        @time dd1 = dist1(Float64, m, n, X, BigV, BigV2, BigM, i_best)
        @time dd2 = dist2(Float64, m, n, X, BigV, BigV2, BigM, i_best)
    end
end
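For readers wondering why the Gram-matrix trick works: with particles stored as columns of X, ‖x_i − x_j‖² = XtX[i,i] + XtX[j,j] − 2·XtX[i,j], so summing over j gives d[i]² = m·XtX[i,i] + tr(XtX) − 2·Σ_j XtX[i,j]. A small sketch (hypothetical names) checking this against the naive pairwise loop:

```julia
using LinearAlgebra

m = 8
X = randn(5, m)                  # 5-dimensional space, m particles as columns
XtX = X' * X                     # Gram matrix
rowsum = vec(sum(XtX, dims=1))   # Σ_j XtX[i, j] for each i

# Fast formula: d[i] = sqrt(m*XtX[i,i] + tr(XtX) - 2*rowsum[i]);
# max(…, 0.0) guards against tiny negative values from round-off.
d_fast = sqrt.(max.(m .* diag(XtX) .+ tr(XtX) .- 2 .* rowsum, 0.0))

# Naive reference: explicit sum over squared pairwise distances.
d_ref = [sqrt(sum(sum(abs2, X[:, i] - X[:, j]) for j in 1:m)) for i in 1:m]

@assert d_fast ≈ d_ref
```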

pkofod avatar Sep 04 '22 14:09 pkofod

@pkofod Yes, preallocation should help for bigger problems. For my PR I wanted just the minimal changes that speed up the linear algebra. I can try preallocation too, if Optim.jl has preallocation-enabled algorithms that I could use as a reference. Alternatively, this PR could be merged first, and then you can replace it with a preallocated version.
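One possible shape for a preallocated version, sketched here with hypothetical names (this is not existing Optim.jl API): allocate the work arrays once, then mutate them in place with `mul!` and `sum!` on every iteration.

```julia
using LinearAlgebra

# Hypothetical buffer container; all names here are illustrative.
struct SwarmBuffers{T}
    XtX::Matrix{T}   # m×m Gram-matrix buffer
    d::Vector{T}     # per-particle distance buffer
end
SwarmBuffers{T}(m::Integer) where {T} =
    SwarmBuffers{T}(Matrix{T}(undef, m, m), Vector{T}(undef, m))

# In-place distance computation; X is n×m with particles as columns.
function dists!(buf::SwarmBuffers{T}, X) where {T}
    m = size(X, 2)
    mul!(buf.XtX, X', X)    # Gram matrix, no allocation
    XtX_tr = tr(buf.XtX)
    sum!(buf.d, buf.XtX)    # row sums in place (XtX is symmetric)
    @inbounds for i in eachindex(buf.d)
        buf.d[i] = sqrt(max(m * buf.XtX[i, i] + XtX_tr - 2 * buf.d[i], zero(T)))
    end
    buf.d
end

X = randn(4, 6)
buf = SwarmBuffers{Float64}(6)   # allocate once, reuse across iterations
d = dists!(buf, X)
```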

alyst avatar Sep 07 '22 01:09 alyst

yes, just wanted to share my observations.

pkofod avatar Sep 07 '22 10:09 pkofod

Sorry it took so long. Thanks though. The old version was very inefficient compared to your contribution.

pkofod avatar Sep 11 '22 11:09 pkofod