Example of slow model generation
I was playing around with implementing a basic linear SVM in both Convex.jl and JuMP and noticed a pretty significant penalty for model generation in Convex.jl:
using JuMP
using Distributions
using Gurobi
import Convex # namespace conflicts with JuMP.Variable()

function gen_data(n)
    pos = rand(MvNormal([1.0, 2.0], 1.0), n)
    neg = rand(MvNormal([-1.0, 1.0], 1.0), n)
    return pos, neg
end

const N = 2
const C = 10

function svm_jump(pos, neg, solver)
    m = Model(solver=solver)
    @defVar(m, w[1:N])
    @defVar(m, b)
    @defVar(m, ξpos[1:size(pos,2)] >= 0)
    @defVar(m, ξneg[1:size(neg,2)] >= 0)
    @setObjective(m, Min, sum{w[i]^2, i = 1:N} + C*sum(ξpos) + C*sum(ξneg))
    @addConstraint(m, posconstr[j=1:size(pos,2)], dot(w, pos[:,j]) - b >= 1 - ξpos[j])
    @addConstraint(m, negconstr[j=1:size(neg,2)], -1*(dot(w, neg[:,j]) - b) >= 1 - ξneg[j])
    status = solve(m)
    @assert status == :Optimal
    return getValue(w[:]), getValue(b)
end
function svm_convex(pos, neg, solver)
    w = Convex.Variable(N)
    b = Convex.Variable()
    ξpos = Convex.Variable(size(pos,2), Convex.Positive())
    ξneg = Convex.Variable(size(neg,2), Convex.Positive())
    problem = Convex.minimize(Convex.sum_squares(w) + C*sum(ξpos) + C*sum(ξneg))
    for j in 1:size(pos,2)
        push!(problem.constraints, dot(w, pos[:,j]) - b >= 1 - ξpos[j])
        # alternatively: problem.constraints += dot(w, pos[:,j]) - b >= 1 - ξpos[j]
    end
    for j in 1:size(neg,2)
        push!(problem.constraints, -1*(dot(w, neg[:,j]) - b) >= 1 - ξneg[j])
        # alternatively: problem.constraints += -1*(dot(w, neg[:,j]) - b) >= 1 - ξneg[j]
    end
    Convex.solve!(problem, solver)
    return Convex.evaluate(w), Convex.evaluate(b)
end
pos,neg = gen_data(10)
# initial compilation
svm_jump(pos,neg, GurobiSolver(OutputFlag=0))
svm_convex(pos,neg, GurobiSolver(OutputFlag=0))
pos,neg = gen_data(2000)
@time w, b = svm_jump(pos,neg, GurobiSolver())
@show w,b
@time w, b = svm_convex(pos,neg, GurobiSolver())
@show w,b
Both give approximately the same answer. For JuMP on my machine, Gurobi solves in 0.09 seconds and the whole thing is done in 0.10 seconds. For Convex.jl, Gurobi solves in 0.11 seconds but the total time is 6.5 seconds.
I know model generation time hasn't been as high a priority for Convex.jl as it has been for JuMP, and Convex.jl has to do more work in principle, since it performs a lot of automatic transformations. Anyway, I wanted to share this as a benchmark that could be used to speed up Convex.jl.
CC @IainNZ @joehuchette
This is an interesting test case. It's a perfect example illustrating how Convex.jl is optimized for vectorized syntax, while JuMP requires indexing. In Convex.jl, you can reformulate it as an unconstrained, vectorized problem, with only w and b as variables, which makes it much faster (as well as more readable, to my eye):
function svm_convex(pos, neg, solver)
    w = Convex.Variable(N)
    b = Convex.Variable()
    # sum() makes each hinge penalty a scalar, since the objective must be scalar
    problem = Convex.minimize(Convex.sum_squares(w) +
                              C*sum(Convex.max(1 + b - w'*pos, 0)) +
                              C*sum(Convex.max(1 - b + w'*neg, 0)))
    Convex.solve!(problem, solver)
    return Convex.evaluate(w), Convex.evaluate(b)
end
Convex.jl parses this in just 0.3 seconds (with the same 0.11-second solve time, although I'm using ECOS instead of Gurobi).
(To make it even more readable, I'd replace Convex.max(1 + b - w'*pos, 0) with Convex.hinge_loss(w'*pos - b), since hinge_loss(x) is max(1 - x, 0) elementwise.)
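For concreteness, here is a sketch of the same vectorized model with that substitution (assuming hinge_loss(x) = max(1 - x, 0) elementwise, as above; the name svm_convex_hinge is just for illustration):

function svm_convex_hinge(pos, neg, solver)
    w = Convex.Variable(N)
    b = Convex.Variable()
    # hinge_loss(w'*pos - b) penalizes positive samples with margin < 1;
    # hinge_loss(b - w'*neg) does the same for negative samples
    problem = Convex.minimize(Convex.sum_squares(w) +
                              C*sum(Convex.hinge_loss(w'*pos - b)) +
                              C*sum(Convex.hinge_loss(b - w'*neg)))
    Convex.solve!(problem, solver)
    return Convex.evaluate(w), Convex.evaluate(b)
end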
The reason you'd want to use Convex.jl over JuMP is not for speed, but so you can write nice things like this. Though of course, if we can make it faster as well that would be ideal :)
Hah, that's pretty neat.
I agree that's a nicer formulation. I'm hoping there's a way to get the best of both worlds, where you'd choose a vectorized formulation because it's clearer, not because it's needed for speed (which is very un-Julian). Anyway, even without major architectural changes, I'd bet there are a few bottlenecks in the scalar case that could be improved without too much effort.
The most terrible thing we're doing in the scalar case is performing indexing via sparse matrix multiplication. So if you do

a = Variable(10)
a[2] + a[3] == 1

we construct a sparse matrix which, when multiplied by a, selects out the 2nd entry; then we do the same for the 3rd entry; then we add these sparse matrices. You can see why this gets extremely slow. It would be much faster if we used dense matrices, but the memory gets silly and large. Even worse are operations like

a = Variable(10, 20)
(a')[4,1] == 1

which forms the sparse matrix which, when multiplied by vec(a), would produce vec(a'); then multiplies that sparse matrix by the sparse matrix that selects out the [4,1] entry of a matrix of that size. If we were clever, we would never form all the rows of the sparse matrix representing the transpose operator, since we only end up using one of them.
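To make the cost concrete, here is a minimal sketch of indexing-by-multiplication (not Convex.jl's actual internals; the select helper is hypothetical):

using SparseArrays

# Hypothetical selector: a 1×n sparse row with a single 1 in column i,
# so select(i, n) * a picks out entry i of a length-n vector.
select(i, n) = sparse([1], [i], [1.0], 1, n)

n = 10
# a[2] + a[3] becomes (S2 + S3) * a: one selector per index, then a
# sparse-sparse addition -- the step that gets slow as terms pile up.
S = select(2, n) + select(3, n)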
I'm sure that we could use some insights from JuMP here...
Most users won't know the "best way to do things", and indexing along with stacking are probably our slowest operations, so we definitely need to work on it.
@davidlizeng did some clever things to make indexing faster for LLS, and I believe he's looking into adding that to Convex.jl. We'll see how much faster that makes this example.
Are you using SparseMatrixCSC for everything?
If so, you could try using triplet (COO) format, which is much more efficient for summing two matrices: all you do is concatenate (see the sketch below).
You could also delay generation of the transpose until it's used somewhere.
Just tossing ideas out; not sure if they'll be useful, since I don't really know the internal structure of Convex.
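A minimal sketch of the triplet idea (A and B here are placeholder matrices of the same size, not Convex.jl data structures): addition is concatenation of the coordinate lists, and sparse() sums duplicate entries when converting back to CSC.

using SparseArrays

A = sprand(100, 100, 0.05)
B = sprand(100, 100, 0.05)

# Triplet (COO) addition: concatenate the (i, j, v) lists; sparse()
# combines duplicate (i, j) coordinates by summing their values.
I1, J1, V1 = findnz(A)
I2, J2, V2 = findnz(B)
AplusB = sparse(vcat(I1, I2), vcat(J1, J2), vcat(V1, V2), size(A)...)
# AplusB == A + B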
Expressions in Convex.jl are maps from variable objects to coefficients, which are SparseMatrixCSC. To index into an expression, we do one of the following:
- select certain rows of the sparse coefficients using [] directly
- multiply the sparse coefficients by another sparse matrix that has the effect of selecting the rows we want
If we're selecting lots of rows scattered all over the sparse matrix, method 2 is usually better, while selecting a few rows, or a contiguous chunk of rows, usually makes method 1 better (see the comparison sketch below).
@mlubin @IainNZ Are there other ways to select rows out of SparseMatrixCSC objects that you think might be faster?
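For reference, a sketch comparing the two approaches on a placeholder coefficient matrix (the sizes and row indices are arbitrary):

using SparseArrays

coeff = sprand(1000, 50, 0.01)
rows = [2, 17, 400]
k, m = length(rows), size(coeff, 1)

# Method 1: direct row indexing with []
sel1 = coeff[rows, :]

# Method 2: left-multiply by a k×m sparse selector matrix
S = sparse(1:k, rows, ones(k), k, m)
sel2 = S * coeff
# sel1 == sel2; which is faster depends on how the rows are distributed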
SparseMatrixCSC is efficient for selecting columns but not rows. If this is the bottleneck operation, then I'd recommend using CSR format (compressed sparse row), where row slices are linear-time lookups. There's an open PR for this (JuliaLang/julia#7029), but if you just want to experiment, CSR is equivalent to CSC with the dimensions flipped.
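A minimal sketch of that "dimensions flipped" trick (placeholder data, not tied to Convex.jl):

using SparseArrays

# Emulate CSR by storing the transpose in CSC: a row slice of A becomes
# a cheap column slice of At.
A  = sprand(1000, 50, 0.01)
At = sparse(transpose(A))  # materialize the transpose once, in CSC
row7 = At[:, 7]            # fast column access in CSC == row 7 of A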
For the original Convex.jl formulation, I now get:
julia> using Distributions
julia> using LinearAlgebra
julia> using Gurobi
julia> import Convex # namespace conflicts with JuMP.Variable()
julia> function gen_data(n)
           pos = rand(MvNormal([1.0,2.0],1.0),n)
           neg = rand(MvNormal([-1.0,1.0],1.0),n)
           return pos,neg
       end
gen_data (generic function with 1 method)
julia> const N = 2
2
julia> const C = 10
10
julia> function svm_convex(pos,neg,solver)
           w = Convex.Variable(N)
           b = Convex.Variable()
           ξpos = Convex.Variable(size(pos,2), Convex.Positive())
           ξneg = Convex.Variable(size(neg,2), Convex.Positive())
           problem = Convex.minimize(Convex.sumsquares(w) + C*sum(ξpos) + C*sum(ξneg))
           for j in 1:size(pos,2)
               push!(problem.constraints, dot(w,pos[:,j]) - b >= 1 - ξpos[j])
               #problem.constraints += dot(w,pos[:,j]) - b >= 1 - ξpos[j]
           end
           for j in 1:size(neg,2)
               push!(problem.constraints, -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j])
               #problem.constraints += -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j]
           end
           Convex.solve!(problem, solver)
           return Convex.evaluate(w), Convex.evaluate(b)
       end
svm_convex (generic function with 1 method)
julia> pos,neg = gen_data(2000)
([1.9549836788159067 1.7171091461123682 … 0.0857034301415448 2.217107664293599; 0.9142782414570454 3.391383891929138 … 2.150346162128055 3.0846956783486137], [-2.3747735768967484 -0.9632429519263247 … -1.9632219657972412 -1.1641815583798856; 0.5275107809723425 1.4908871156989485 … 0.5244168621220339 1.5478250107179292])
julia> @time w, b = svm_convex(pos,neg, Gurobi.Optimizer)
Academic license - for non-commercial use only - expires 2022-04-15
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (mac64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 8008 rows, 4012 columns and 24014 nonzeros
Model fingerprint: 0x7e0bd539
Model has 2 quadratic constraints
Coefficient statistics:
Matrix range [4e-05, 1e+01]
QMatrix range [1e+00, 1e+00]
Objective range [1e+00, 1e+00]
Bounds range [0e+00, 0e+00]
RHS range [1e+00, 1e+00]
Presolve removed 4002 rows and 1 columns
Presolve time: 0.01s
Presolved: 4006 rows, 4011 columns, 16012 nonzeros
Presolved model has 2 second-order cone constraints
Ordering time: 0.00s
Barrier statistics:
Dense cols : 3
Free vars : 3
AA' NZ : 1.201e+04
Factor NZ : 1.604e+04 (roughly 3 MBytes of memory)
Factor Ops : 6.421e+04 (less than 1 second per iteration)
Threads : 1
                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
0 2.63936156e+05 1.99810454e+04 2.60e+00 2.01e+04 4.58e+01 0s
1 9.88964802e+04 1.50954405e+04 8.39e-01 1.18e+04 1.92e+01 0s
2 7.05989167e+04 1.31270445e+04 5.37e-01 8.15e+03 1.32e+01 0s
3 1.87214697e+04 9.95205410e+03 1.60e-02 1.85e+03 2.36e+00 0s
4 1.42635612e+04 1.04746351e+04 3.75e-03 6.88e+02 8.44e-01 0s
5 1.35207428e+04 1.08936853e+04 2.28e-03 4.30e+02 5.35e-01 0s
6 1.31176886e+04 1.11765792e+04 1.52e-03 3.04e+02 3.79e-01 0s
7 1.28434221e+04 1.13534443e+04 1.05e-03 2.41e+02 2.89e-01 0s
8 1.26922566e+04 1.14491159e+04 6.67e-04 2.10e+02 2.42e-01 0s
9 1.24852995e+04 1.15827954e+04 1.85e-04 1.67e+02 1.78e-01 0s
10 1.24222926e+04 1.16773917e+04 2.93e-06 1.39e+02 1.48e-01 0s
11 1.23650014e+04 1.18138275e+04 2.06e-06 1.01e+02 1.07e-01 0s
12 1.23479331e+04 1.18425480e+04 1.61e-06 9.31e+01 9.81e-02 0s
13 1.23213208e+04 1.19401568e+04 6.73e-07 6.91e+01 7.31e-02 0s
14 1.23100141e+04 1.19706912e+04 7.35e-07 6.31e+01 6.55e-02 0s
15 1.22990185e+04 1.19938315e+04 6.03e-07 5.74e+01 5.89e-02 0s
16 1.22883280e+04 1.20136702e+04 5.36e-07 5.24e+01 5.31e-02 0s
17 1.22875077e+04 1.20302735e+04 2.69e-07 4.83e+01 4.96e-02 0s
18 1.22798852e+04 1.20369287e+04 2.35e-07 4.63e+01 4.69e-02 0s
19 1.22707587e+04 1.20705517e+04 2.19e-07 3.82e+01 3.84e-02 0s
20 1.22663177e+04 1.21011822e+04 1.31e-07 3.10e+01 3.15e-02 0s
21 1.22610991e+04 1.21091698e+04 2.17e-07 2.91e+01 2.92e-02 0s
22 1.22532029e+04 1.21262368e+04 6.86e-08 2.49e+01 2.45e-02 0s
23 1.22488727e+04 1.21461177e+04 2.03e-07 2.07e+01 1.99e-02 0s
24 1.22462311e+04 1.21653762e+04 1.19e-07 1.67e+01 1.58e-02 0s
25 1.22442693e+04 1.21733717e+04 1.31e-07 1.50e+01 1.40e-02 0s
26 1.22431147e+04 1.21791383e+04 7.82e-08 1.36e+01 1.26e-02 0s
27 1.22420458e+04 1.21867057e+04 9.59e-08 1.19e+01 1.09e-02 0s
28 1.22403342e+04 1.21968617e+04 7.17e-08 9.47e+00 8.63e-03 0s
29 1.22399947e+04 1.22074753e+04 6.60e-08 6.93e+00 6.40e-03 0s
30 1.22393923e+04 1.22135922e+04 4.19e-08 5.65e+00 5.12e-03 0s
31 1.22390998e+04 1.22204954e+04 2.83e-08 4.25e+00 3.75e-03 0s
32 1.22388921e+04 1.22239764e+04 1.39e-07 3.39e+00 3.00e-03 0s
33 1.22383519e+04 1.22266122e+04 6.80e-08 2.71e+00 2.37e-03 0s
34 1.22380912e+04 1.22290447e+04 4.16e-08 2.07e+00 1.82e-03 0s
35 1.22379909e+04 1.22332853e+04 4.39e-08 1.11e+00 9.59e-04 0s
36 1.22377215e+04 1.22364936e+04 8.75e-09 2.98e-01 2.53e-04 0s
37 1.22376906e+04 1.22376527e+04 2.66e-08 8.89e-03 7.71e-06 0s
38 1.22376903e+04 1.22376637e+04 1.59e-09 6.10e-03 5.36e-06 0s
39 1.22376896e+04 1.22376663e+04 3.99e-09 5.46e-03 4.73e-06 0s
40 1.22376892e+04 1.22376881e+04 9.96e-09 4.44e-09 1.44e-07 0s
Barrier solved model in 40 iterations and 0.10 seconds
Optimal objective 1.22376892e+04
User-callback calls 123, time in user-callback 0.00 sec
1.047763 seconds (3.16 M allocations: 677.923 MiB, 22.29% gc time, 6.96% compilation time)
([1.3915131521712125, 0.7011441389283813], 1.075629207371045)
So: much better than the original 6.5 seconds, but still much slower than Gurobi's 0.10-second solve time.
Closing because this has been improved. It could probably still be better, but that's a much larger job.