
Slow compilation of standard form LP (time increases linearly in number of variables)

Open akshayka opened this issue 3 years ago • 12 comments

Compilation time of a standard form LP (minimize c @ x, subject to x >= 0) increases linearly with the number of variables (for both OSQP and SCS).

The slowdown occurs regardless of whether c is sparse (indeed, empty) or dense.

To Reproduce

import time

import cvxpy as cp
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse as sparse
from tqdm.notebook import tqdm


def construct_lp(n):
    c = sparse.csr_matrix((n, 1))  # timings are similar if c is a dense numpy array
    x = cp.Variable(n)
    return cp.Problem(cp.Minimize(c.T @ x), [x >= 0])


n_range = list(map(int, np.logspace(1, 8, 8)))
solvers = {"OSQP": [], "SCS": []}
for solver in solvers:
    compile_times = solvers[solver]
    for n in tqdm(n_range):
        problem = construct_lp(n)
        start = time.time()
        problem.get_problem_data(solver, verbose=True)
        end = time.time()
        print(end - start)
        compile_times.append(end - start)

for solver in solvers:
    compile_times = solvers[solver]
    plt.plot(n_range, compile_times, label=solver, marker='o')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.show()

Expected behavior

Should compilation time really be linear in the number of variables? As a user, I would expect the data (c) to simply be copied in. It seems strange that compilation takes much longer than the time needed to generate the problem data, given that this problem is essentially already in standard form.

More generally, we should strive for standard-form problems to compile quickly.
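For a rough sense of the baseline being compared against, here is a minimal sketch (not CVXPY code; the standard-form layout and the choice of n are just illustrative) of how long it takes to simply materialize standard-form LP data:

import time

import numpy as np
import scipy.sparse as sparse

# Baseline: materialize standard-form LP data (q = c, constraint matrix = I,
# bounds x >= 0) directly, with no canonicalization involved.
n = 10**7
start = time.time()
q = np.zeros(n)                   # objective vector c
A = sparse.eye(n, format="csc")   # constraint matrix for x >= 0
lower = np.zeros(n)               # lower bounds
upper = np.full(n, np.inf)        # upper bounds
print(f"direct construction: {time.time() - start:.3f} seconds")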

Output

[Plot omitted: x-axis is the size of the variable, y-axis is seconds taken to compile.]

Version

  • CVXPY Version: 1.1.15

To get started with helping out

We should profile this script to understand why compilation is slow.

akshayka avatar Oct 29 '21 20:10 akshayka

@akshayka I did a quick profile of the script, see below.

[Profiler output screenshot omitted.]

Here is the result file: lp.zip

It can be viewed interactively using:

pip install snakeviz
snakeviz lp.prof
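For reference, one way to generate such a profile (a minimal sketch; lp.py is a placeholder name for the reproduction script above):

python -m cProfile -o lp.prof lp.py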

phschiele avatar Oct 30 '21 13:10 phschiele

I looked into this, and there are a lot of copies happening. Unfortunately, they're not so easy to remove. Basically, cvxcore is a data-structures nightmare, because we had to implement a very hacky representation of sparse tensors (among other issues). We could possibly fix this particular case with more hacks, but really we need a rewrite IMO, with well-constructed data structures.
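For context, canonicalization essentially needs something like a sparse 3-D tensor (row, column, parameter slice). Here is a purely illustrative COO-style sketch of such a structure; this is not cvxcore's actual layout:

from dataclasses import dataclass, field

@dataclass
class CooTensor3D:
    """Illustrative COO-style sparse 3-D tensor.

    Each nonzero is (row, col, slice, value), where `slice` could index a
    parameter, so canonicalization can map parameter values straight into
    coefficient-matrix entries without intermediate copies.
    """
    rows: list = field(default_factory=list)
    cols: list = field(default_factory=list)
    slices: list = field(default_factory=list)
    vals: list = field(default_factory=list)

    def add(self, r, c, s, v):
        self.rows.append(r)
        self.cols.append(c)
        self.slices.append(s)
        self.vals.append(v)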

SteveDiamond avatar Nov 04 '21 18:11 SteveDiamond

@akshayka I noticed this behaviour for QP problems as well, so it is not only LP problems. I had a chat with @phschiele and we were wondering if we could maybe rebuild cvxcore but using Python only. @SteveDiamond - is there a specific reason for why most of it is coded up in C++?

michaels0m avatar Nov 09 '21 17:11 michaels0m

@Michael-git96 the original cvxcore was in C++ because it was supposed to be called across several languages (Python, MATLAB, Julia, etc.). But now the only reason it's in C++ is speed. While the current C++ implementation isn't actually great at that, I'm doubtful that a Python implementation could be faster.

rileyjmurray avatar Nov 09 '21 17:11 rileyjmurray

@rileyjmurray thank you for the background, I will take a look at it over the next few days.

michaels0m avatar Nov 09 '21 17:11 michaels0m

While the current C++ implementation isn't actually great at that, I'm doubtful that a Python implementation could be faster

Actually, I think a Python version could be faster than the current C++ implementation, if it got the data structures right and was designed better from an algorithms perspective.

Of course, a good C++ implementation will always be faster than a Python implementation, but right now we don't have a good C++ implementation. That said, we do support multi-threaded canonicalization (if CVXPY is compiled with OPENMP support), which would be hard to do in pure Python due to the GIL.

I had a chat with @phschiele and we were wondering if we could maybe rebuild cvxcore but using Python only. @SteveDiamond - is there a specific reason for why most of it is coded up in C++?

@Michael-git96, if you and @phschiele want to take a crack at re-imagining / re-writing cvxcore, that would be fantastic, and indeed I think the right thing to do would be to prototype it in Python (unless you're very comfortable with C++). It will be easy for me to port to C++ if necessary.

Let us know if you have any questions. cvxcore is not well implemented or documented, so don't worry if you find it difficult to understand ...

akshayka avatar Nov 09 '21 17:11 akshayka

By the way, I believe that improving CVXPY's canonicalization performance (both time and memory, with and without Parameters) is one of the most impactful things that we can do.

Fast canonicalization of standard forms is simply a must-have.

Fast canonicalization of problems with many parameters will enable many downstream applications of CVXPY --- like using highly parametrized CVXPY problems in real solvers (such as https://github.com/cvxgrp/osmm) and in differentiable pipelines (https://github.com/cvxgrp/cvxpylayers).
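As a concrete illustration of the parametrized workflow (a minimal sketch using cp.Parameter; the specific problem is arbitrary):

import cvxpy as cp
import numpy as np

# Canonicalization happens once; later solves with new parameter values
# reuse the compiled problem structure and only re-stuff the data.
n = 100
c = cp.Parameter(n)
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(c @ x), [x >= 0, cp.sum(x) == 1])

for _ in range(10):
    c.value = np.random.randn(n)
    problem.solve(solver=cp.SCS)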

So: this is definitely a worthwhile endeavor.

akshayka avatar Nov 09 '21 17:11 akshayka

While I was looking into compilation times and cvxcore, I also noticed that if I ran the same QP problem but in different forms, the compilation time varied quite drastically. For example, I took the example code above and added the functions:

def construct_qp(n):
    x = cp.Variable(n)
    A = np.ones((10000, n))
    b = np.ones((10000,))
    P = A.T @ A
    q = -A.T @ b
    return cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x), [x >= 0])

def construct_qp1(n):
    A = np.ones((10000, n))
    x = cp.Variable(n)
    b = np.ones((10000,))
    return cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b)), [x >= 0])

The functions above describe the same least-squares minimization in different forms: construct_qp states it as a pure QP (in standard form), while construct_qp1 states it as a least-squares problem. The compilation times for each are as follows:

[Plot omitted: compilation time versus number of variables for the two formulations.]

I am quite surprised by the difference in compilation times, as well as by the fact that construct_qp's compilation time seems to increase exponentially (right-hand graph above) while the other seems to follow a linear trend. This does seem to be linked to cvxcore via the matrix-stuffing functions. I suspected that the standard-form QP would compile faster, but I would not have thought that their compilation times would have different shapes.

I just thought I would mention the above as I am looking at both LP and QP problems for now to keep it simple. If there are some reasonable explanations for why their shapes would be different then please do let me know. I am still looking into re-imagining/re-writing cvxcore.

michaels0m avatar Nov 16 '21 22:11 michaels0m

@Michael-git96 converting a problem with objective quad_form(x, P) into conic form requires factoring P as P = M.T @ M for a matrix M. We currently perform that factorization by symmetric eigendecomposition. Computing symmetric eigenvalue decomposition is expensive except for very particular matrices.

If you have objective quad_form(x, P) and use OSQP (or other solvers in reductions/solvers/qp_solvers), then there is no eigendecomposition. There is an analogous cost of computing the Gram matrix P = A.T @ A explicitly. The asymptotic cost of computing the Gram matrix and the eigendecomposition of P is the same (for general matrices), but the constant factors differ significantly.
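Concretely, a minimal sketch of the factorization step described above (illustrative only; assumes P is symmetric positive semidefinite):

import numpy as np

def psd_factor(P):
    """Factor a symmetric PSD matrix as P = M.T @ M via eigendecomposition."""
    w, V = np.linalg.eigh(P)          # P = V @ diag(w) @ V.T
    w = np.maximum(w, 0.0)            # clip tiny negative eigenvalues from roundoff
    return np.sqrt(w)[:, None] * V.T  # M = diag(sqrt(w)) @ V.T

G = np.random.rand(5, 5)
P = G.T @ G                           # symmetric PSD test matrix
M = psd_factor(P)
assert np.allclose(M.T @ M, P)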

I'm surprised that the scaling seems linear with your construct_qp(n) function; I would expect at least quadratic scaling. I suspect that we see linear scaling here because A (as you construct it) is rank one, so the eigendecomposition is very cheap. Try running the experiment again with A = np.random.rand(10000, n).

With regard to the scaling of construct_qp1(n): I would not say that looks exponential; to me, it looks like quadratic scaling. I'll also note that cvxpy's canonicalization of construct_qp1(n) for OSQP is faster than that of construct_qp(n) for OSQP. Try plotting both on the same axes.

rileyjmurray avatar Nov 17 '21 05:11 rileyjmurray

@rileyjmurray, I reran the code with A = np.random.rand(10000, n) as suggested, and the graphs don't seem to have changed significantly; the only difference is that construct_qp takes longer. I have plotted them on the same axes as suggested:

[Plot omitted: compilation times for both formulations on shared axes.]

You mention that converting a problem with objective quad_form(x, P) into conic form requires factoring P, but wouldn't that mean that construct_qp1 should be faster than construct_qp, since in least-squares form we already have a useful factorization (P = A.T @ A)? From the graph above it does seem like the least-squares form compiles faster, but only for 4000 variables or more.

The trends seem to have remained the same as well, and the change to A does not seem to have made a difference (apart from longer compilation times).

Edit: I wanted to run it for larger problems, but my laptop does not have enough memory since all the matrices are dense. Also, most of the time is being spent in "Applying ConeMatrixStuffing" or "Applying QpMatrixStuffing", as expected.

michaels0m avatar Nov 17 '21 14:11 michaels0m

QPs and non-QPs are canonicalized along slightly different code paths, so I wouldn't make any assumptions about which is faster on an absolute basis. @rileyjmurray's comment explains the big-O scaling results, though: the CVXOPT_qp and ECOS_qp curves both involve factorizing the matrix P, resulting in quadratic scaling. All other plots are linear, as expected.

SteveDiamond avatar Nov 17 '21 18:11 SteveDiamond

@phschiele this currently has a 1.3 milestone. Do you want to mark this as resolved, in light of the SciPy backend in CVXPY 1.3? If you want to keep it open, then maybe we should copy the messages here into a new thread on the cvxpy/benchmarks repo.

rileyjmurray avatar Jan 16 '23 10:01 rileyjmurray

Closing this issue as the performance of the backend is much better understood at this point.

phschiele avatar Apr 27 '24 20:04 phschiele