
Proof of concept more-MOIified implementation

Open · ericphanson opened this issue 4 years ago • 10 comments

This is a proof of concept implementation of a different internal structure for Convex.jl (edit: which has started to turn into a bit of a rewrite).

Currently, only the minimal set of methods needed to solve testproblem/testproblem.jl is implemented. Probably any other problem will error! The end-user entry point is solve2!, which works just like solve! (except that much of it isn't implemented).

Some design notes

template basically has the same motivation as conic_form!: it recurses through the problem, generating a conic reformulation of it. It does so in a different way, however. It takes a context parameter, which acts very similarly to how unique_conic_forms currently works, by holding problem-local state. In this case, it holds an MOI model. When constructing the conic reformulation, any Convex AbstractVariables are associated to MOI variables added to the model. Any constraints are simply added to the model immediately. Then an MOI function is returned. In the end, we recover an MOI function which is the objective function for the problem.
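As a rough, self-contained sketch of the shape of this recursion (the names Context and template mirror this PR, but the toy variable/atom types and the method bodies are purely illustrative, not the actual implementation):

using MathOptInterface
const MOI = MathOptInterface

struct ToyVariable
    len::Int
end

struct ToySumAtom
    child::ToyVariable
end

struct Context{M<:MOI.ModelLike}
    model::M
    # each Convex-level variable is associated to MOI variables exactly once
    var_map::IdDict{ToyVariable,Vector{MOI.VariableIndex}}
end
Context(model::MOI.ModelLike) = Context(model, IdDict{ToyVariable,Vector{MOI.VariableIndex}}())

# Variables: add MOI variables to the model on first sight, reuse them afterwards.
function template(x::ToyVariable, context::Context)
    vis = get!(() -> MOI.add_variables(context.model, x.len), context.var_map, x)
    return MOI.VectorOfVariables(vis)
end

# Atoms: recurse into children to get their MOI-level representations; a real
# atom would also add any cone constraints it needs directly to context.model.
function template(atom::ToySumAtom, context::Context)
    f = template(atom.child, context)
    terms = [MOI.ScalarAffineTerm(1.0, vi) for vi in f.variables]
    return MOI.ScalarAffineFunction(terms, 0.0)
end

model = MOI.Utilities.Model{Float64}()
obj = template(ToySumAtom(ToyVariable(3)), Context(model))  # the problem's objective function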

This implementation reuses the current atoms and DCP properties, but the conic reformulations are entirely rewritten (template instead of conic_form!), which means that none of the objects in conic_form.jl and solution.jl are used.

Some more notes:

  • We get access both to the Convex.jl object (and its children) and to the MOI objects returned from calling template on the children. We should think about when to use one versus the other.
  • It's crucial that a redesign allows defining atoms in terms of partially specified problems (https://github.com/jump-dev/Convex.jl/issues/310) (but this isn't done here yet! edit: actually this isn't a big deal and can be done as easily in the new implementation).
  • I've used the convention of not including the ! when modifying the context object, under the assumption that it is often being modified.
  • It might be quite interesting to multithread problem formulation. To do so, we could use the branching tree structure of the problem. One issue is that adding variables to the MOI model is a global operation we only want to do once.
  • Convex.jl currently passes around tuples of real and imaginary parts of objects. I did not do this here, partly in the hope that MOI can get native complex support and we can reuse it here. I imagine this would be key for Hypatia to avoid the big complex-to-real SDP reformulation used in Convex.jl (assuming it can natively support the complex PSD cone).

On the problem in testproblem/testproblem.jl, we get much improved results:

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 42.958454 seconds (77.45 M allocations: 13.555 GiB, 6.44% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 13.723807 seconds (19.08 M allocations: 2.313 GiB, 1.29% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)
 14.912025 seconds (41.56 M allocations: 2.344 GiB, 2.98% gc time)

It should also be much easier to support other cones, e.g. geomean for vectors (ref #388).

To do in order to think about merging this

  • [x] Figure out how to handle complex variables (answer: introduce separate ComplexVariable, ComplexConstant, and ComplexTape types)
  • [x] support satisfy
  • [ ] duals
  • [ ] warmstarts
  • [x] Think about the vexities for problems, or revert those changes
  • [ ] decide if the expression tree caching is useful and re-add it if so
  • [ ] look at perf impact of not using an atom for dot product (esp with fusion); maybe revert
  • [ ] update all atoms (so far: affine, linear, and exp are done, except for duals and the dot product issue)
  • [ ] update all constraints (so far LtConstraint, GtConstraint, EqConstraint, SDPConstraint, ExponentialCone)
  • [ ] pass tests
  • [ ] write devdocs describing how Convex works (now that I really know!)

New features that this enables

  • [x] #388
  • [x] Lower logdet to the MOI cone, enabling Hypatia to use the native cone and MOI bridges otherwise
  • [ ] ...?

ericphanson avatar Jul 31 '20 17:07 ericphanson

Codecov Report

Merging #393 into master will decrease coverage by 4.38%. The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master     #393      +/-   ##
==========================================
- Coverage   88.42%   84.04%   -4.39%     
==========================================
  Files          75       77       +2     
  Lines        3508     3691     +183     
==========================================
  Hits         3102     3102              
- Misses        406      589     +183     
Impacted Files                          Coverage Δ
src/Convex.jl                           100.00% <ø> (ø)
src/VectorAffineFunctionAsMatrix.jl       0.00% <0.00%> (ø)
src/atoms/affine/add_subtract.jl         79.59% <0.00%> (-13.27%) ↓
src/atoms/affine/multiply_divide.jl      59.82% <0.00%> (-21.57%) ↓
src/atoms/affine/reshape.jl              72.72% <0.00%> (-7.28%) ↓
src/atoms/affine/sum.jl                  80.00% <0.00%> (-10.91%) ↓
src/atoms/lp_cone/abs.jl                 72.00% <0.00%> (-22.74%) ↓
src/atoms/second_order_cone/norm2.jl     57.69% <0.00%> (-42.31%) ↓
src/constant.jl                          94.59% <0.00%> (-5.41%) ↓
src/constraints/constraints.jl           80.43% <0.00%> (-6.63%) ↓
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update dd2f6c9...f4a5e0a.

codecov[bot] avatar Jul 31 '20 18:07 codecov[bot]

In this PR, I introduce a structure

struct VectorAffineFunctionAsMatrix{M,B,V}
    matrix::M
    vector::B
    variables::V
end

to represent an MOI.VectorAffineFunction in a way more conducive to composing affine functions of the variables. Why is this important? The extended formulations that Convex does can be seen as applying a sequence of affine functions to the variables and adding constraints to the model. The variables are represented by a symbolic vector x (a vector of MOI.VariableIndex), and the affine functions are therefore of the form Ax + b, where A is a matrix and b is a vector. By keeping A as a matrix, we can use specialized sparse matrix * sparse matrix methods, BLAS for dense matrices, etc. Note that a priori we cannot avoid the cubic complexity of matmuls, because our vector is symbolic (ideally, we would just do a sequence of matvecs: compute A*x to get a concrete vector, then apply the next transformation to that vector, and so on; but since x is symbolic, we must compose the matrices themselves).
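As a small self-contained illustration of that composition (my own toy AffineMap type, not the PR's machinery): applying A2*x' + b2 to x' = A1*x + b1 collapses to (A2*A1)*x + (A2*b1 + b2), so the symbolic x is never needed:

using SparseArrays

# Toy affine map A*x + b over a symbolic variable vector x (illustrative only).
struct AffineMap{M,V}
    A::M
    b::V
end

# Composing g after f: g.A*(f.A*x + f.b) + g.b == (g.A*f.A)*x + (g.A*f.b + g.b)
compose(g::AffineMap, f::AffineMap) = AffineMap(g.A * f.A, g.A * f.b + g.b)

f = AffineMap(sprand(4, 3, 0.5), zeros(4))
g = AffineMap(sprand(2, 4, 0.5), ones(2))
h = compose(g, f)  # uses sparse * sparse; never materializes an x

x = randn(3)
@assert h.A * x + h.b ≈ g.A * (f.A * x + f.b) + g.b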

However, the final matrix should be 1 x n, since we end up with a scalar objective function. Therefore, we should perform the matmuls in the other direction: forgetting about the additive part, we have things like y^T * D * C * B * A * x, and we should first do y^T * D to recover a 1 x n matrix, then (y^T * D) * C, and so forth. (Maybe we should then transpose everything to do the multiplications, if there aren't specialized dispatches for 1 x n instead of n x 1?)
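A toy illustration of that ordering point (arbitrary sizes; not code from this PR): multiplying from the 1 x n side keeps every intermediate a row vector, whereas right-to-left forms a full n x n intermediate first.

using LinearAlgebra

n = 2000
y = randn(n); D = randn(n, n); C = randn(n, n)

@time (y' * D) * C  # two vector-matrix products, O(n^2) each
@time y' * (D * C)  # a full n x n matmul first, O(n^3)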

ericphanson avatar Aug 01 '20 16:08 ericphanson

I've pushed a commit that implements the ideas from the previous comment (and then one that fixes perf bugs), which improves the benchmarks quite a bit:

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with classic `Convex.solve!`...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
 15.400637 seconds (1.40 M allocations: 10.077 GiB, 14.19% gc time)
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
  2.386677 seconds (121.63 k allocations: 785.039 MiB, 6.23% gc time)
┌ Info: Running with JuMP...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
  5.013906 seconds (15.27 M allocations: 1.047 GiB, 3.48% gc time)

and

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (2, 300, 300, 5)
 16.237871 seconds (148.79 k allocations: 4.973 GiB, 5.62% gc time)
┌ Info: Running with JuMP...
└   (MAX_ITERS, m, n, k) = (2, 300, 300, 5)
 29.201845 seconds (55.55 M allocations: 3.898 GiB, 8.59% gc time)

I could even solve it with n = m = 500, for which JuMP and the usual Convex OOM on my desktop with 16 GB of RAM:

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (1, 500, 500, 5)
 43.958516 seconds (133.30 k allocations: 10.646 GiB, 4.62% gc time)

(note that I set MAX_ITERS to 1 for that last one, for the sake of time).

One thing to note is that the performance of this approach is sensitive to missing specialized dispatches.

ericphanson avatar Aug 01 '20 18:08 ericphanson

To achieve type stability with this approach (or most others), we will need to make atoms parametric on the number of dimensions, because we often have different implementations for vectors vs scalars vs matrices. Here, the implementation leads to a choice of AffineOperation and thus often a different output type. This must not depend on runtime values, in order to have type stability.
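A hypothetical sketch of what "parametric on the number of dimensions" could look like (toy names, not the PR's atoms):

# With the number of dimensions N as a type parameter, the vector vs. matrix
# implementation is chosen by dispatch, so the output type never depends on
# runtime values.
struct ToyNormAtom{N}
    size::NTuple{N,Int}
end

formulation(::ToyNormAtom{1}) = :second_order_cone        # vector case
formulation(::ToyNormAtom{2}) = :spectral_or_elementwise  # matrix case

formulation(ToyNormAtom((5,)))    # :second_order_cone
formulation(ToyNormAtom((3, 3submit)))  # :spectral_or_elementwise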

One other thought is that the VAFTape objects I introduce here could give the compiler problems if they get too long, since they are implemented as tuples, but we could collapse them down whenever they exceed length 15, say. This could even be type stable, because the length of a tuple is not a runtime value.
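A sketch of that length-capped collapse, using a tuple of matrices as a stand-in for the tape of affine operations (collapse and maybe_collapse are hypothetical helpers, not PR code):

# Because the tuple length N is a type parameter rather than a runtime value,
# the compiler can resolve the N > 15 branch statically.
collapse(ops::Tuple) = (*(ops...),)  # fold the whole tape into one operation
maybe_collapse(ops::NTuple{N,Any}) where {N} = N > 15 ? collapse(ops) : ops

short_tape = ntuple(_ -> rand(3, 3), 4)
long_tape  = ntuple(_ -> rand(3, 3), 16)
length(maybe_collapse(short_tape))  # 4: left alone
length(maybe_collapse(long_tape))   # 1: collapsed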

ericphanson avatar Aug 02 '20 10:08 ericphanson

The latest commit adds the unregistered dependency https://github.com/ericphanson/MatrixChainMultiply.jl to look at the impact of using that method for choosing how to evaluate the VAFTape objects. In the test problem it doesn't really make a difference, but it might in other problems. (CI will start failing now due to the dependency, which is fine because I might start stripping out the old functionality here anyway.)

ericphanson avatar Aug 02 '20 14:08 ericphanson

I'm starting to see how this can all work together. This rewrite is actually not very different from how Convex.jl works now, but I think it will actually end up a lot simpler. The idea is the following (note this is mostly for myself, to put into the dev docs at some point...):

There are two "levels" at which Convex operates: the high level that users use, and the MOI/VAFTape level. At the high level, we have atoms and constraint objects which provide dispatches for the usual functions (like *, ==, +, norm, etc.) to generate a Convex.jl object (an atom or a constraint).

  • The function template(obj, context::Context) lowers an atom to the "MOI level", returning its MOI-level representation. At the "MOI level", all variables and values are vectorized (instead of being matrices or scalars), and each object is turned into either a VAFTape (or a SparseVAFTape, depending on whether USE_SPARSE() == true) or a vector of numbers, and the MOI model living in the context object may be updated (say, by adding a constraint).
  • The function add_constraint_to_context(constr, context::Context) adds MOI constraints to the model living in context (possibly adding variables and doing other operations to the model as well), returning nothing.

As a design point, I will try to do as much as possible at the high level, since this makes the package most accessible to contributions from Convex.jl users and simplifies the codebase. The functions many atoms implement do not actually need to be represented by atoms; e.g. I've already removed DotMultiplicationAtom and PartialTransposeAtom, in favor of just writing out the transformation in Convex.jl's high-level syntax (see the sketch below). We only need to implement a small set of primitives (+, -, *, reshape, some constraints, and possibly some others) in order to recover most of the functionality.
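For instance, here is a sketch of that style for the dot product (my_dot is a hypothetical name; this assumes Convex's existing support for broadcasting .* between a constant and an expression, and is not the exact code in the PR):

using Convex

# Illustrative only: a dot product written purely with high-level primitives
# (`.*` elementwise multiplication and `sum`), with no dedicated atom.
my_dot(x, y) = sum(x .* y)

x = Variable(3)
e = my_dot([1.0, 2.0, 3.0], x)  # an affine Convex.jl expression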

In order to make use of more MOI features, however (e.g. another cone, like one for vector geomean), a Convex.jl atom needs to be created and a template dispatch added.

A note on SparseVAFTape vs VAFTape: these are two different ways of lazily representing a sequence of affine transformations to a vector of variables (and all of Convex.jl's reformulations are such transformations + adding constraints). SparseVAFTape represents each transformation by a single sparse matrix, which is nice in that we have the exact same type for every transformation, and the perf model is pretty simple to understand (this model is copied from CVXPY). VAFTape tries to be fancier and use structured types like UniformScaling and zero vectors that do not allocate, and separately stores a matrix multiplication and vector addition component. This has the disadvantage that each operation is possibly of a different type, so they are stored as a tuple, and type stability is more complicated. However, it might provide better performance in some situations. My plan for now is to implement both (they should only need a limited set of primitive methods) and benchmark to decide which to keep (or possibly keep both).
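A toy contrast of the two styles (these are stand-ins, not the PR's types):

using SparseArrays, LinearAlgebra

# SparseVAFTape-style: every operation is the same concrete type (one sparse
# matrix per transformation), so the tape is a homogeneous Vector.
homogeneous_tape = [sprand(4, 4, 0.3), sparse(1.0I, 4, 4)]

# VAFTape-style: structured non-allocating operations are allowed, e.g. the
# identity as a UniformScaling; elements then differ in type, so the tape is
# a tuple and type stability has to be managed more carefully.
heterogeneous_tape = (sprand(4, 4, 0.3), I)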

ericphanson avatar Aug 03 '20 18:08 ericphanson

I'm gonna leave this for now; I had it timeboxed to a week and couldn't push it through, but I think the idea works (the idea is very simple, mostly following the existing structure: just recurse through the expression tree, lazily building the affine transformations of the variables and eagerly adding constraints to the model. That being said, this is the ~4th reimplementation I've written code for, so somehow it wasn't obvious to me from the start). What's left is just more implementation and some fixes (there are some test failures that indicate I've made some mistakes), and cleanup to remove all the old code. Hopefully I or someone else can pick this up at some point soon! Also, I have tried out four slightly different ways of holding the tape of affine operations; I think SparseTape2 is the best so far (hold a sparse matrix A and a dense vector b to represent Ax + b).

ericphanson avatar Aug 09 '20 23:08 ericphanson

Just had a stray thought for how to do the hashing for deeply nested trees that repeat themselves. Currently we can have hash collisions (#78), and the hashing step can be a performance bottleneck (there is a discourse post somewhere). In this PR I removed it / didn't implement it for the new way because of these issues. But I think we could do something simple: make every atom a mutable struct (so they are semantically distinguishable) and cache their templates via an IdDict (we are already doing this for constraints, to populate the duals). Then if you have a problem with this structure in which a bunch of subtrees are reused, the user just has to reuse the individual atoms and the cache will be hit. It requires the user to explicitly do this, but I think that's ok because it avoids the spectre of incorrectness via hash collisions and the performance penalty of hashing everything just to help this edge case. And I think requiring the user to reuse the atom in order to reuse the template is reasonable and transparent.
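A minimal sketch of that caching scheme (toy names; per the above, the PR currently only does this for constraints):

# IdDict keys on object identity (===), so no hashing of expression trees is
# involved: reusing the *same* mutable atom object hits the cache, while a
# structurally equal but distinct atom simply recomputes.
mutable struct ToyAtom
    children::Vector{Any}
end

const TEMPLATE_CACHE = IdDict{ToyAtom,Any}()

function cached_template(atom::ToyAtom)
    get!(TEMPLATE_CACHE, atom) do
        Any[:expensive_lowering, atom.children]  # stand-in for the real template
    end
end

a = ToyAtom([])
cached_template(a) === cached_template(a)            # true: cache hit
cached_template(ToyAtom([])) === cached_template(a)  # false: no collision possible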

ericphanson avatar Aug 23 '20 03:08 ericphanson

I haven't looked in detail at what is happening here yet but I saw you mentioned complex variables in the initial post, and there now exists https://github.com/jump-dev/ComplexOptInterface.jl. @ericphanson have you had any recent thoughts about this PR?

chriscoey avatar May 02 '21 11:05 chriscoey

https://github.com/jump-dev/ComplexOptInterface.jl

I did see that, but it was/is still pretty early days for that effort. My thought in this PR was to just continue supporting complex variables for the current atoms by reformulating as we already do, but future work might be able to make use of it.

@ericphanson have you had any recent thoughts about this PR?

No recent ones, but to summarize my thoughts on this:

  • We have one interface point with MOI which is when we load the model. We should make an MOI model at the start and allow individual atoms to add constraints directly to the model as is done in this PR. This would allow using more general cones.
  • Convex's current system of using ordered dictionaries makes it hard to have type stability unless we have a uniform output type (e.g. "everything is a sparse array", like it seems CVXPY does), and even then that might not be the right structure because...
  • ...we deal with lots of expressions that are essentially A*B*C*D*E*F*x after reformulation, where x is a vector of variables. In the objective, A is 1 x n (to get a scalar objective out), which means we should probably multiply these left-to-right. The current implementation makes this a bit hidden, but if I remember correctly, I think we essentially multiply right-to-left, i.e. starting with F, the deepest bit of the expression tree, and building upward to A. For many models it would be better to store a tape as we build the expression tree and, once we have all the expressions, decide the order in which to evaluate it.
  • this touches on the fact that Convex represents everything as vectors/matrices but MOI currently uses scalar variables and variable types. This mismatch likely causes performance issues, but we can avoid those by only scalarizing at the end, making use of as many optimized matmuls as we can before then (e.g. B*C*D*E*F). Though these operations are often sparse, and the product of sparse matrices may not be, which adds its own difficulties.
  • for hashing I think https://github.com/jump-dev/Convex.jl/pull/393#issuecomment-678726117 is probably the way to go, since we mostly use mutable structs anyway. (this relates to the previous point, that in this paradigm you want vector/matrix operations, not scalar ones, since you need to allocate for everything you do but that's usually negligible if it doesn't scale with problem size).
  • In general, I think it would be great to have more MOI frontends. E.g. "operator-overloading" vs "macro DSL" is a completely orthogonal choice AFAIK to "convex + extended formulations" vs "only direct formulations", which itself seems largely orthogonal to vectorized operations vs scalar. But out of all the combinations, I think currently you can only choose either (macro DSL, direct formulations, scalar operations), i.e. JuMP, or (operator overloading, extended formulations, vectorized operations), i.e. Convex. (There's actually another choice, Parametron.jl, which treats a more narrow set of problems in a zero-allocation way which is pretty cool). Perhaps some kind of compositional arrangement would let users efficiently choose the combination that fits their problem/domain.

Also, unfortunately I probably won't be working much on this stuff for the foreseeable future, but I'm happy to share thoughts / knowledge of Convex internals with anyone interested in working on it.

ericphanson avatar May 02 '21 14:05 ericphanson

closing in favor of https://github.com/jump-dev/Convex.jl/pull/504

ericphanson avatar Jul 06 '23 23:07 ericphanson