Possible performance improvement?
Following an initially-unrelated discussion with @andrewrosemberg, I think there might be room for (substantial 🤔🤞) performance improvements in the NonLinearProgram code.
Note: what I write below might apply to other classes of problems, I only checked the nonlinear code so far.
TLDR: I think we can replace {a lot of linear system solves} with {a single linear system solve} during forward/reverse diff
Medium-long explanation:
Forward/reverse diff compute Jacobian-vector products of the form `J*v` or `J'*v`, where the Jacobian `J` has the form `K \ N` for matrices `K`, `N`. Currently, to obtain `w = J*v`, we solve `K*J = N` (a matrix right-hand side, i.e. one linear solve per column of `N`), then compute the product `J*v`. I believe we can instead obtain `w = J*v` by solving `K*w = N*v`, which requires a single linear system solve (with a vector right-hand side).
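To make the difference concrete, here is a minimal self-contained sketch (dimensions and matrices are made-up stand-ins, not the actual DiffOpt data):

```julia
using LinearAlgebra

# Hypothetical sizes; in DiffOpt these come from the problem data.
n, p = 100, 20
K = rand(n, n) + n * I   # square, well-conditioned stand-in for K
N = rand(n, p)
v = rand(p)

# Current approach: solve K * J = N (p right-hand sides), then multiply.
J = K \ N
w_current = J * v

# Proposed approach: one solve with the single vector right-hand side N*v.
w_proposed = K \ (N * v)

w_current ≈ w_proposed   # true, up to floating-point error
```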
Why it matters: on the ACOPF example I work with (a 300-bus case with loads as parameters), `K` has size 13k x 13k and `N` is 13k x 200. Replacing `K \ N` with `K \ (N*v)` potentially yields a ~200x reduction in memory and time, since it performs one linear solve instead of 200.
Complete thought process and stack traversal (I'm focusing on forward diff for the sake of example; the same remarks apply to reverse diff):
- Forward diff returns primal/dual sensitivities in a `ForwCache` structure here: https://github.com/jump-dev/DiffOpt.jl/blob/df4a7dcee059ee048ff620590ff5478d8b2c79cc/src/NonLinearProgram/NonLinearProgram.jl#L534-L537
- These are obtained from a left multiplication with the Jacobian here: https://github.com/jump-dev/DiffOpt.jl/blob/df4a7dcee059ee048ff620590ff5478d8b2c79cc/src/NonLinearProgram/NonLinearProgram.jl#L530-L532
- Said Jacobian `Δs` is computed by the `_compute_sensitivity` function. Stepping into that call, said Jacobian is obtained from `_compute_derivatives_no_relax`: https://github.com/jump-dev/DiffOpt.jl/blob/df4a7dcee059ee048ff620590ff5478d8b2c79cc/src/NonLinearProgram/nlp_utilities.jl#L457
- ...which then points us to a linear system solve: https://github.com/jump-dev/DiffOpt.jl/blob/df4a7dcee059ee048ff620590ff5478d8b2c79cc/src/NonLinearProgram/nlp_utilities.jl#L425-L427
The alternative implementation I propose would cache the factorization of `K` (which seems to already be the case), and lazily compute Jacobian-vector products `w = J*v` as `w = K \ (N*v)`. The math would be similar (I think) for Jacobian-transpose-vector products.
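A hedged sketch of what such a lazy operator could look like; the names below (`JacobianOperator`, `jvp`, `vjp`) are hypothetical illustrations, not DiffOpt's actual API:

```julia
using LinearAlgebra, SparseArrays

# Wraps a cached factorization of K together with N; never materializes J = K \ N.
struct JacobianOperator{F,M}
    Kfact::F   # cached factorization of K (computed once)
    N::M
end

JacobianOperator(K::AbstractMatrix, N::AbstractMatrix) =
    JacobianOperator(lu(K), N)

# Forward mode: w = J * v = K \ (N * v), a single solve with a vector RHS.
jvp(op::JacobianOperator, v::AbstractVector) = op.Kfact \ (op.N * v)

# Reverse mode: w = J' * u = N' * (K' \ u), a single adjoint solve.
vjp(op::JacobianOperator, u::AbstractVector) = op.N' * (op.Kfact' \ u)
```

The reverse-mode identity is what I meant by "similar math": since `J = K \ N`, we have `J'*u = N' * (K' \ u)`, so a Jacobian-transpose-vector product also reduces to a single solve against the (adjoint of the) cached factorization.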
Potential counter-arguments (and counter-counters) to the above:
- If one wants to compute/retrieve the entire Jacobian --> I don't see the full Jacobian being stored/exposed at the moment; this functionality would have to be exposed first
- If one wants to compute large numbers of Jacobian-vector products, it may be more efficient to form the full Jacobian `J`, then do matrix-vector products. --> Note that if `K` (and its factorization) are sparse, it might still be more efficient to compute `Jv = K \ (Nv)` (sparse matvec + sparse solve) rather than a dense matvec product; see the sketch below.
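For completeness, a rough way to sanity-check that second trade-off on sizes similar to the ACOPF case above (the densities are arbitrary placeholders, and `BenchmarkTools` is assumed available):

```julia
using LinearAlgebra, SparseArrays, BenchmarkTools

n, p = 13_000, 200
K = sprand(n, n, 1e-3) + n * I   # sparse, diagonally-dominant stand-in for K
N = sprand(n, p, 1e-2)
v = rand(p)

Kfact = lu(K)                    # factorized once, cached
Jfull = Kfact \ Matrix(N)        # dense 13k x 200 Jacobian, for comparison

@btime $Jfull * $v;              # dense matvec against the materialized J
@btime $Kfact \ ($N * $v);       # sparse matvec + sparse triangular solves
```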
Good point @mtanneau. I missed that we are using `K \ N` in DiffOpt. Indeed, we should use `K \ (N*v)`, which is more memory-efficient. The factorization of `K` should be kept in the cache.
If we want to compute the full Jacobian, it would be better to leverage a linear solver that supports solving linear systems with a sparse right-hand side (like Pardiso). As far as I know, we don't need to evaluate the full Jacobian in our current applications.
> If one wants to compute large numbers of Jacobian-vector products, it may be more efficient to form the full Jacobian `J`, then do matrix-vector products.
I think the code was written with that in mind. In view of your comment, we should indeed probably do what you suggest, at least by default. We could have an option to keep the old behavior in case we hit a use case in which it is faster (probably a case where the constraint matrix is rather dense and we make a lot of Jacobian-vector products).
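If we do keep such an option, a minimal sketch of what the toggle could look like (the function and keyword names are made up, not an existing DiffOpt API):

```julia
# Hypothetical toggle between the two behaviors discussed above.
function jacobian_vector_product(Kfact, N, v; materialize::Bool = false)
    if materialize
        # Old behavior: form the full J (in practice it would be computed
        # once and cached, then reused across many products).
        J = Kfact \ Matrix(N)
        return J * v
    else
        # Proposed default: one vector-RHS solve per product.
        return Kfact \ (N * v)
    end
end
```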
I agree, we don't want the full Jacobian. We should apply the suggested changes.
I should be able to PR the changes; I'll likely have questions regarding the overall architecture / potential changes to existing APIs.