
Differentiate a vector-valued decision variable with respect to a vector-valued parameter

haowwang opened this issue 4 years ago · 6 comments

Hi,

Thank you for the fantastic package; it has been super helpful! I was trying to differentiate a vector-valued decision variable with respect to a vector-valued parameter, and I noticed that the gradient returned by cvxpylayers is the average of the gradients of the elements of the vector with respect to the parameter. Suppose x is a decision variable of size n by 1 and y is a parameter of size m by 1. dx/dy (using cvxpylayers) is a vector of size 1 by m. Is it possible to get the full Jacobian (of size n by m) instead of the average across the gradients of each element of the decision variable? Thank you!

haowwang, May 04 '21 14:05

@haowwang, thanks for trying out CVXPY Layers!

PyTorch is most efficient at computing gradients of scalar-valued functions, so by default it does not compute a full Jacobian for you.

To compute a Jacobian, you can pass a CvxpyLayer object to this function: https://pytorch.org/docs/master/generated/torch.autograd.functional.jacobian.html

Here is an example.

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])


torch.manual_seed(0)
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)
# compute the full Jacobians of the solution x with respect to A and b;
# cvxpylayer returns a tuple of variables (here just x), so jacobian returns
# one tuple of Jacobians per variable, with one Jacobian per parameter
jacobians, = torch.autograd.functional.jacobian(cvxpylayer, (A_tch, b_tch))
jac_A, jac_b = jacobians  # shapes (n, m, n) and (n, m)

akshayka, May 04 '21 15:05

Hi @akshayka, thank you very much for the prompt response and the example! One quick question: if my optimization problem has multiple decision variables, can I still choose which one to differentiate? When I was using cvxpylayers as a layer in a network, I could choose to differentiate any scalar-valued decision variable by calling backward() on it. Is it possible to do something similar here?

haowwang, May 04 '21 15:05

For something like that, you might need to write some custom code. It would amount to solving the problem, taking the optimal value of the variable you are interested in, multiplying it (as a dot product) by a standard basis vector, and calling backward(), once for each coordinate of that variable. A rough sketch of that idea is below.
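Here is a minimal sketch of that per-coordinate approach. The two-variable problem, the variable names, and the layer construction are all illustrative (they are not from the library docs); the point is just that you only backpropagate the variable you care about, here x:

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)   # the variable whose Jacobian we want
s = cp.Variable(m)   # a second variable we do not differentiate
A = cp.Parameter((m, n))
b = cp.Parameter(m)
# epigraph form of the l1 fitting problem above, just so there are two variables
constraints = [x >= 0, s >= A @ x - b, s >= b - A @ x]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum(s)), constraints)
assert problem.is_dpp()

layer = CvxpyLayer(problem, parameters=[A, b], variables=[x, s])

torch.manual_seed(0)
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(m, requires_grad=True)

x_star, s_star = layer(A_tch, b_tch)   # one forward solve

rows_A, rows_b = [], []
for i in range(n):
    e_i = torch.zeros(n)
    e_i[i] = 1.0
    # gradient of the scalar e_i . x_star with respect to the parameters;
    # retain_graph lets us backprop through the same solve once per coordinate
    grad_A, grad_b = torch.autograd.grad(x_star @ e_i, (A_tch, b_tch),
                                         retain_graph=True)
    rows_A.append(grad_A)   # d x_i / d A, shape (m, n)
    rows_b.append(grad_b)   # d x_i / d b, shape (m,)

jac_b = torch.stack(rows_b)   # full Jacobian d x / d b, shape (n, m)
jac_A = torch.stack(rows_A)   # d x / d A, shape (n, m, n)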

The context in this thread might (?) help: https://discuss.pytorch.org/t/how-to-compute-jacobian-matrix-in-pytorch/14968/14

An alternative is to adapt this code (calling backward only on the variable you care about): https://gist.github.com/sbarratt/37356c46ad1350d4c30aefbd488a4faa

This alternative might be faster, since it is batched (though our batched solve method has room for improvement, so the speedup may be smaller than you would like).
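Roughly, the batched trick looks like this (a sketch of the idea, not the gist verbatim, reusing cvxpylayer, A_tch, and b_tch from the example above; only the Jacobian of x is shown):

# solve n identical copies of the problem and read off d x_i / d(parameters)
# from the i-th copy in a single backward pass
A_batch = A_tch.detach().unsqueeze(0).repeat(n, 1, 1).requires_grad_(True)
b_batch = b_tch.detach().unsqueeze(0).repeat(n, 1).requires_grad_(True)

x_batch, = cvxpylayer(A_batch, b_batch)   # shape (n, n): one solution per copy

# sum_i x_batch[i, i]: the i-th copy contributes only its i-th coordinate,
# so the gradients of this scalar stack into the full Jacobians
(x_batch * torch.eye(n)).sum().backward()

jac_b = b_batch.grad   # shape (n, m): row i is d x_i / d b
jac_A = A_batch.grad   # shape (n, m, n): slice i is d x_i / d A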

I suppose you should be aware that in every case (including my previous answer), PyTorch will, under the hood, solve the problem many times, so you'll definitely see some overhead.

akshayka, May 04 '21 15:05

Hi @akshayka, thank you for the suggestions! For the pseudocode below, when I call backward() on the vector-valued decision variable var, is PyTorch calling backward on every element of var and then averaging all of those gradients under the hood? Thank you!

cvxpylayer = CvxpyLayer(problem=prob, parameters=param_list, variables=var_list)
var, = cvxpylayer(param)  # param is of size n by 1 and var is of size m by 1
var.backward(torch.ones(var.size()))  # param.grad has size n by 1 instead of the full Jacobian of size m by n

haowwang, May 04 '21 16:05

Not quite. It computes the Jacobian transposed times the all-ones vector, i.e. the sum of the per-element gradients, not their average.

PyTorch never materializes full derivative matrices; it only implements (transposed) matrix-vector products with them. This makes computing gradients of scalar-valued functions very efficient, but it is also why it cannot easily give you a full Jacobian.
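For intuition, here is a tiny standalone PyTorch example (unrelated to cvxpylayers) showing what backward(v) actually computes:

import torch

# f(y) = (y0**2, 3*y1) has Jacobian J = [[2*y0, 0], [0, 3]]
y = torch.tensor([1.0, 2.0], requires_grad=True)
out = torch.stack([y[0] ** 2, 3 * y[1]])

# backward(v) accumulates J^T @ v into y.grad; with v = all-ones this is the
# sum of the per-element gradients, not their average
out.backward(torch.ones_like(out))
print(y.grad)   # tensor([2., 3.])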

You might look at the PyTorch documentation to better understand how its automatic differentiation tools work and what they support.

Hope that helps.

akshayka, May 04 '21 18:05

Hi @akshayka Thank you very much for your help! I will look into that a bit more!

haowwang, May 05 '21 20:05