
Examples of using differentiable least squares

Open zeroAska opened this issue 2 years ago • 17 comments

📚 The doc issue

In the provided examples, the least-squares problem optimizes over all of the parameters. However, in some applications, part of the parameters come from a neural network and should be optimized with SGD, while the others can be optimized directly by the least-squares solvers. In Theseus, this is specified by the "inner loop" and "outer loop". Does the current version of PyPose support this?

Suggest a potential alternative/fix

Provide an example that the state space is a neural network to learn and the pose is to be optimized by least square solvers.

zeroAska avatar Jul 03 '23 23:07 zeroAska

@zeroAska Yes, it is supported. You may do something like

opt1 = SGD(net1.parameters())       # outer loop: gradient descent on network weights
opt2 = LM(net2, strategy=strategy)  # inner loop: least-squares solver
for i in range(epochs):
    opt1.step()
    for j in range(iterations):
        opt2.step(input)

Bi-level optimization like this will be directly supported in a future release.
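A runnable sketch of the nested loop above, using plain PyTorch so it executes without PyPose installed. A second SGD optimizer stands in for pp.optim.LM (an assumption for illustration, not the PyPose API); the structure — inner iterations over the least-squares variables, one outer step over the network weights — is the same.

```python
import torch
from torch import nn

net1 = nn.Linear(3, 1)   # parameters trained by the outer loop
net2 = nn.Linear(3, 1)   # parameters solved by the inner loop

opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)  # stand-in for pp.optim.LM

x = torch.randn(8, 3)
target = torch.randn(8, 1)

for epoch in range(3):                 # outer loop: neural-network weights
    for it in range(5):                # inner loop: least-squares variables
        opt2.zero_grad()
        inner_loss = (net2(x) - net1(x).detach()).pow(2).mean()
        inner_loss.backward()          # gradients only touch net2 here
        opt2.step()
    opt1.zero_grad()
    outer_loss = (net1(x) + net2(x).detach() - target).pow(2).mean()
    outer_loss.backward()              # gradients only touch net1 here
    opt1.step()
```

The .detach() calls keep each loop's gradients confined to its own parameter set, which is the key property of the bi-level setup.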

wang-chen avatar Jul 03 '23 23:07 wang-chen

Thanks for the quick response. If net1 and net2 are the same nn.Module with different subsets of parameters, is there a way to specify which subset of parameters is used for LM and which for SGD, respectively?

zeroAska avatar Jul 03 '23 23:07 zeroAska

You can pass net.module1.parameters() to SGD and net.module2 to LM to achieve this.
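A minimal sketch of routing two disjoint parameter groups of one module to different optimizers. The names Net, encoder, and pose are hypothetical, and plain SGD stands in for pp.optim.LM so the sketch runs with PyTorch alone.

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)            # subset trained with SGD
        self.pose = nn.Parameter(torch.zeros(6))  # subset solved with LM

net = Net()
sgd = torch.optim.SGD(net.encoder.parameters(), lr=1e-2)
lm_like = torch.optim.SGD([net.pose], lr=1e-1)    # stand-in for pp.optim.LM

# The groups are disjoint, so neither optimizer touches the other's weights.
encoder_ids = {id(p) for p in net.encoder.parameters()}
assert id(net.pose) not in encoder_ids
```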

wang-chen avatar Jul 03 '23 23:07 wang-chen

Thanks!! In the least-squares problem over net.module2 above, there is a pose LieTensor among the nn.Module parameters whose initial value may need manual assignment for each problem and for each training example. An example of such an application is visual odometry, where we need to train the image encoder and perform least squares over the poses. How can we specify the parameter's initial value each time, given that it is a parameter of an nn.Module?

Another question: if a batch of training data has different poses, can we multiply each pose with its corresponding data as a batch and launch different least-squares problems within the batch? For example, with a batch of 2, we have the pose batch [pose1, pose2] and we want to act on the batch [image1, image2] to obtain [pose1 @ image1, pose2 @ image2].
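The batched action above can be sketched with plain tensors. Poses are written here as 4x4 homogeneous transforms (in PyPose they would be SE3 LieTensors, whose action also broadcasts over the batch dimension); the einsum pairs pose[i] with points[i] only, with no cross-batch mixing.

```python
import torch

poses = torch.eye(4).expand(2, 4, 4).clone()    # batch of 2 identity poses
poses[1, :3, 3] = torch.tensor([1., 2., 3.])    # give pose2 a translation

points = torch.rand(2, 5, 3)                     # 2 clouds of 5 points each
homog = torch.cat([points, torch.ones(2, 5, 1)], dim=-1)

# out[b, n, :] = poses[b] @ homog[b, n]: each pose acts on its own cloud.
moved = torch.einsum('bij,bnj->bni', poses, homog)[..., :3]
```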

zeroAska avatar Jul 05 '23 18:07 zeroAska

For initialization, it is no different from a neural network: you may perform in-place value assignment on module parameters, e.g. net.module.weight1.data.fill_(value), before solving the problem. More information is here.
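A small sketch of this in-place assignment pattern. With a PyPose LieTensor parameter, the same pattern applies (e.g. copying a per-example initial pose into .data before each solve); a plain nn.Linear is used here so the sketch runs with PyTorch alone, and init_weight is a hypothetical initial value.

```python
import torch
from torch import nn

net = nn.Linear(3, 3)
init_weight = torch.full((3, 3), 0.5)   # per-problem initial value

# In-place assignment, repeated before each new problem / training example.
with torch.no_grad():
    net.weight.copy_(init_weight)       # same effect as net.weight.data.copy_(...)
    net.bias.fill_(0.0)
```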

For the second question, if you mean that each time you want to activate different parameters for an LM problem to solve, PyPose currently doesn't directly support this, because LM and GN don't work with stochastic inputs: they don't use gradient descent, so they will not converge, as solutions jump far away from the last iteration. However, technically you can do it by defining different optimizers for different parameters.

wang-chen avatar Jul 05 '23 18:07 wang-chen

Many thanks!

zeroAska avatar Jul 05 '23 18:07 zeroAska

As a follow-up question: in the above outer-inner loop setup, since the prediction comes from the least squares, how is its gradient with respect to the ground truth propagated through the least-squares layer?

zeroAska avatar Jul 13 '23 01:07 zeroAska

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

wang-chen avatar Jul 13 '23 01:07 wang-chen

> We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

An easy way to do this is to perform one more model forward pass after the inner optimization, then do the outer optimization.
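A runnable sketch of this "one more forward" trick, under the assumption that the inner solver runs without tracking gradients (plain SGD stands in for pp.optim.LM). The outer loss is computed from a single fresh differentiable forward, so it backpropagates through only that last pass rather than all inner iterations.

```python
import torch
from torch import nn

encoder = nn.Linear(2, 2)               # outer (network) parameters
pose = nn.Parameter(torch.zeros(2))     # inner (least-squares) variable

x = torch.randn(4, 2)
target = torch.randn(4, 2)

inner_opt = torch.optim.SGD([pose], lr=0.1)   # stand-in for pp.optim.LM
with torch.no_grad():                          # features fixed during inner solve
    feat = encoder(x)
for _ in range(10):                            # inner iterations: no graph kept
    inner_opt.zero_grad()
    loss = (feat + pose - target).pow(2).mean()
    loss.backward()                            # touches only `pose`
    inner_opt.step()

# One extra differentiable forward: the outer loss reaches `encoder`
# through this single pass, not through the 10 inner iterations above.
outer_loss = (encoder(x) + pose.detach() - target).pow(2).mean()
outer_loss.backward()
```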

wang-chen avatar Jul 13 '23 01:07 wang-chen

Thanks for the paper link. I will check it out.

zeroAska avatar Jul 13 '23 06:07 zeroAska

In the paper provided above, does the bi-level optimization (the inner/outer loop) share the same loss? If the two stages have different losses to optimize, can we still use the trick of keeping only the last iteration's gradients? For example, the inner loop that optimizes the pose might have a label-free loss, while the outer loop that optimizes the network parameters might have a supervised loss.

zeroAska avatar Jul 18 '23 23:07 zeroAska

They don't have to share the same loss. Another example with different loss functions is this paper.

wang-chen avatar Jul 19 '23 00:07 wang-chen

> We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

I noticed that the optimizers in pypose are set to be @torch.no_grad() (e.g. in optim.GN.step, optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

Neutronpanp avatar Aug 16 '23 13:08 Neutronpanp

> We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

> I noticed that the optimizers in PyPose are set to be @torch.no_grad() (e.g. in optim.GN.step, optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

After optimization, we suggest performing another forward pass for the loss, so that it can be backpropagated through the inner-level optimization with only one iteration. For example, in the MPC example: in Line 231 we don't retain the gradient, but then in Line 293 we perform another round of LQR, which bypasses the multiple iterations and saves computing time.

wang-chen avatar Aug 16 '23 15:08 wang-chen

If the outer-level loss is a supervised loss, does the outer level's gradient propagation method in the paper still hold?

zeroAska avatar Aug 16 '23 23:08 zeroAska

Yes, a supervised loss is an easier case.

pyposebot avatar Aug 16 '23 23:08 pyposebot

> If the outer-level loss is a supervised loss, does the outer level's gradient propagation method in the paper still hold?

Hi, did you figure out how to propagate the gradient through the optimizer? I have the same problem: I want to supervise the pose from LM.

califford avatar Jun 12 '24 08:06 califford