Jacobian computation for AutoDiffCostFunction

Thanks for the great library! This looks like a great effort to unify factor graph solvers with autograd and end-to-end parameter learning.

I am using this library in a perhaps non-traditional way, where I do not run an outer optimization loop to learn parameters. I have a traditional factor graph set up with a "neural" factor: the cost function contains a neural network and produces a loss value. In theory, I should be able to compute the Jacobians for such a factor using AutoDiffCostFunction, since the network is differentiable. I have also independently verified that the Jacobian computation works correctly.
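Roughly, the neural factor is set up like the sketch below (a minimal illustration, not my actual code; `decoder`, `z1`, `z2`, and `target_image` are placeholder names and dimensions):

```python
import torch
import theseus as th

# Placeholder decoder standing in for the real network.
decoder = torch.nn.Sequential(
    torch.nn.Linear(512, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 64 * 64)
)

batch_size = 1
z1 = th.Vector(tensor=torch.randn(batch_size, 256), name="z1")
z2 = th.Vector(tensor=torch.randn(batch_size, 256), name="z2")
target_image = th.Variable(torch.randn(batch_size, 64 * 64), name="target_image")


def neural_error_fn(optim_vars, aux_vars):
    # Residual between the decoded image and the observed image.
    z1, z2 = optim_vars
    (target,) = aux_vars
    pred = decoder(torch.cat([z1.tensor, z2.tensor], dim=1))
    return pred - target.tensor  # shape (batch_size, 64 * 64)


neural_factor = th.AutoDiffCostFunction(
    optim_vars=[z1, z2],
    err_fn=neural_error_fn,
    dim=64 * 64,
    aux_vars=[target_image],
    name="neural_factor",
)

objective = th.Objective()
objective.add(neural_factor)
```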

However, when I try to use such a factor in a simple optimization (theseus_layer.forward()), I consistently get an out-of-memory error, even on an RTX 3090.

Hypothesis: Since the intended setting is to run the Theseus layer as the inner optimization loop, I believe the optimize call retains the computation graph through all iterations of the optimizer for the backward pass, which could quickly blow up memory. Is there a way to turn this off?

Some details:

  • I have a PyTorch neural network that takes two 256-D vectors as input and produces an image. I'm trying to optimize/smooth these vectors using the GN or LM optimizers, in addition to other SE3 poses (constrained with RelativePoseError factors).

  • I have tried the different backward modes, including truncated with as few as 1 backward iteration.

Any help is appreciated!

akashsharma02 avatar Mar 18 '22 15:03 akashsharma02

Hi @akashsharma02, thanks a lot for your interest and kind words!

My first suggestion was going to be to try the implicit or truncated backward modes, but it looks like you have already tried this. In principle, this should help because most of the graph gets detached, so the compute graph only retains a few iterations. Since you already tried this, there must be some other issue.
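For reference, these modes are selected via `optimizer_kwargs` in the layer's forward call; something like the sketch below (your objective, inputs, and settings will of course differ):

```python
import theseus as th

# Sketch: choose a backward mode for the inner loop. "objective" and "inputs"
# are placeholders for your own objective and dict of input tensors.
optimizer = th.LevenbergMarquardt(objective, max_iterations=20)
layer = th.TheseusLayer(optimizer)

updated_inputs, info = layer.forward(
    inputs,
    optimizer_kwargs={
        "backward_mode": th.BackwardMode.TRUNCATED,  # or th.BackwardMode.IMPLICIT
        "backward_num_iterations": 1,  # only used by the truncated mode
    },
)
```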

If I understand your explanation correctly, your network parameters are optimization variables for the factor graph, is this correct? If so, computing the Jacobian matrix may have a significant memory cost, although two 256-D vectors don't sound too large. Even so, have you tried whether you get similar errors with the CholmodSparseSolver?
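To try it, you can pass the sparse solver as `linear_solver_cls` when constructing the optimizer, e.g. (a sketch, with `objective` standing in for yours):

```python
import theseus as th

# Sketch: use CHOLMOD's sparse Cholesky factorization instead of the
# default dense linear solver.
optimizer = th.GaussNewton(
    objective,
    linear_solver_cls=th.CholmodSparseSolver,
    max_iterations=20,
)
layer = th.TheseusLayer(optimizer)
```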

Also, would it be possible for you to submit a PR with a small working example that reproduces this error? That would help us understand your use case better and offer better support.

luisenp avatar Mar 18 '22 17:03 luisenp

Hi @akashsharma02, following up on this issue. Can we close it if this is resolved?

mhmukadam avatar Apr 14 '22 01:04 mhmukadam

Hi @akashsharma02. I was curious whether you are still working on this and if you have tried our newer versions, which have support for vmap.
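Roughly, the vmap path looks like the sketch below (argument names from the current API; the toy variables and error function here are just illustrative):

```python
import torch
import theseus as th

# Sketch: autograd_mode="vmap" batches the Jacobian computation inside an
# AutoDiffCostFunction; TheseusLayer(..., vectorize=True) vectorizes the
# objective's cost function evaluations.
x = th.Vector(tensor=torch.randn(1, 256), name="x")
target = th.Variable(torch.randn(1, 256), name="target")


def err_fn(optim_vars, aux_vars):
    return optim_vars[0].tensor - aux_vars[0].tensor


cost_fn = th.AutoDiffCostFunction(
    [x], err_fn, 256, aux_vars=[target], autograd_mode="vmap"
)
objective = th.Objective()
objective.add(cost_fn)
layer = th.TheseusLayer(th.LevenbergMarquardt(objective), vectorize=True)
```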

luisenp avatar Nov 09 '22 13:11 luisenp

@luisenp Apologies, I haven't been actively working on this for a while now. But I will try your suggestion with a newer version of the library and update the results here.

akashsharma02 avatar Nov 09 '22 21:11 akashsharma02