Computation Graph Solvers
First of all, please forgive my ignorance on this subject. I came across the paper A New Architecture for Optimization Modeling Frameworks and the associated presentation, and they seemed to point to something I had been turning over in my mind for a while. Would it be possible to use a deep learning framework like Tensorflow as a solver? The idea would be that we could write some sort of solver interface that converts our costs/constraints to a format that could be solved in Tensorflow.
A few advantages to something like that come to mind:
- You would get automatic GPU support
- You would get access to gradient computation for all sorts of functions
- You would (I would think) be able to add things like an image processing neural network as a cost.
- Perhaps this would open the door to making our framework work with learning-based approaches.
I'm imagining there is probably a very good/obvious reason why this hasn't been done and QP solvers are a better idea. So that is my question to people more knowledgeable in optimization. Why should we not look into this?
@arocchi You seem to be our resident optimization expert. Any thoughts?
I personally like the computation graph idea. I took a look at the paper you mentioned, and basically what they suggest is creating both the model AND the solver using computation graphs.
For reference, what packages like Casadi do is just the first part, where the model is implemented in a computation graph for fast computation of gradients (i.e. automatic differentiation).
The big difference between Tensorflow / ML-specific frameworks and Casadi is that the former typically only include first-order methods, i.e. gradient descent. Casadi, by contrast, uses a computation graph for automatic differentiation and then feeds the gradients that come for free from the CG into more sophisticated optimizers (e.g. second-order methods, and methods that allow us to specify hard constraints such as joint limits). It also provides a set of sophisticated integrators, which are not necessarily useful while we are stuck with kinematic planning, but become increasingly interesting once we want to do dynamic planning, e.g. taking velocities and acceleration/torque limits into account.
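To make that pipeline concrete, here is a minimal Casadi sketch (Python API; the cost, constraint, and bounds are made up for illustration) in which the computation graph supplies exact derivatives to a second-order NLP solver with hard constraints:

```python
import casadi as ca

# Toy decision variables (stand-ins for trajectory parameters)
x = ca.SX.sym('x', 2)

# Cost and constraint defined as expressions in the computation graph
f = (x[0] - 1)**2 + 100 * (x[1] - x[0]**2)**2   # illustrative nonlinear cost
g = x[0]**2 + x[1]**2                           # illustrative constraint expression

# Casadi differentiates the graph automatically and hands exact
# Jacobians/Hessians to IPOPT, a second-order interior-point solver
nlp = {'x': x, 'f': f, 'g': g}
solver = ca.nlpsol('solver', 'ipopt', nlp)
sol = solver(x0=[0.5, 0.5], lbg=-ca.inf, ubg=1.0)  # hard constraint g <= 1
print(sol['x'])
```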
The limit of the method presented in the paper, IMHO, is that at the moment there are no solvers implemented for non-linear optimization problems. Moreover, it looks like the Tensorflow implementation of SCS (which, from the TrajOpt perspective, plays the role of the QP solver) struggles compared to the CPU implementation on sparse problems (and in our case, the planning problems have a very definite sparsity structure).
I think it is a very interesting idea nonetheless, but at the moment it is not mature enough. You can see all kinds of crossovers popping up in this space (e.g. Casadi-Tensorflow, Tensorflow-Scipy.Optimize), but I personally think this idea is still not mature enough to be used on a project such as TrajOpt.
On the other hand, using things like Tensorflow (or Casadi) for the model makes total sense to me, as we would obtain all derivatives for free. Tensorflow looks like the ultimate AD framework for the purpose of the project :)
@mpowelson So I think I found a way to use the C++ API within Tensorflow. It looks like you can leverage C++ the way we want, but you have to build from within the Tensorflow project, as shown here. This is not ideal, so I did some searching and found an individual who created a cmake wrapper that produces the Tensorflow C++ API as a shared library, which solves this issue. It looks like this may eventually get integrated at some point, but given that it is cmake we can build it directly in our workspace, similar to how I build bullet and fcl.
I am currently building the Tensorflow C++ API and will test an example. We may be getting closer to a working C++ example using Tensorflow.
@Levi-Armstrong I saw that. I guess I didn't fully understand what that meant, but I'm glad you're on top of it.
To address some of the other questions we had, it looks like the graph is formed when you create a "Session" (or maybe a ClientSession if using C++). Also, I think I had a misconception about how the gradients work. They are doing automatic differentiation, not symbolic differentiation, so there really isn't any "calculation" of the gradient expression. Once you form the graph, each node has a gradient function associated with it, and TF knows how to evaluate those. As Alessio said above, the derivatives are just another thing you get for free.
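For anyone following along, a minimal Python sketch of that workflow (using the TF 1.x graph API of that era; the cost function and values are made up for illustration) looks like this:

```python
import tensorflow as tf  # TF 1.x graph-mode API

# Each op below adds a node to the graph; nothing is evaluated yet
x = tf.placeholder(tf.float32, shape=[2])
y = tf.reduce_sum(tf.square(x))   # illustrative cost expression

# Reverse-mode automatic differentiation: TF chains each node's
# registered gradient function; no symbolic expression is built
dy_dx = tf.gradients(y, x)[0]

# The graph only runs inside a Session
with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: [1.0, 2.0]}))  # -> [2. 4.]
```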
It looks like PyTorch was recently added to Google Cloud. Blog Post
Here is another library that looks promising and also supports Eigen types: https://autodiff.github.io/tutorials/
Yeah that looks nice. One negative that I see is that it is maintained by only one guy at ETH Zurich. I wonder if it will be around in a few years.
This got me poking around at this again. I saw this repo with some examples using Tensorflow and IPOPT. I still think it would be pretty neat if we could represent the entire optimization problem in a TF graph.
This project based on PyTorch may also be of interest. They implemented a batch QP solver (primal-dual interior-point method) which may be good enough for the use case of trajOpt.
Their batch-mode support and ability to compute gradients with respect to the problem parameters are attractive for learning-based methods.
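If the project being referred to is something like qpth (an assumption on my part; the problem data below is random and purely illustrative), usage would look roughly like this: a whole batch of small QPs is solved in one differentiable call.

```python
import torch
from qpth.qp import QPFunction  # assuming the project referenced here is qpth

n_batch, n_z = 128, 10  # 128 small QPs solved simultaneously

# Batched QP data: min_z 0.5 z'Qz + p'z  s.t.  Gz <= h  (no equality constraints)
L = torch.randn(n_batch, n_z, n_z)
Q = L.bmm(L.transpose(1, 2)) + 1e-3 * torch.eye(n_z)  # positive definite per batch
p = torch.randn(n_batch, n_z, requires_grad=True)     # a learnable problem parameter
G = torch.randn(n_batch, n_z, n_z)
h = torch.ones(n_batch, n_z)                          # z = 0 is always feasible
e = torch.Tensor()                                    # empty tensor = no equality part

z = QPFunction(verbose=False)(Q, p, G, h, e, e)       # one batched, differentiable solve
z.sum().backward()                                    # gradients w.r.t. QP parameters
print(p.grad.shape)                                   # torch.Size([128, 10])
```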
So @Levi-Armstrong and I just finished up a brief project looking into rewriting TrajOpt using PyTorch. We also briefly looked at Tensorflow but went with PyTorch because of its better C++ support. The big idea was the one described in this issue: by writing the terms in PyTorch we would get GPU support for free as well as automatic differentiation. While the data is not (at least currently) public, long story short, it didn't work well from a performance standpoint.
While it certainly could have been our implementation, it seemed that the matrices in TrajOpt terms were just too small to make parallelization internal to a term worthwhile (see some rudimentary benchmarks HERE). That is why we stuck with Eigen and restructured around IFOPT. The thought was that parallelization of the problem (i.e. calculating each cost in parallel) was more important than parallelization within each term, particularly in cases like collision where a relatively expensive term is calculated using an external library.
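As a rough illustration of the effect (not our actual benchmark; the sizes and iteration counts here are invented), kernel-launch and transfer overhead on tiny matrices tends to swamp any GPU speedup:

```python
import time
import torch

# Tiny matrices like the Jacobians of individual TrajOpt terms
# (e.g. a 6 x n_joints pose-error Jacobian; sizes invented here)
A = torch.randn(6, 7)
B = torch.randn(7, 7)

t0 = time.perf_counter()
for _ in range(10000):
    C = A @ B                      # small dense matmul on the CPU
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    Ag, Bg = A.cuda(), B.cuda()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(10000):
        Cg = Ag @ Bg               # same op on the GPU; launch overhead dominates
    torch.cuda.synchronize()
    gpu_s = time.perf_counter() - t0
    print(f"CPU {cpu_s:.3f}s vs GPU {gpu_s:.3f}s for a 6x7 @ 7x7 matmul")
```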
That said, I still think this would be interesting as a sort of blue-sky approach. I've wondered if we could cleverly segment the problem in some way to exploit batching, and maybe optimize using multiple seeds at once.
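A minimal sketch of the multiple-seeds idea (first-order only, with a stand-in cost; none of this is an actual TrajOpt term) would be to give the decision variables a leading batch dimension so every seed steps in parallel:

```python
import torch

n_seeds, n_steps, n_joints = 64, 20, 7  # invented sizes

# One trajectory per seed, optimized simultaneously as one batched tensor
traj = torch.randn(n_seeds, n_steps, n_joints, requires_grad=True)
goal = torch.ones(n_joints)             # stand-in goal configuration

opt = torch.optim.Adam([traj], lr=0.05)
for _ in range(200):
    smooth = (traj[:, 1:] - traj[:, :-1]).pow(2).sum(dim=(1, 2))  # smoothness term
    reach = (traj[:, -1] - goal).pow(2).sum(dim=1)                # goal term
    cost = smooth + reach               # one scalar cost per seed
    opt.zero_grad()
    cost.sum().backward()               # all seeds step in parallel
    opt.step()

best = traj[cost.argmin()].detach()     # keep the best seed's trajectory
```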
Thanks for linking your benchmark code. I agree with the judgment call to go with IFOPT. For dense matrix operations of that size, I think there will be very little speed gain from using PyTorch or Tensorflow.
I also think problem-level parallelization has some research worthiness, and I'm interested in pursuing these directions myself. An example I really like is:
The software requirement is to simultaneously solve, say, 500 trajOpt problems in batch and get the resulting trajectories quickly. Personally, I think the tesseract framework suits this use case well, and I'd be happy to do some experimental extensions.
@Levi-Armstrong can this be closed?