nlopt
                        Discriminating inner and outer function calls for MMA
I've been using the MMA algorithm and so far it converges (albeit slowly) to a local minimum for my optimization problem. However, I have some questions about the implementation that should allow me to accelerate the run time.
- I can calculate arbitrarily accurate gradients for my function of N parameters (and I have verified with finite differences that my gradients are correct); however, calculating the gradients takes ~N times as long as calculating the objective value alone. The MMA paper cited in the documentation states: "Each new inner iteration requires function values, but no derivatives, calculated at the optimal solution of the most recent subproblem." So is there a way to discriminate between inner and outer function calls? I have observed that SLSQP sometimes calls the objective function with a gradient vector of length 0. I assume those are "inner calls," and I can execute them more quickly. However, the MMA implementation does not do this. Is there a solution here?
- When I log the objective function during minimization and observe how it converges with iteration, I see that the objective does not decrease monotonically. As far as I can tell from the original MMA paper, this should not be the case. However, there are periods of monotonic convergence interspersed with periods of non-monotonicity. Because I can't discriminate inner vs. outer function calls, I can't tell whether the non-monotonicity is simply due to inner function calls, in which case I wouldn't have a problem. Are inner calls also guaranteed to be monotonically decreasing for MMA?
Also, are there any suggestions for how many parameters are feasible for SLSQP? For MMA the documentation says 10^4-10^5.
Thanks!
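For reference, NLopt's Python bindings signal a gradient-free evaluation by passing a zero-length `grad` array, so a callback can branch on `grad.size`. A minimal sketch of that pattern, using a toy quadratic in place of a real objective (no NLopt installation is needed to demonstrate the branch):

```python
import numpy as np

def objective(x, grad):
    """Callback in NLopt's Python convention: fill grad in place when requested."""
    value = float(np.sum((x - 1.0) ** 2))  # toy quadratic standing in for the real objective
    if grad.size > 0:
        # A non-empty grad means the algorithm wants derivatives on this call.
        grad[:] = 2.0 * (x - 1.0)
    return value

# Simulated calls (as NLopt would make them):
x = np.array([0.0, 2.0])
g = np.empty(2)
val_with_grad = objective(x, g)          # grad filled in place
val_no_grad = objective(x, np.empty(0))  # grad.size == 0: derivative work skipped
```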
In principle, yes: we could update the code that evaluates f in the inner iterations to pass NULL for the gradient (indicating that it isn't needed). It would then need to call f again to compute the gradient if the inner iteration succeeds.
> I can calculate arbitrarily accurate gradients for my function of N parameters (and I have verified with finite differences that my gradients are correct), however, calculating the gradients takes ~N times as long as just calculating the objective value.
Note that this means you should probably re-think how you compute the gradients. Using adjoint methods, it should always be possible to compute the gradient with time proportional to that of computing the objective value once.
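To illustrate the cost argument (this is not NLopt code, just a hypothetical least-squares objective): for f(x) = ||Ax - b||^2, the gradient 2*A^T*(Ax - b) reuses the residual already computed for the objective, so it costs roughly one extra matrix-vector product instead of ~N extra function evaluations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))  # hypothetical linear model, N = 20 parameters
b = rng.standard_normal(50)

def objective_and_gradient(x):
    r = A @ x - b               # residual from the single "forward" evaluation
    value = float(r @ r)        # f(x) = ||Ax - b||^2
    gradient = 2.0 * (A.T @ r)  # adjoint step reuses r: one extra mat-vec, not N evaluations
    return value, gradient

x0 = rng.standard_normal(20)
val, grad = objective_and_gradient(x0)

# Sanity-check one gradient component against a forward finite difference:
h = 1e-6
dx = np.zeros(20)
dx[0] = h
fd = (objective_and_gradient(x0 + dx)[0] - val) / h
```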
> When I log the objective function minimization and observe how it converges with iteration I see that the objective does not decrease monotonically. As far as I can tell from the original MMA paper, this should not be the case.
Only on outer iterations should it decrease monotonically. Inner iterations are not guaranteed to be decreasing for MMA.
Great! Thanks. This confirmed my thinking.
> Note that this means you should probably re-think how you compute the gradients. Using adjoint methods, it should always be possible to compute the gradient with time proportional to that of computing the objective value once.
Thanks for the reference! There is definitely room to optimize the gradient calculation for this problem. The calculation of the gradients is trivially parallelizable: if I have M parameters and M cores, then it takes the same amount of time to calculate the gradients as it does the objective value once. Anyhoo, I'll take a look at adjoint methods, but it isn't a top priority right now since I can scale up to a larger machine.
> Only on outer iterations should it decrease monotonically. Inner iterations are not guaranteed to be decreasing for MMA.
This was my takeaway from the original paper. Working from this assumption (and from the source code), I've made a workaround: my objective function tracks the previous function evaluations, and if the input x vector yields an objective value larger than one seen before, it returns only the objective value and doesn't update the gradient. This is quite crude, and I would prefer a more reliable method.
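A minimal sketch of that workaround, assuming NLopt's Python callback convention (`grad` is filled in place when `grad.size > 0`); `cheap_value` and `expensive_gradient` are hypothetical stand-ins for the real computations:

```python
import numpy as np

best_value = [np.inf]  # best objective value seen so far
gradient_evals = [0]   # counts how often the expensive gradient runs

def cheap_value(x):
    # Hypothetical stand-in for the real objective evaluation.
    return float(np.sum(x ** 2))

def expensive_gradient(x):
    # Hypothetical stand-in for the ~N-times-costlier gradient computation.
    gradient_evals[0] += 1
    return 2.0 * x

def objective(x, grad):
    value = cheap_value(x)
    if grad.size > 0:
        if value <= best_value[0]:
            # Value improved: presume an outer iteration and pay for the gradient.
            best_value[0] = value
            grad[:] = expensive_gradient(x)
        # Otherwise presume an inner iteration and leave grad untouched
        # (crude: a genuinely worse outer point would wrongly be skipped too).
    return value

g = np.empty(2)
objective(np.array([1.0, 1.0]), g)  # improves on inf: gradient computed
objective(np.array([2.0, 2.0]), g)  # worse than before: gradient skipped
```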
I can make a pull request to pass NULL for the gradients and remove the code that updates the gradient vector on inner iterations. Would that work?
> I can make a pull request to pass NULL for the gradients and remove the code that updates the gradient vector on inner iterations. Would that work?
Yes, that would work. (You will have to add code to recompute the gradients elsewhere, as noted above.)
@oskooi we might want to look into implementing this as it could save quite a bit of time (by eliminating adjoint solves during the inner iterations).