
Discriminating inner and outer function calls for MMA

Open pkairys opened this issue 4 years ago • 4 comments

I've been using the MMA algorithm and so far it converges (albeit slowly) to a local minimum for my optimization problem. However, I have some questions about the implementation that should allow me to accelerate the run time.

  1. I can calculate arbitrarily accurate gradients for my function of N parameters (and I have verified with finite differences that my gradients are correct). However, calculating the gradients takes ~N times as long as just calculating the objective value. The MMA paper cited in the documentation states: "Each new inner iteration requires function values, but no derivatives, calculated at the optimal solution of the most recent subproblem." So is there a way to discriminate between inner and outer function calls? I have observed that SLSQP sometimes calls the objective function with a gradient vector of length 0. I assume those are "inner calls," and I can execute them more quickly. However, the MMA algorithm does not do this. Is there a solution here?

  2. When I log the objective function minimization and observe how it converges with iteration, I see that the objective does not decrease monotonically. As far as I can tell from the original MMA paper, this should not be the case. However, there are periods of monotonic convergence interspersed with periods of non-monotonicity. Because I can't discriminate inner vs. outer function calls, I can't tell whether the non-monotonicity is simply due to inner function calls, in which case I wouldn't have a problem. Are inner calls also guaranteed to be monotonically decreasing for MMA?

Also, are there any suggestions for how many parameters are feasible for SLSQP? The MMA paper suggests 10^4-10^5.

Thanks!

pkairys avatar Jan 13 '21 19:01 pkairys

In principle, yes — we could update the code that evaluates f in the inner iterations to pass NULL for the gradient (indicating that it isn't needed). It would then need to call f again to compute the gradient if the inner iteration succeeds.
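For reference, here is a minimal sketch (using the NLopt Python API and a toy quadratic objective, not the actual MMA internals) of how an objective function can detect such a value-only call: NLopt passes a zero-length gradient array when derivatives are not needed, which corresponds to NULL in the C API.

```python
import numpy as np

def objective(x, grad):
    f = float(np.sum(x**2))  # toy objective standing in for an expensive f(x)
    if grad.size > 0:        # gradient requested, i.e. not a value-only call
        grad[:] = 2 * x      # fill the gradient in place, as NLopt expects
    return f

# Simulate the two kinds of calls NLopt can make:
g_empty = np.empty(0)
g_full = np.zeros(3)
f_inner = objective(np.array([1.0, 2.0, 3.0]), g_empty)  # value-only call
f_outer = objective(np.array([1.0, 2.0, 3.0]), g_full)   # full call
print(g_full)  # [2. 4. 6.]
```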

I can calculate arbitrarily accurate gradients for my function of N parameters (and I have verified with finite differences that my gradients are correct), however, calculating the gradients takes ~N times as long as just calculating the objective value.

Note that this means you should probably re-think how you compute the gradients. Using adjoint methods, it should always be possible to compute the gradient with time proportional to that of computing the objective value once.
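To illustrate the point, here is a toy adjoint example (sizes and matrices are made up for demonstration): for an objective of the form f(p) = cᵀu where u solves A u = b(p), naive differentiation would need one linear solve per parameter, but a single adjoint solve Aᵀλ = c yields the entire gradient at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 100  # state size, number of parameters (arbitrary toy sizes)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))  # b(p) = B @ p
c = rng.standard_normal(n)

def objective_and_gradient(p):
    u = np.linalg.solve(A, B @ p)    # one forward solve: A u = b(p)
    lam = np.linalg.solve(A.T, c)    # one adjoint solve: A^T lam = c
    # df/dp_i = lam^T (db/dp_i), and here db/dp = B, so grad = B^T lam
    return c @ u, B.T @ lam

# Verify one gradient component against a finite difference
p0 = rng.standard_normal(m)
f0, g = objective_and_gradient(p0)
eps = 1e-6
e0 = np.zeros(m); e0[0] = eps
f1, _ = objective_and_gradient(p0 + e0)
fd_error = abs((f1 - f0) / eps - g[0])
print(fd_error)  # tiny (the objective is linear in p here)
```

The key property is that the cost is two solves total, independent of the number of parameters m.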

When I log the objective function minimization and observe how it converges with iteration I see that the objective does not decrease monotonically. As far as I can tell from the original MMA paper, this should not be the case.

Only on outer iterations should it decrease monotonically. Inner iterations are not guaranteed to be decreasing for MMA.

stevengj avatar Jan 14 '21 20:01 stevengj

Great! Thanks. This confirmed my thinking.

Note that this means you should probably re-think how you compute the gradients. Using adjoint methods, it should always be possible to compute the gradient with time proportional to that of computing the objective value once.

Thanks for the reference! There is definitely room to optimize the gradient calculation for this problem. Calculating the gradients is trivially parallelizable: if I have M parameters and M cores, it takes the same amount of time to calculate the gradients as to calculate the objective value once. Anyhoo, I'll take a look at adjoint methods, but it isn't a top priority right now since I can scale up to a larger machine.

Only on outer iterations should it decrease monotonically. Inner iterations are not guaranteed to be decreasing for MMA.

This was my takeaway from the original paper. Using this assumption (and looking at the source code), I've made a workaround. Basically, my objective function tracks the previous function evaluations, and if the input x vector yields a function value larger than one seen before, it returns only the objective value and doesn't update the gradient. This is quite crude, and I would prefer a more reliable method.
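A sketch of that workaround might look like the following (a hypothetical reconstruction, with a toy quadratic standing in for the user's expensive value and gradient routines): any call whose objective value exceeds the best value seen so far is treated as a likely inner call and the gradient computation is skipped.

```python
import numpy as np

def expensive_value(x):      # stand-in for the costly objective evaluation
    return float(np.sum(x**2))

def expensive_gradient(x):   # stand-in for the ~N-times-costlier gradient
    return 2 * x

best = [np.inf]              # best objective value seen so far
gradient_calls = [0]         # count how often we pay for the gradient

def objective(x, grad):
    f = expensive_value(x)
    # Heuristic: a value worse than the best seen so far is assumed to be
    # an inner call, so we skip the expensive gradient for it.
    if grad.size > 0 and f <= best[0]:
        grad[:] = expensive_gradient(x)
        gradient_calls[0] += 1
    best[0] = min(best[0], f)
    return f

# Simulated call sequence: an improving call, then a worse one.
g = np.zeros(2)
objective(np.array([1.0, 1.0]), g)  # f = 2, gradient computed
objective(np.array([2.0, 2.0]), g)  # f = 8 > best, gradient skipped
print(gradient_calls[0])  # 1
```

Note the caveat the comment above already hints at: this is only a heuristic, since an outer iteration is not obliged to hit a new best value on its very first evaluation.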

I can make a pull request to pass NULL for the gradients and remove the code that updates the gradient vector on inner iterations. Would that work?

pkairys avatar Jan 14 '21 20:01 pkairys

I can make a pull request to pass NULL for the gradients and remove the code that updates the gradient vector on inner iterations. Would that work?

Yes, that would work. (You will have to add code to recompute the gradients elsewhere, as noted above.)

stevengj avatar Jan 14 '21 20:01 stevengj

@oskooi we might want to look into implementing this as it could save quite a bit of time (by eliminating adjoint solves during the inner iterations).

smartalecH avatar Nov 04 '21 00:11 smartalecH