PyTorch-LBFGS

How to avoid "Line search failed"?

Open duhd1993 opened this issue 5 years ago • 3 comments

I'm trying FullBatchLBFGS with wolfe line search on a fitting task of a small dataset.

  1. Levenberg-Marquardt is giving me fairly good accuracy. Can I expect L-BFGS to provide similar accuracy to LM? Say, for a synthesized dataset, y = 1/x.
  2. I'm getting lots of "Line search failed; curvature pair update skipped". How could I avoid this?
  3. I noticed that the result is stochastic. Where does the randomness come from, since I'm using full-batch training (no shuffling)?

Thanks!

duhd1993 avatar Aug 14 '19 20:08 duhd1993

Hi,

Thanks for trying out the implementation! To answer your questions:

  1. Levenberg-Marquardt is a different optimization algorithm. If your least squares problem is non-convex (which is likely the case if the response variable is nonlinear with respect to the predictor variables), then L-BFGS may converge to a different minimizer than LM. That being said, the minimizer may not be worse than the one obtained by LM; it really depends on the problem.

  2. If the line search is failing, I would first try increasing the number of trial points. To do this, within the options, set max_ls to a higher number, e.g. options = {'max_ls': 30} (or potentially higher). Let me know if this helps.

  3. The result will be influenced by numerical error if in-place operations are used (which require less memory). This is turned on by default; I would suggest turning it off by setting the option 'inplace' to False.
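To illustrate why a larger trial-point budget can fix a failing line search, here is a minimal pure-Python sketch of a backtracking (Armijo) line search with a cap analogous to the 'max_ls' option. This is a simplified stand-in for illustration only, not the library's actual Wolfe/Armijo implementation:

```python
# Minimal backtracking (Armijo) line search sketch. The max_ls argument
# plays the role of PyTorch-LBFGS's 'max_ls' option: it bounds the number
# of trial points before the search gives up ("line search failed").
# Simplified illustration, not the library's actual implementation.

def armijo_line_search(f, grad_f, x, direction, max_ls=30, c1=1e-4, beta=0.5):
    """Shrink the step until the Armijo sufficient-decrease condition holds,
    giving up after max_ls trial points."""
    t = 1.0                      # quasi-Newton methods start from steplength 1
    fx, gx = f(x), grad_f(x)
    for _ in range(max_ls):
        if f(x + t * direction) <= fx + c1 * t * gx * direction:
            return t, True       # sufficient decrease achieved
        t *= beta                # shrink the step and try again
    return t, False              # budget exhausted: line search failed

# 1-D example: f(x) = x^2, steepest-descent direction at x = 5
f = lambda x: x * x
grad_f = lambda x: 2 * x
step, ok = armijo_line_search(f, grad_f, 5.0, -grad_f(5.0))
```

With too small a budget (e.g. max_ls=1 here), the same search returns ok=False, which is the analogue of the "Line search failed" message: the search simply ran out of trial points before satisfying its acceptance condition.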

Please let me know if this helps.

hjmshi avatar Aug 15 '19 00:08 hjmshi

Thanks a lot for your reply! It appears fullbatch with wolfe line search works for me.

  1. I'm having trouble getting comparable accuracy with Levenberg-Marquardt. The MSE is three orders of magnitude smaller with LM. Could you give it a try when you have time: https://colab.research.google.com/drive/1vlzyCZKvaaZuWRQ5j8AT1pkq0pBTk18J

  2. Thanks for the suggestions. It helps somewhat. But the Armijo line search is not working for me; it's not learning at all. I'm not sure what's wrong. All other settings are the same as for the Wolfe line search.

  3. Thanks for pointing that out. Actually, adding some randomness turns out to be a good thing for a noise-free synthesized dataset. When 'inplace' is set to False, it converges to a local minimum pretty quickly.
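The run-to-run differences being discussed ultimately come from the fact that floating-point addition is not associative, so kernels that reorder operations (as in-place variants may) can change the last bits of a result. A pure-Python illustration of the underlying effect:

```python
# Floating-point addition is not associative: changing the order of
# operations changes the rounded result. This is the kind of numerical
# error behind the small run-to-run differences from the 'inplace' option.

vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]   # the 1.0 is absorbed: 0.0
reordered     = (vals[0] + vals[2]) + vals[1]   # cancel first, then add: 1.0

print(left_to_right, reordered)
```

Either answer is a correctly rounded sum of the same three numbers; only the evaluation order differs, which is why such "randomness" appears even with full-batch training and no shuffling.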

duhd1993 avatar Aug 15 '19 20:08 duhd1993

Glad to hear the suggestions help!

I haven't had the chance to take a closer look at your problem, but based on some of the initial settings, have you tried using an initial learning rate/steplength of 1? This is typical in (quasi-)Newton methods.

There are a few possible reasons why Levenberg-Marquardt may work better. If I may ask, which LM optimizer are you trying? One possibility is that LM converges to a different, potentially better minimizer. This could occur because you have a non-convex problem; note that not all local minima are equal. Another possibility is that the LM optimizer you are using operates in double precision. This L-BFGS implementation for PyTorch operates only in single precision, which may limit the accuracy you can attain on your problem.
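The single- versus double-precision gap can be made concrete in pure Python by rounding a value through IEEE-754 single precision with the `struct` module. A relative change on the order of 1e-8 survives in double precision but is lost entirely in single precision, which is consistent with an optimizer in float32 stalling several orders of magnitude above what a double-precision LM solver reaches:

```python
import struct

# Round a Python float (double precision) through IEEE-754 single precision.
def to_float32(x):
    return struct.unpack('f', struct.pack('f', x))[0]

# Double precision resolves a relative change of 1e-8; single does not.
double_val = 1.0 + 1e-8
single_val = to_float32(1.0 + 1e-8)

print(double_val > 1.0)   # double eps ~ 2.2e-16, so the change is kept
print(single_val == 1.0)  # single eps ~ 1.2e-7, so the 1e-8 change is lost
```

Once the loss landscape near the minimizer varies by less than single-precision machine epsilon, no optimizer running in float32 can make further progress, regardless of the algorithm.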

I can look into the Armijo line search. It's not clear to me why it would not work, although the Wolfe line search should be much more effective in the non-convex setting: it allows you to "move forward" to ensure that the step you're taking is not too small and that you're seeing sufficient change in curvature.

hjmshi avatar Aug 19 '19 17:08 hjmshi