
Can we use GaussNewton/LevenbergMarquardt on KL loss minimization

Open timnotavailable opened this issue 4 months ago • 3 comments

Hello,

suppose I have a small variational autoencoder (VAE) with under 200 parameters, and I want to train it in a very time-efficient way. Because there are so few learnable parameters, I plan to use second-order optimization to converge faster and more robustly. I found that it is not possible to invoke the "optimistix.minimise" API with an optimistix.AbstractLeastSquaresSolver such as GaussNewton or LevenbergMarquardt. Is it possible, and is it suitable, to use GaussNewton/LevenbergMarquardt as the optimizer for a KL loss? (Or should I use the least-squares API instead?)

timnotavailable avatar Sep 02 '25 09:09 timnotavailable

Least-squares solvers such as LevenbergMarquardt exploit access to the individual residuals of a function with multiple outputs, as is common in data fitting. Minimisation algorithms minimise a scalar loss; they can be used in a least-squares setting by minimising the sum of squared residuals. The converse does not work: once the residuals have been summed into a single scalar, they cannot be recovered, so residual-based solvers are not an option in optx.minimise.
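
Roughly, the distinction looks like this (a toy linear data-fitting sketch of my own, just assuming the standard optx.least_squares / optx.minimise call signatures; the residual function and parameter names are illustrative only):

```python
import jax.numpy as jnp
import optimistix as optx

# Residual formulation: the function returns one residual per data point,
# which GaussNewton/LevenbergMarquardt can exploit via optx.least_squares.
def residuals(params, args):
    x, y = args
    return y - (params["a"] * x + params["b"])

# Scalar formulation: only the summed loss survives, so the residual
# structure is lost and optx.minimise expects a minimiser instead.
def scalar_loss(params, args):
    return jnp.sum(residuals(params, args) ** 2)

x = jnp.linspace(0.0, 1.0, 10)
y = 2.0 * x + 1.0
params0 = {"a": jnp.array(0.0), "b": jnp.array(0.0)}

lsq_sol = optx.least_squares(
    residuals, optx.LevenbergMarquardt(rtol=1e-6, atol=1e-6), params0, args=(x, y)
)
min_sol = optx.minimise(
    scalar_loss, optx.BFGS(rtol=1e-6, atol=1e-6), params0, args=(x, y)
)
```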

A least-squares solver could still work on a scalar loss, and would be accessible through optx.least_squares. But since the Kullback-Leibler divergence already gives you a scalar loss, I would expect BFGS to perform better, since it iteratively builds up its second-order approximation throughout the solve.
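
A minimal sketch of the BFGS route, using a closed-form KL between a diagonal Gaussian and a standard normal as a stand-in for an actual VAE objective (the toy loss and shapes here are my own, not from this thread):

```python
import jax.numpy as jnp
import optimistix as optx

# Toy stand-in for a VAE's KL term:
# KL(N(mu, diag(exp(logvar))) || N(0, I)), in closed form, summed over dims.
def kl_loss(params, args):
    mu, logvar = params
    return 0.5 * jnp.sum(jnp.exp(logvar) + mu**2 - 1.0 - logvar)

params0 = (jnp.ones(3), jnp.zeros(3))
solver = optx.BFGS(rtol=1e-6, atol=1e-6)
sol = optx.minimise(kl_loss, solver, params0)
# The KL is minimised (to zero) at mu = 0, logvar = 0.
print(sol.value)
```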

johannahaffner avatar Sep 02 '25 09:09 johannahaffner

Got it~ Thanks a lot Johanna!

timnotavailable avatar Sep 02 '25 11:09 timnotavailable

You're welcome!

johannahaffner avatar Sep 02 '25 13:09 johannahaffner