Can we use GaussNewton/LevenbergMarquardt for KL loss minimization?
Hello,
suppose I have a small variational autoencoder (VAE) with fewer than 200 parameters, and I want to train it as time-efficiently as possible. Because there are so few learnable parameters, I plan to use second-order optimization to converge faster and more robustly. However, I found that I cannot invoke the "optimistix.minimise" API with an optimistix.AbstractLeastSquaresSolver such as GaussNewton or LevenbergMarquardt. Is it possible, and is it suitable, to use GaussNewton/LevenbergMarquardt as the optimizer for a KL loss? (Or should I use the least-squares API instead?)
Least-squares solvers such as LevenbergMarquardt exploit access to the individual residuals of a function with multiple outputs, as is common in data fitting. Minimisation algorithms minimise a scalar loss - they can work in a least-squares setting by minimising the sum of squared residuals.
That conversion only goes one way: once the residuals have been collapsed into a sum of squares, the per-residual structure is lost and cannot be recovered. This is why least-squares solvers are not an option in optx.minimise.
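To make the distinction concrete, here is a minimal sketch on a toy line-fitting problem (the data, model and tolerances below are made up for illustration): the residual function exposes one output per data point to a least-squares solver, whereas the scalar sum of squares can only be handed to a minimiser.

```python
import jax.numpy as jnp
import optimistix as optx

xs = jnp.linspace(0.0, 1.0, 10)
ys = 2.0 * xs + 1.0  # toy "observations" lying on a line

def residuals(params, args):
    # One residual per data point: this is the structure that
    # GaussNewton/LevenbergMarquardt exploit.
    a, b = params
    return a * xs + b - ys

def scalar_loss(params, args):
    # Summing the squares collapses everything into a single number,
    # discarding the per-residual structure.
    return jnp.sum(residuals(params, args) ** 2)

y0 = jnp.array([0.0, 0.0])

# Residual formulation: least-squares solvers apply.
sol_ls = optx.least_squares(
    residuals, optx.LevenbergMarquardt(rtol=1e-8, atol=1e-8), y0
)

# Scalar formulation: only minimisers apply.
sol_min = optx.minimise(scalar_loss, optx.BFGS(rtol=1e-8, atol=1e-8), y0)
```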
A least-squares solver could still work on a scalar loss (by treating it as a single residual), and would be accessible through optx.least_squares. But since the Kullback-Leibler divergence already gives you a scalar loss, I would expect BFGS to perform better, as it iteratively builds up its second-order approximation over the course of the solve.
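As a rough sketch of what that could look like, assuming the usual closed-form KL term between a diagonal Gaussian and a standard normal (the latent dimension, initial values and tolerances below are placeholders, not a recommendation):

```python
import jax.numpy as jnp
import optimistix as optx

def kl_loss(params, args):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.
    mu, log_var = params
    return -0.5 * jnp.sum(1.0 + log_var - mu**2 - jnp.exp(log_var))

latent_dim = 4
y0 = (jnp.ones(latent_dim), jnp.ones(latent_dim))  # initial (mu, log_var)

solver = optx.BFGS(rtol=1e-6, atol=1e-6)
sol = optx.minimise(kl_loss, solver, y0)
# sol.value should approach mu = 0, log_var = 0, where the KL term is zero.
```

In a real VAE the KL term is only one part of the ELBO and the quantities being optimised are the network weights rather than (mu, log_var) directly, but the call pattern into optx.minimise is the same.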
Got it~ Thanks a lot, Johanna!
You're welcome!