
Control over Learning Rate and/or initial step size in linesearch

Open matillda123 opened this issue 11 months ago • 10 comments

Hi,

is there a convenient way to influence the step size of the available minimizers/least-squares solvers? (E.g. in optax there is the learning_rate argument for all optimizers.) So far, with the optimistix solvers I was only able to gain some control by changing solver.search.__dict__["step_init"], which doesn't seem right.

I guess one could just define a custom solver. But it seems weird that on the one hand there is so much customizability in optimistix's modular approach, while on the other hand the predefined solvers offer very little user control.

matillda123 avatar Jan 08 '25 13:01 matillda123

Hi Matilda,

you could indeed define a custom solver by re-using the searches and descents and tweaking their constants, if you desire smaller steps on average.

A learning rate would not necessarily be a meaningful concept if used with a backtracking line search or a classical trust region. These choose the step length based on optimality criteria, rather than just going in some direction regardless of whether this actually improves the objective function (by the expected amount) or not.

You could still bias these toward smaller step sizes (e.g. by requiring that BacktrackingArmijo starts with a step size of 0.5, rather than 1.0), but this would not alter their acceptance criteria. For the trust region radius, you could change the value of the high_constant and low_constant parameters, which are used to multiply the radius depending on whether the search judges the objective to be sufficiently quadratic.

Instead of tweaking solver attributes as above, you can also modify the solvers like this:

import equinox as eqx
import optimistix as optx

solver = optx.SomeSolver(...)
search = optx.BacktrackingArmijo(..., step_init=...)  # Use your pick of constants

# Replace the solver's default search with the customised one.
get = lambda s: s.search
solver = eqx.tree_at(get, solver, search)

I do this when I quickly want to try something out and don't want to bother writing down the attributes and constructor, but if I were actually going to use the solver many times, I'd do it properly :)
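For completeness, a fully concrete version of the above might look like this (the tolerances, the step_init value and the toy objective are just placeholders for illustration):

import equinox as eqx
import jax.numpy as jnp
import optimistix as optx

def loss(y, args):
    # Toy quadratic objective, standing in for the real fitting problem.
    return jnp.sum((y - 3.0) ** 2)

solver = optx.BFGS(rtol=1e-8, atol=1e-8)
# Start the backtracking search at half the full quasi-Newton step instead of 1.0.
search = optx.BacktrackingArmijo(step_init=0.5)
solver = eqx.tree_at(lambda s: s.search, solver, search)

sol = optx.minimise(loss, solver, jnp.zeros(3))
print(sol.value)  # approximately [3., 3., 3.]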

johannahaffner avatar Jan 08 '25 13:01 johannahaffner

@matillda123 do you have an example of the kind of thing you'd like to do, using a particular solver for clarity? :)

patrick-kidger avatar Jan 08 '25 17:01 patrick-kidger

I'm not sure if it's possible to give a short example here. What I'm trying to do is basically fit a high-dimensional function to some data. The function mostly consists of some FFTs. It's similar to what is described in this paper.

I'm not using one specific solver; rather, I'm trying to get a feeling for which solvers work nicely for this problem. I have the suspicion that some solvers may sometimes work "significantly" better when the step size is biased towards bigger or smaller step sizes. I noticed something like that in optax when tuning learning_rate with adam (+ linesearch) and lbfgs. So I was looking for a way to test this with the predefined optimistix solvers and didn't find one.

I hope it's clearer now what the issue is.

matillda123 avatar Jan 08 '25 23:01 matillda123

I see, tough optimisation problem!

It sounds like you want to do hyperparameter tuning - what do you get if a solver is not working well? Convergence to a local minimum, running out of steps, divergence?

And are you working with complex numbers, or have you split the problem into complex and real parts?

johannahaffner avatar Jan 09 '25 08:01 johannahaffner

I have real inputs but I use those to make complex numbers. If a solver doesn't work well, for me, that generally means it's stuck in a local minimum or converging really slowly.

matillda123 avatar Jan 09 '25 13:01 matillda123

So optimistix only ever sees real numbers?

The functions you are working with can have several local minima, right? In that case, you might want to try multi-starting your optimisation (rough sketch at the end of this comment).

If you'd like step sizes larger than one wherever your optimisation landscape looks suitably quadratic, maybe try optx.BFGS with a ClassicalTrustRegion? The trust region radius can get larger than one, and it has the widest range of step sizes, since these are updated multiplicatively and can increase as well as decrease. You can also take a look at the combinations here, such as

https://github.com/patrick-kidger/optimistix/blob/dcafc480d4a2bb9ec3b62658d5ae136db847b2d8/tests/helpers.py#L126
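For instance, reusing the eqx.tree_at pattern from earlier in this thread, a rough sketch of biasing BFGS towards larger steps could look like this (the constants here are made up; tune them for your problem):

import equinox as eqx
import optimistix as optx

solver = optx.BFGS(rtol=1e-6, atol=1e-6)
# The trust region multiplies the step size by high_constant when the local
# quadratic model predicts the decrease well, and by low_constant when it
# doesn't, so a larger high_constant means more aggressive step growth.
search = optx.ClassicalTrustRegion(high_constant=5.0, low_constant=0.25)
solver = eqx.tree_at(lambda s: s.search, solver, search)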

(It sounds like you tried BacktrackingArmijo, which will start with the full Newton step in BFGS and then chop that in half until it's satisfied.)

Those are but two suggestions - I'm afraid I can't say much more with only a basic idea of what your problem is.
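P.S. On the multi-start point, something along these lines is what I mean (a rough sketch with a toy multi-modal objective; tolerances, batch size and bounds are arbitrary):

import jax
import jax.numpy as jnp
import optimistix as optx

def loss(y, args):
    # Toy objective with many local minima, standing in for the real problem.
    return jnp.sum(jnp.sin(3.0 * y) + 0.1 * y**2)

solver = optx.BFGS(rtol=1e-6, atol=1e-6)

@jax.vmap
def solve_from(y0):
    # throw=False so a single failed start does not abort the whole batch.
    sol = optx.minimise(loss, solver, y0, max_steps=2000, throw=False)
    return sol.value, loss(sol.value, None)

inits = jax.random.uniform(jax.random.PRNGKey(0), (32, 4), minval=-5.0, maxval=5.0)
candidates, losses = solve_from(inits)
best = candidates[jnp.argmin(losses)]  # keep the best of the 32 starts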

johannahaffner avatar Jan 09 '25 16:01 johannahaffner

Yes, optimistix should only see real numbers.

So far I have only used standard solvers (mainly BFGS, NonlinearCG, GaussNewton and Dogleg). I think Dogleg uses a trust-region search. But it's tricky, as that is one of the cases which can get stuck in local minima.

Alright, so I guess there is no way around modifying the solvers with different searches and tuning their parameters.

Thanks for the suggestions. :)

matillda123 avatar Jan 10 '25 16:01 matillda123

Alright! Happy solving :)

johannahaffner avatar Jan 10 '25 18:01 johannahaffner

I was looking through Optimistix and saw that this issue was referring to my paper! :)

During my PhD, I also spent quite a bit of time developing autodiff-based optimization algorithms that work for these kinds of complex-valued problems. Some of the general framework is covered in this paper: https://opg.optica.org/oe/fulltext.cfm?uri=oe-29-15-23019

The code is here: https://github.com/saugatkandel/optim_second_order_autodiff

The parts that I last updated were the TF2 code here: https://github.com/saugatkandel/optim_second_order_autodiff/tree/master/sopt/optimizers/tensorflow2

Many of the approaches I implemented are already in Optimistix, but maybe there is something novel there?

For example, one of the ideas that I experimented with (but did not reproduce in the TF2 version of the code) was the use of matrix-free stochastic estimation of the diagonal of the Gauss-Newton matrix as a preconditioner (https://github.com/saugatkandel/optim_second_order_autodiff/blob/6e1936b2cecd02f06d6e7df6f6e7f503ed2c36b3/sopt/optimizers/tensorflow/lma.py#L116).
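The rough idea, sketched in JAX rather than the TensorFlow code linked above (this assumes a real-valued residual function and is only meant to illustrate the estimator, not reproduce that implementation):

import jax
import jax.numpy as jnp

def ggn_diag_estimate(residual_fn, x, key, num_probes=8):
    # Hutchinson/Bekas-style estimator: diag(A) ≈ E[v * (A v)] for Rademacher
    # probes v, applied here to A = J^T J without ever forming the Jacobian J.
    def ggn_matvec(v):
        _, jv = jax.jvp(residual_fn, (x,), (v,))  # J v, forward mode
        _, vjp_fn = jax.vjp(residual_fn, x)
        (jtjv,) = vjp_fn(jv)                      # J^T (J v), reverse mode
        return jtjv

    def one_probe(k):
        v = jax.random.rademacher(k, x.shape).astype(x.dtype)
        return v * ggn_matvec(v)

    keys = jax.random.split(key, num_probes)
    return jnp.mean(jax.vmap(one_probe)(keys), axis=0)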

I hope someone here finds one or more of these approaches interesting!

saugatkandel avatar Oct 16 '25 01:10 saugatkandel

> Alright, so I guess there is no way around modifying the solvers with different searches and tuning their parameters.

This is about to become easier, see: https://github.com/patrick-kidger/optimistix/issues/187#issuecomment-3507645837

> I was looking through Optimistix and saw that this issue was referring to my paper! :)

And I see that I never got back to you on this, @saugatkandel. Sorry about that! This got a bit lost in my inbox, probably because I wanted to take time to read your references and never got around to it.

Optimistix is more focused on deterministic algorithms, but it is definitely good to be aware of + be able to link to related methods, such as your preconditioners. If you ever want to make a contribution, these are always welcome!

johannahaffner avatar Nov 09 '25 12:11 johannahaffner