
Add "xtol" and "ftol" stopping criteria for optimisers

Open MichaelClerx opened this issue 1 year ago • 11 comments

Many existing optimiser implementations provide stopping criteria based on the difference between the last two parameter sets (abs(x[i] - x[i - 1])) or the last two function evaluations (abs(f[i] - f[i - 1])). When comparing methods from PINTS with other methods, it is useful to have something similar.

How do people define xtol and ftol?

Absolute versus relative

Occasionally, relative differences are used instead, e.g. abs(f[i] - f[i - 1]) / f[i] (a small sketch contrasting the two appears after the list below).

  • In matlab the interpretation of xtol and ftol depends on the method. See here and this per-method table
  • In scipy the existence of xtol and ftol and their implementation is method-dependent, e.g. absolute for Nelder-Mead, relative for Powell, and not available at all for CG
  • In nlopt users can add both https://nlopt.readthedocs.io/en/latest/NLopt_Python_Reference/#stopping-criteria
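
As a tiny illustration (not PINTS code, just the two checks written out), note that the relative form is also ill-defined when the optimum value is zero:

def absolute_ftol_met(f_prev, f_curr, ftol):
    # Absolute criterion: stop when f changed by less than ftol
    return abs(f_curr - f_prev) < ftol

def relative_ftol_met(f_prev, f_curr, ftol):
    # Relative criterion: breaks down when f_curr == 0, e.g. an error
    # measure that reaches exactly zero at the optimum
    return abs(f_curr - f_prev) / abs(f_curr) < ftol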

Scalar xtol?

In nlopt, xtol can be a scalar, in matlab it looks like it needs to have size n_parameters.

AND vs OR

In scipy's Nelder-Mead, xtol AND ftol must both be met. Everywhere else they are independent stopping criteria: https://stackoverflow.com/questions/43644797/what-is-xtol-for-in-minimizemethod-nelder-mead

Proposal

  1. Add xtol and ftol as separate, independent criteria (so OR, not AND)
  2. They will not be set by default, so users can enable them (or disable them) for methods and problems where they think this is appropriate
  3. Adding relative versions means doubling the workload, and dealing with the case where x or f has 0 at its optimum. The only benefit is that the user can write set_xtol(generic_tol) instead of set_xtol(abs_tol / sensible_scaling_value). I propose we do absolute only, leaving scaling to the user (who has the info needed for this decision)
  4. Similarly, allowing a scalar xtol means more implementation/documentation/tests/maintenance/room for bugs, all for the sake of a user writing set_xtol(tol) instead of set_xtol([tol] * n_parameters). So xtol is always a vector of length n_parameters, and ftol is always a scalar.

Both matlab and scipy seem to be phasing out the "xtol"/"ftol" terminology (e.g. scipy now uses names like xatol/fatol that make absolute vs relative explicit, and matlab says "StepTolerance"/"FunctionTolerance"), so we might follow that and

  1. Call the methods set_step_tolerance and set_function_tolerance. The current names are max_iterations, max_evaluations, max_unchanged_iterations, and threshold (stop if f < threshold). So we could have set_step_tolerance and set_tolerance instead? Or set_min_step and set_min_change?
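
As a rough sketch (using the tentatively suggested names, which are not a final API), the proposed criteria could sit in the controller like this:

import numpy as np

class ControllerSketch:
    """Sketch only: illustrates absolute, independent (OR) criteria with a
    per-parameter xtol and a scalar ftol, as proposed above."""

    def __init__(self, n_parameters):
        self._n_parameters = n_parameters
        self._step_tolerance = None       # vector of length n_parameters, or None
        self._function_tolerance = None   # scalar, or None

    def set_step_tolerance(self, xtol):
        # Always a vector of length n_parameters (proposal point 4)
        xtol = np.asarray(xtol, dtype=float)
        if xtol.shape != (self._n_parameters,):
            raise ValueError(
                'xtol should be a vector of length n_parameters, specifying'
                ' a separate tolerance for each parameter.')
        self._step_tolerance = xtol

    def set_function_tolerance(self, ftol):
        # Always a scalar (proposal point 4)
        self._function_tolerance = float(ftol)

    def _tolerance_criterion_met(self, x_prev, x_curr, f_prev, f_curr):
        # Independent, absolute criteria: stop when either is met (OR, not AND)
        if self._step_tolerance is not None and np.all(
                np.abs(np.asarray(x_curr) - np.asarray(x_prev))
                < self._step_tolerance):
            return True
        if self._function_tolerance is not None and (
                abs(f_curr - f_prev) < self._function_tolerance):
            return True
        return False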

MichaelClerx avatar Oct 31 '23 11:10 MichaelClerx

I'm happy with those proposals. Doing absolute ones makes more sense, as relative would presumably be with respect to the initial guess or something, which would be a bit strange to vary from run to run. I haven't ever seen a package specify a vector of xtol, but it makes sense; maybe I have just been automatically doing [tol]*n_params behind the scenes. A nice friendly message saying "the xtol should be a vector of length n_parameters specifying separate tolerances for each parameter" would make it easy enough to see what needs to be done.

I think matlab's optimset (the common optimisation options object used by a range of optimisers) is the thing to look at; that uses 'TolX' and 'TolFun' nowadays: optimset docs

The matlab optimisation toolbox itself, though, has a different optimoptions, which gets the options for a particular optimiser. There they seem to more commonly use ObjectiveLimit (for "got to the top of Ben Nevis" stopping), OptimalityTolerance or FunctionTolerance in place of TolFun, and StepTolerance in place of TolX, presumably based on how far it has moved in parameter space (might even be a Euclidean distance or something?):

                MaxIterations: 1000
               ObjectiveLimit: -1.0000e+20
          OptimalityTolerance: 1.0000e-06
                    OutputFcn: []
                      PlotFcn: []
                 ScaleProblem: 0
    SpecifyConstraintGradient: 0
     SpecifyObjectiveGradient: 0
                StepTolerance: 1.0000e-10

mirams avatar Oct 31 '23 15:10 mirams

OK, so with our threshold then we can match all 3. That's good!

MichaelClerx avatar Nov 14 '23 09:11 MichaelClerx

@mirams @martinjrobins @chonlei discussion point: The ftol is essentially the same as set_max_unchanged_iterations with iterations=1.

Options:

  1. Don't implement a separate ftol criterion
  2. Implement a separate ftol criterion
  3. Implement a method set_function_tolerance that internally calls set_max_unchanged_iterations (and document that this is what it does)
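
For reference, option 3 could be as thin as the sketch below; set_function_tolerance is hypothetical, and it assumes the existing set_max_unchanged_iterations(iterations=..., threshold=...) signature on the OptimisationController:

def set_function_tolerance(self, tol):
    # Hypothetical OptimisationController method, shown only to illustrate
    # option 3: stop as soon as f changes by less than `tol` between two
    # successive iterations, by reusing the existing unchanged-iterations check.
    self.set_max_unchanged_iterations(iterations=1, threshold=tol)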

Input from anyone else doing optimisation appreciated!

MichaelClerx avatar Nov 14 '23 12:11 MichaelClerx

There is a case for having both, I think: you might want to use the ftol as a convergence test for something that is expected to be asymptotic at the end, but keep the max-unchanged criterion to detect being stuck.

MichaelClerx avatar Nov 14 '23 12:11 MichaelClerx

I am not too sure. For option 3, it is like having an alias for set_max_unchanged_iterations with argument iterations=1, right? If that's all it is, then it might be the easiest workaround?

chonlei avatar Nov 14 '23 13:11 chonlei

Yeah, an argument for just wrapping, so ftol(tol) = set_max_unchanged_iterations(iterations=1, tol). I guess the problem there is that it somewhat needs all the optimisers to agree what an 'iteration' means. I had imagined that you'd want to be usually wrapping the ftol() function provided by off-the-shelf optimisers, and I imagine different optimisers can and will define "over how many function evals/iterations/whatever" depending on how they work and are known to converge. Whereas I guess the max_unchanged_iterations is something we're implementing ourselves?

I think confusion could then arise if the user can call both, but perhaps not; it might even be a good idea to be able to set a collection of these, like

set_ftol(1e-7)  # let the optimiser decide how to implement
set_max_unchanged_iterations(iterations=1, tol=1e-12)
set_max_unchanged_iterations(iterations=10, tol=1e-10)
set_max_unchanged_iterations(iterations=100, tol=1e-8)

and stop the first time one is met!! But it is a bit crazy.

mirams avatar Nov 14 '23 13:11 mirams

> I guess the problem there is that it somewhat needs all the optimisers to agree what an 'iteration' means. I had imagined that you'd want to be usually wrapping the ftol() function provided by off-the-shelf optimisers, and I imagine different optimisers can and will define "over how many function evals/iterations/whatever" depending on how they work and are known to converge

No, we ignore what the optimisers say, so that we can use the same controller to run optimisations with all of them! They use different numbers of evaluations per iteration; that's fine :D


To clarify the "who implements what" thing a bit further:

  1. All optimisers in PINTS use an ask-and-tell interface, [one call to ask() then tell()] = one iteration. For PINTS we don't care if that's unfair for comparisons as that's not the goal, the goal is just to make a few useful ones available.
  2. The OptimisationController implements functionality shared by optimisers, so that the Optimiser classes themselves can stay clean. This includes stopping criteria such as max unchanged iterations, max iterations, max evaluations, the function threshold, and after this ticket also xtol (but perhaps not ftol).
  3. Users who want complicated stopping criteria can forgo the controller and use ask-and-tell on the Optimiser classes directly (see the sketch after this list). This is our main mechanism for limiting the scope of PINTS and stopping it from becoming totally unmanageable for the sake of obscure/advanced use cases 😄
  4. The ask-and-tell requirement plus the goal of staying pure Python means we can't wrap much. When we do wrap (only CMAES at the moment), we disable as many of the optimiser's own criteria as we can, so that the user is presented with a single, unified interface for setting stopping criteria.
  5. However, the Optimiser class has a method stop() https://pints.readthedocs.io/en/stable/optimisers/base_classes.html#pints.Optimiser.stop that it can use to tell the Controller when it's stuck (e.g. an ill-conditioned matrix or some unrecoverable error). There is nothing stopping people from adding new options to an Optimiser class that would re-enable its native stopping criteria (you would then set these via controller.optimiser().set_something_or_other(...)).
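
To make points 1 and 3 concrete, here is roughly what an ask-and-tell loop with a hand-rolled ftol-style criterion looks like (XNES and the toy score function chosen arbitrarily; treat the exact method names such as f_best()/x_best() as a sketch against the current API):

import numpy as np
import pints

def score(x):
    # Toy objective for illustration; any callable returning a scalar will do
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

optimiser = pints.XNES(np.array([1.1, 1.1]), sigma0=0.1)

f_previous = np.inf
for iteration in range(1000):
    xs = optimiser.ask()                 # candidate parameter sets
    fs = [score(x) for x in xs]          # user evaluates the scores
    optimiser.tell(fs)                   # one ask/tell pair = one iteration

    f_current = optimiser.f_best()
    if abs(f_previous - f_current) < 1e-11:   # custom ftol-style stop
        break
    f_previous = f_current

print(optimiser.x_best(), optimiser.f_best())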

MichaelClerx avatar Nov 14 '23 16:11 MichaelClerx

> might even be a good idea to be able to set a collection of these like

That's an interesting idea! Would require some laborious unsetting syntax though, e.g. unset_unchanged_iterations with the exact same arguments, or having set return some object you can then pass in to unset.

Will leave that for now as we don't have a direct use case :D

MichaelClerx avatar Nov 14 '23 16:11 MichaelClerx

But both leaning towards option 2 then?

MichaelClerx avatar Nov 14 '23 16:11 MichaelClerx

Alright, will add a max_unmoved_iterations then so it has the same syntax as max_unchanged_iterations.

MichaelClerx avatar Nov 21 '23 13:11 MichaelClerx

Stop when          Matlab                       Scipy (fmin)   nlopt     pycma
f(x) change < t    FunctionTolerance, TolFun    ftol           ftol      tolfun
x change < t       StepTolerance, TolX          xtol           xtol      tolx
f(x) < t           -                            -              stopval   ftarget

(A sketch of how these map onto each library's Python interface follows the links below.)

  • https://uk.mathworks.com/help/matlab/math/setting-options.html (normal Matlab)
  • https://uk.mathworks.com/help/optim/ug/tolerances-and-stopping-criteria.html (Matlab optimisation toolbox)
  • https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html
  • https://nlopt.readthedocs.io/en/latest/NLopt_Introduction/#termination-conditions
  • https://github.com/CMA-ES/pycma/blob/c52672315c4f045c1dc8d95a1ff73b89c42bb071/cma/evolution_strategy.py#L3927C1-L3937C63
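
To make the table concrete, here is a rough sketch of setting the equivalent options through each library's Python interface (tolerance values are arbitrary; see the links above for the authoritative option names and semantics):

import numpy as np

f = lambda x: float(np.sum((np.asarray(x) - 1) ** 2))

# scipy (fmin, i.e. Nelder-Mead): xtol and ftol
from scipy.optimize import fmin
x_opt = fmin(f, x0=[0.0, 0.0], xtol=1e-8, ftol=1e-8)

# nlopt: per-criterion setters, plus stopval for "stop when f(x) < t"
import nlopt
opt = nlopt.opt(nlopt.LN_NELDERMEAD, 2)
opt.set_min_objective(lambda x, grad: f(x))
opt.set_xtol_abs(1e-8)
opt.set_ftol_abs(1e-8)
opt.set_stopval(1e-12)
x_opt = opt.optimize([0.0, 0.0])

# pycma: options dictionary with tolx, tolfun, and ftarget
import cma
es = cma.CMAEvolutionStrategy([0.0, 0.0], 0.5,
                              {'tolx': 1e-8, 'tolfun': 1e-8, 'ftarget': 1e-12})
es.optimize(f)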

MichaelClerx avatar Dec 05 '23 13:12 MichaelClerx