Add "xtol" and "ftol" stopping criteria for optimisers
Many implemented optimisers provide stopping criteria based on the difference between the last two parameter sets (`abs(x[i] - x[i - 1])`) or the last two function evaluations (`abs(f[i] - f[i - 1])`). When comparing methods from PINTS with other methods, it is useful to have something similar.
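For concreteness, a minimal sketch of what absolute versions of these two checks would look like (just the idea, not the PINTS API):

```python
import numpy as np

def step_small_enough(x_new, x_old, xtol):
    # Absolute "xtol" check: every parameter moved by less than its tolerance.
    # Here xtol is a vector with one tolerance per parameter (illustrative only).
    return np.all(np.abs(np.asarray(x_new) - np.asarray(x_old)) < xtol)

def change_small_enough(f_new, f_old, ftol):
    # Absolute "ftol" check: the score changed by less than a scalar tolerance.
    return abs(f_new - f_old) < ftol
```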
How do people define xtol and ftol?
Absolute versus relative
Occasionally, relative differences are used, e.g. `abs(f[i] - f[i - 1]) / f[i]`.
- In Matlab the interpretation of xtol and ftol depends on the method (see the Matlab docs and their per-method table)
- In SciPy the existence of xtol and ftol and their implementation is method-dependent, e.g. absolute for Nelder-Mead, relative for Powell, and not present at all for CG
- In nlopt users can add both: https://nlopt.readthedocs.io/en/latest/NLopt_Python_Reference/#stopping-criteria
Scalar xtol?
In nlopt, xtol can be a scalar; in Matlab it looks like it needs to have size `n_parameters`.
AND vs OR
In SciPy's Nelder-Mead, xtol AND ftol must both be met. Everywhere else they are independent stopping criteria: https://stackoverflow.com/questions/43644797/what-is-xtol-for-in-minimizemethod-nelder-mead
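For example, in SciPy's Nelder-Mead the absolute tolerances are set via `xatol` and `fatol`, and both have to be satisfied before the method terminates (a quick illustration, tolerance values arbitrary):

```python
from scipy.optimize import minimize

# Nelder-Mead stops only when BOTH the simplex spread in x (xatol) and the
# spread in function values (fatol) are below their tolerances.
result = minimize(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
    x0=[0.0, 0.0],
    method='Nelder-Mead',
    options={'xatol': 1e-8, 'fatol': 1e-8},
)
print(result.x, result.nit)
```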
Proposal
- Add xtol and ftol as separate, independent criteria (so OR, not AND)
- They will not be set by default, so users can enable them (or disable them) for methods and problems where they think this is appropriate
- Adding relative versions means doubling the workload, and dealing with the case where x or f is 0 at its optimum. The only benefit is that the user can write `set_xtol(generic_tol)` instead of `set_xtol(abs_tol / sensible_scaling_value)`. I propose we do absolute only, leaving scaling to the user (who has the information needed for this decision)
- Similarly, allowing a scalar xtol means more implementation/documentation/tests/maintenance/room-for-bugs, all for the sake of a user writing `set_xtol(tol)` instead of `set_xtol([tol] * n_parameters)`. So: xtol is always a vector of length `n_parameters`, ftol is always a scalar.
Both Matlab and SciPy seem to be phasing out the "xtol"/"ftol" terminology (SciPy now uses names like `xatol`/`fatol` to make absolute vs relative explicit, Matlab says "StepTolerance"/"FunctionTolerance"), so we might follow that and:
- Call the methods `set_step_tolerance` and `set_function_tolerance`. The current names are `max_iterations`, `max_evaluations`, `max_unchanged_iterations`, and `threshold` (stop iff score < `threshold`). So we could have `set_step_tolerance` and `set_tolerance` instead? Or `set_min_step` and `set_min_change`?
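A rough sketch of how the proposed criteria could look on the controller side (the method names follow the `set_step_tolerance`/`set_function_tolerance` suggestion above; everything else is illustrative, not a final API): absolute only, xtol a vector of length `n_parameters`, ftol a scalar, unset by default, and combined with OR.

```python
import numpy as np

class ProposedStoppingCriteria:
    """Illustrative sketch only, not the PINTS implementation."""

    def __init__(self, n_parameters):
        self._n_parameters = n_parameters
        self._step_tolerance = None      # vector of length n_parameters, or None
        self._function_tolerance = None  # scalar, or None

    def set_step_tolerance(self, tols):
        tols = np.asarray(tols, dtype=float)
        if tols.shape != (self._n_parameters,):
            raise ValueError(
                'Step tolerance must be a vector of length n_parameters,'
                ' specifying a separate tolerance for each parameter.')
        self._step_tolerance = tols

    def set_function_tolerance(self, tol):
        self._function_tolerance = float(tol)

    def stop(self, x_new, x_old, f_new, f_old):
        # Independent criteria: either one is enough to stop (OR, not AND).
        if self._step_tolerance is not None and np.all(
                np.abs(np.asarray(x_new) - np.asarray(x_old))
                < self._step_tolerance):
            return True
        if self._function_tolerance is not None and (
                abs(f_new - f_old) < self._function_tolerance):
            return True
        return False
```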
I'm happy with those proposals. Doing absolute ones makes more sense, as relative would presumably be with respect to the initial guess or something, which would be a bit strange to vary from run to run. I haven't ever seen a package specify a vector of xtol, but it makes sense; maybe I have just been automatically doing `[tol] * n_params` behind the scenes. A nice friendly message saying "the xtol should be a vector of length n_parameters specifying separate tolerances for each parameter" would make it easy enough to see what needs to be done.
I think Matlab's `optimset` (the common optimisation options object used by a range of optimisers) is the thing to look at; that uses 'TolX' and 'TolFun' nowadays: optimset docs
Although the Matlab optimisation toolbox itself has a different `optimoptions`, which gets the options for a particular optimiser. There they seem to more commonly use `ObjectiveLimit` (for "got to the top of Ben Nevis" stopping), `OptimalityTolerance` or `FunctionTolerance` for TolFun, and `StepTolerance` for TolX, presumably based on how far in parameter space it moved (might even be a Euclidean distance or something?):
```
MaxIterations: 1000
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
```
OK, so with our `threshold` we can then match all 3. That's good!
@mirams @martinjrobins @chonlei discussion point: The ftol is essentially the same as `set_max_unchanged_iterations` with `iterations=1`.
Options:
1. Don't implement a separate ftol criterion
2. Implement a separate ftol criterion
3. Implement a method `set_function_tolerance` that internally calls `set_max_unchanged_iterations` (and document that this is what it does)
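If option 3 were chosen it would be little more than an alias; a minimal sketch (the `threshold` keyword name is an assumption about `set_max_unchanged_iterations`' signature):

```python
def set_function_tolerance(controller, tol):
    # Purely illustrative: stop when the best score changes by less than
    # ``tol`` between two successive iterations.
    controller.set_max_unchanged_iterations(iterations=1, threshold=tol)
```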
Input from anyone else doing optimisation appreciated!
There is a case for having both, I think: if you want to use the ftol as a convergence test for something that is expected to be asymptotic at the end, but keep that max-unchanged criterion to detect being stuck.
I am not too sure. For option 3, it is like having an alias for `set_max_unchanged_iterations` with argument `iterations=1`, right? If that is all it is, then it might be the easiest workaround?
Yeah, an argument for just wrapping, so `ftol(tol) = set_max_unchanged_iterations(iterations=1, tol)`. I guess the problem there is that it somewhat needs all the optimisers to agree on what an 'iteration' means. I had imagined that you'd usually want to be wrapping the `ftol()` function provided by off-the-shelf optimisers, and I imagine different optimisers can and will define "over how many function evals/iterations/whatever" depending on how they work and are known to converge. Whereas I guess `max_unchanged_iterations` is something we're implementing ourselves?
I think confusion could then arise if the user can call both, but perhaps not. Might even be a good idea to be able to set a collection of these, like:
```python
set_ftol(1e-7)  # let the optimiser decide how to implement
set_max_unchanged_iterations(iterations=1, tol=1e-12)
set_max_unchanged_iterations(iterations=10, tol=1e-10)
set_max_unchanged_iterations(iterations=100, tol=1e-8)
```
and stop the first time one is met!! But it is a bit crazy.
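Crazy or not, a collection like that is cheap to check; a sketch of the bookkeeping (illustrative only):

```python
def any_rule_met(f_history, rules):
    # f_history: best score per iteration, most recent last.
    # rules: list of (iterations, threshold) pairs; stop when any one is met.
    for iterations, threshold in rules:
        if len(f_history) > iterations:
            window = f_history[-(iterations + 1):]
            if max(window) - min(window) < threshold:
                return True
    return False

# The collection proposed above (ignoring the ftol line):
rules = [(1, 1e-12), (10, 1e-10), (100, 1e-8)]
```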
> I guess the problem there is that it somewhat needs all the optimisers to agree what an 'iteration' means. I had imagined that you'd want to be usually wrapping the ftol() function provided by off-the-shelf optimisers, and I imagine different optimisers can and will define "over how many function evals/iterations/whatever" depending on how they work and are known to converge
No, we ignore what the optimisers say, so that we can use the same controller to run optimisations with them! They use different numbers of evaluations per iteration, and that's fine :D
To clarify the "who implements what" thing a bit further:
- All optimisers in PINTS use an ask-and-tell interface: one call to `ask()` followed by `tell()` = one iteration. For PINTS we don't care if that's unfair for comparisons, as that's not the goal; the goal is just to make a few useful ones available.
- The OptimisationController implements functionality shared by optimisers, so that the Optimiser classes themselves can stay clean. This includes stopping criteria such as max unchanged iterations, max iterations, max evaluations, function threshold, and after this ticket also xtol (but perhaps not ftol).
- Users who want complicated stopping criteria can forgo the controller and use ask-and-tell on the Optimiser classes directly. This is our main mechanism for limiting the scope of PINTS and stopping it from becoming totally unmanageable for the sake of obscure/advanced use cases 😄
- The ask-and-tell requirement, plus the goal of staying pure Python, means we can't wrap much. When we do wrap (only CMAES at the moment), we disable as many of the optimiser's own criteria as we can, so that the user is presented with a single, unified interface for setting stopping criteria.
- However, the Optimiser class has a method `stop()` (https://pints.readthedocs.io/en/stable/optimisers/base_classes.html#pints.Optimiser.stop) that it can use to tell the Controller when it's stuck (e.g. an ill-conditioned matrix or some unrecoverable error). There is nothing stopping people from adding new options to an `Optimiser` class that would re-enable its native stopping criteria (you would then set this using `controller.optimiser().set_something_or_other(...)`)
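For illustration, this is roughly what the ask-and-tell route looks like when a user wants their own ftol-style rule without the controller (the choice of optimiser and the error measure are placeholders, and method names outside `ask()`/`tell()` may differ between PINTS versions):

```python
import numpy as np
import pints

def run_with_ftol(error, x0, sigma0, ftol=1e-9, max_iterations=1000):
    opt = pints.XNES(x0, sigma0)
    best_x, best_f, previous_f = None, np.inf, np.inf
    for _ in range(max_iterations):
        xs = opt.ask()               # candidate points for this iteration
        fs = [error(x) for x in xs]  # user evaluates the error measure
        opt.tell(fs)                 # report the scores back
        i = int(np.argmin(fs))
        if fs[i] < best_f:
            best_x, best_f = xs[i], fs[i]
        # User-defined absolute ftol criterion on the per-iteration best score:
        if abs(previous_f - min(fs)) < ftol:
            break
        previous_f = min(fs)
    return best_x, best_f
```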
> might even be a good idea to be able to set a collection of these like
That's an interesting idea! It would require some laborious unsetting syntax though, e.g. `unset_unchanged_iterations` with the exact same arguments, or having `set` return some object you can then pass in to `unset`.
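In code, that second variant might look something like this (entirely hypothetical, just to make the idea concrete; neither the handle mechanism nor these method names exist in PINTS):

```python
class HypotheticalController:
    """Sketch of 'set returns a handle you can later unset'."""

    def __init__(self):
        self._criteria = {}
        self._next_handle = 0

    def set_max_unchanged_iterations(self, iterations, tol):
        handle = self._next_handle
        self._next_handle += 1
        self._criteria[handle] = ('unchanged', iterations, tol)
        return handle

    def unset(self, handle):
        # Remove exactly the criterion identified by this handle.
        del self._criteria[handle]
```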
Will leave that for now as we don't have a direct use case :D
But both leaning towards option 2 then?
Alright, will add a `max_unmoved_iterations` then, so it has the same syntax as `max_unchanged_iterations`.
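Usage would then mirror the existing criterion; a hypothetical sketch (the `threshold` keyword and the toy problem are assumptions, and `set_max_unmoved_iterations` is the method proposed here, not an existing one):

```python
import numpy as np
import pints

# Toy parabolic error, just to have something to optimise.
error = pints.toy.ParabolicError()
opt = pints.OptimisationController(error, x0=np.array([1.0, 1.0]), method=pints.XNES)
opt.set_max_unchanged_iterations(iterations=200, threshold=1e-11)  # change in f
opt.set_max_unmoved_iterations(iterations=200, threshold=1e-11)    # proposed: change in x
x_best, f_best = opt.run()
```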
| Stop when | Matlab | SciPy (fmin) | nlopt | pycma |
|---|---|---|---|---|
| f(x) change < t | FunctionTolerance, TolFun | ftol | ftol | tolfun |
| x change < t | StepTolerance, TolX | xtol | xtol | tolx |
| f(x) < t | | | stopval | ftarget |
- https://uk.mathworks.com/help/matlab/math/setting-options.html (normal Matlab)
- https://uk.mathworks.com/help/optim/ug/tolerances-and-stopping-criteria.html (Matlab optimisation toolbox)
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html
- https://nlopt.readthedocs.io/en/latest/NLopt_Introduction/#termination-conditions
- https://github.com/CMA-ES/pycma/blob/c52672315c4f045c1dc8d95a1ff73b89c42bb071/cma/evolution_strategy.py#L3927C1-L3937C63
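To make the table concrete, here is roughly how the same settings are spelled in two of these libraries (a hedged sketch; see each library's docs for the exact semantics):

```python
import nlopt
from scipy.optimize import fmin

objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

# SciPy's fmin: xtol and ftol are absolute tolerances on the simplex.
x_scipy = fmin(objective, [0.0, 0.0], xtol=1e-8, ftol=1e-8)

# nlopt: absolute and relative variants are separate calls, plus a target value.
opt = nlopt.opt(nlopt.LN_NELDERMEAD, 2)
opt.set_min_objective(lambda x, grad: float(objective(x)))
opt.set_xtol_abs(1e-8)   # "x change < t"
opt.set_ftol_abs(1e-8)   # "f(x) change < t"
opt.set_stopval(1e-12)   # "f(x) < t"
x_nlopt = opt.optimize([0.0, 0.0])
```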