Unexpected stop because of check_convergence in stopping_criterion
When running benchmarks with multiple solvers and `max_runs` set, some solvers end up producing far fewer iterates than `max_runs` because `check_convergence` makes them exit early. This may not be the intended behavior for some users, especially for DL people using mini-batch SGD-like solvers, where the objective (test error) often starts rising again before going back down, because of non-convexity (finding a local minimum, then "escaping" it so the objective rises before a better one is found), variance, or the double-descent phenomenon.
A couple of things that would help:
- make it clearer in the logs that the run stopped because of `check_convergence`
- allow users to disable this check
- implement more robust logic
- explain in the docs what this check is doing (notably the role of patience and tolerance (EPS) and their default values)
Yes, I agree that this is a confusing part of benchopt, and I would be happy to find a way to make it clearer/simpler.
Users can already change the way the checks are done by setting the class attribute `stopping_criterion` on their solver, as described in this part of the doc (see the sketch at the end of this message). We reworked this not long ago, but I think it is still confusing because:
- It is not clear what `Solver.stopping_criterion` is, nor how to override it.
- The default values are not described.
- The default value, `SufficientProgressCriterion(patience=3, eps=1e-10)`, is maybe not well suited.
This choice also needs to take into account the selection of a proper sampling strategy, which is on a log scale for now but could be changed to linear by default. I would be happy to have your thoughts on this one.
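To make this concrete, here is a minimal sketch of what overriding the criterion in a solver could look like. The solver name, the `set_objective` arguments, and the hook bodies are placeholders, and the exact hooks and expected return format may vary with the benchopt version:

```python
import numpy as np

from benchopt import BaseSolver
from benchopt.stopping_criterion import SufficientProgressCriterion


class Solver(BaseSolver):
    # hypothetical solver name, for illustration only
    name = 'minibatch-sgd'

    # Override the default SufficientProgressCriterion(patience=3, eps=1e-10).
    # A larger patience tolerates more consecutive evaluations without
    # sufficient objective progress before check_convergence stops the run,
    # which helps with non-monotone curves such as mini-batch SGD test error.
    stopping_criterion = SufficientProgressCriterion(patience=20, eps=1e-10)

    def set_objective(self, X, y):
        # store whatever the objective passes to the solver
        self.X, self.y = X, y

    def run(self, n_iter):
        # placeholder for the actual solver loop over n_iter iterations
        self.beta = np.zeros(self.X.shape[1])

    def get_result(self):
        # return the iterate to be scored by the objective
        # (the expected return format depends on the benchopt version)
        return self.beta
```

With something like this, the benchmark is run as usual and only the convergence check changes; nothing else in the solver needs to be modified.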
I only found out about this part of the doc from your comment, and I actually had to "debug" this behavior by diving deep into the code. Adding a simple example to the docs showing how to override this stopping criterion with the least amount of code (something like the sketch above) would help new users a lot. The interaction between this stopping criterion and the sampling strategy should indeed be emphasized. Regarding the defaults, I frankly have opinions, but again they would suit DL people and maybe not the entire optimization community.