DESC icon indicating copy to clipboard operation
DESC copied to clipboard

Rescale optimizer termination criteria

Open f0uriest opened this issue 1 year ago • 2 comments
trafficstars

Most of our optimizers have 3 main exit criteria: $$||\Delta f || / ||f|| < ftol$$ $$||\Delta x || / ||x|| < xtol$$ $$||\nabla f ||_\infty < gtol$$

The normalizations in the objective functions generally ensure that the first one (ftol) is more or less independent of scaling/units etc, however xtol and gtol still have units of x or 1/x respectively, meaning that rescaling variables can affect the stopping criteria. This is especially noticeable when doing coil optimization or optimizing a current profile, since current is ~1e6x larger than the other degrees of freedom, it can mean exiting early once the current reaches a good value but the other variables can still be far from optimal.

This PR adds an option scaled_termination (defaults to True) to all of the desc optimizers to measure the norms for xtol and gtol in the scaled norm provided by x_scale (which defaults to using an adaptive scaling based on the Jacobian or Hessian). This should make things a bit better when optimizing parameters with widely different magnitudes.

Resolves #797

f0uriest avatar Jun 25 '24 23:06 f0uriest

|             benchmark_name             |         dt(%)          |         dt(s)          |        t_new(s)        |        t_old(s)        | 
| -------------------------------------- | ---------------------- | ---------------------- | ---------------------- | ---------------------- |
 test_build_transform_fft_midres         |     -0.41 +/- 4.49     | -2.50e-03 +/- 2.72e-02 |  6.04e-01 +/- 1.8e-02  |  6.07e-01 +/- 2.0e-02  |
 test_build_transform_fft_highres        |     +0.68 +/- 1.70     | +6.53e-03 +/- 1.63e-02 |  9.68e-01 +/- 1.3e-02  |  9.61e-01 +/- 1.0e-02  |
 test_equilibrium_init_lowres            |     -0.42 +/- 2.05     | -1.66e-02 +/- 7.98e-02 |  3.88e+00 +/- 3.3e-02  |  3.90e+00 +/- 7.3e-02  |
 test_objective_compile_atf              |     +0.39 +/- 4.85     | +3.07e-02 +/- 3.84e-01 |  7.94e+00 +/- 2.7e-01  |  7.91e+00 +/- 2.7e-01  |
 test_objective_compute_atf              |     +0.01 +/- 4.75     | +8.98e-07 +/- 5.00e-04 |  1.05e-02 +/- 2.1e-04  |  1.05e-02 +/- 4.5e-04  |
 test_objective_jac_atf                  |     -0.37 +/- 3.62     | -7.21e-03 +/- 7.00e-02 |  1.93e+00 +/- 6.0e-02  |  1.93e+00 +/- 3.6e-02  |
 test_perturb_1                          |     +1.02 +/- 2.25     | +1.45e-01 +/- 3.19e-01 |  1.43e+01 +/- 1.6e-01  |  1.42e+01 +/- 2.8e-01  |
 test_proximal_jac_atf                   |     -0.04 +/- 1.05     | -3.01e-03 +/- 8.68e-02 |  8.23e+00 +/- 5.6e-02  |  8.23e+00 +/- 6.7e-02  |
 test_proximal_freeb_compute             |     -0.81 +/- 1.16     | -1.58e-03 +/- 2.29e-03 |  1.95e-01 +/- 1.1e-03  |  1.97e-01 +/- 2.0e-03  |
 test_solve_fixed_iter_compiled          |     +0.15 +/- 1.67     | +2.57e-02 +/- 2.82e-01 |  1.69e+01 +/- 2.6e-01  |  1.69e+01 +/- 1.1e-01  |
 test_build_transform_fft_lowres         |     -4.31 +/- 7.62     | -2.36e-02 +/- 4.18e-02 |  5.25e-01 +/- 2.4e-02  |  5.49e-01 +/- 3.4e-02  |
 test_equilibrium_init_medres            |     -5.74 +/- 2.14     | -2.51e-01 +/- 9.39e-02 |  4.13e+00 +/- 5.9e-02  |  4.38e+00 +/- 7.3e-02  |
 test_equilibrium_init_highres           |     -4.48 +/- 1.84     | -2.53e-01 +/- 1.04e-01 |  5.39e+00 +/- 4.2e-02  |  5.64e+00 +/- 9.5e-02  |
 test_objective_compile_dshape_current   |     -0.01 +/- 5.50     | -3.88e-04 +/- 2.11e-01 |  3.84e+00 +/- 2.1e-01  |  3.84e+00 +/- 3.5e-02  |
 test_objective_compute_dshape_current   |     -0.99 +/- 1.26     | -3.62e-05 +/- 4.59e-05 |  3.62e-03 +/- 3.3e-05  |  3.65e-03 +/- 3.2e-05  |
 test_objective_jac_dshape_current       |     +2.34 +/- 8.02     | +9.43e-04 +/- 3.23e-03 |  4.12e-02 +/- 2.8e-03  |  4.03e-02 +/- 1.5e-03  |
 test_perturb_2                          |     +1.20 +/- 2.91     | +2.29e-01 +/- 5.55e-01 |  1.93e+01 +/- 5.1e-01  |  1.91e+01 +/- 2.2e-01  |
 test_proximal_freeb_jac                 |     +0.98 +/- 1.25     | +7.31e-02 +/- 9.31e-02 |  7.55e+00 +/- 7.9e-02  |  7.48e+00 +/- 4.9e-02  |
 test_solve_fixed_iter                   |     +2.91 +/- 2.30     | +8.28e-01 +/- 6.55e-01 |  2.93e+01 +/- 6.2e-01  |  2.85e+01 +/- 2.2e-01  |
 test_LinearConstraintProjection_build   |     +1.14 +/- 1.87     | +2.59e-01 +/- 4.26e-01 |  2.30e+01 +/- 3.5e-01  |  2.27e+01 +/- 2.4e-01  |

github-actions[bot] avatar Jun 26 '24 01:06 github-actions[bot]

A suggestion for the intended functionality, can we just take the ratio step_ratio=step/x and check if any element of step_ratio is smaller then some new tolerance rtol?

YigitElma avatar Aug 29 '24 19:08 YigitElma

Codecov Report

Attention: Patch coverage is 97.72727% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.60%. Comparing base (17d4939) to head (1d121c7). Report is 1598 commits behind head on master.

Files with missing lines Patch % Lines
desc/optimize/aug_lagrangian_ls.py 93.75% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1073      +/-   ##
==========================================
+ Coverage   95.58%   95.60%   +0.01%     
==========================================
  Files          96       96              
  Lines       24597    24601       +4     
==========================================
+ Hits        23512    23519       +7     
+ Misses       1085     1082       -3     
Files with missing lines Coverage Δ
desc/optimize/aug_lagrangian.py 96.95% <100.00%> (+1.32%) :arrow_up:
desc/optimize/fmin_scalar.py 98.08% <100.00%> (+0.01%) :arrow_up:
desc/optimize/least_squares.py 99.33% <100.00%> (+<0.01%) :arrow_up:
desc/optimize/aug_lagrangian_ls.py 95.69% <93.75%> (+0.02%) :arrow_up:

... and 1 file with indirect coverage changes

codecov[bot] avatar Nov 13 '24 07:11 codecov[bot]

The advanced optimization notebook has some differing results now (probably due to differing termination events of the eq subproblem with this branch vs master, at least for the proximal case), the auglag result is slightly worse and the QS results change slightly too. I think it would be good to run all the notebooks again (or at least some of them, or make an issue/other branches that runs them and updates them) so that what we have up on the docs is what actually a user would see when running the notebook.

dpanici avatar Nov 21 '24 19:11 dpanici

Should we merge this before new release, or you want to wait until #1406 ?

YigitElma avatar Dec 04 '24 03:12 YigitElma

Should we merge this before new release, or you want to wait until #1406 ?

I think we can wait till the next one.

f0uriest avatar Dec 04 '24 04:12 f0uriest