BOUT-dev icon indicating copy to clipboard operation
BOUT-dev copied to clipboard

Fix iteration counts change all iter rebase

Open dschwoerer opened this issue 11 months ago • 7 comments

Rebase of #2535

TODO:

  • [ ] mention breaking change in changelog
  • [ ] add test:
    • [ ] normal run
    • [ ] restart
    • [ ] without dump_on_restart
    • [ ] with smaller timestep monitor
    • [ ] with larger timestep monitor

I have some done some manual testing, but not yet the smaller timestep (and might be wrong)

dschwoerer avatar Mar 14 '24 14:03 dschwoerer

Oh, indeed, they are broken.

How about this:

|  Step 0 of 3. Elapsed 0:00:00.0 ETA 0:00:00.1
/  Step 1 of 3. Elapsed 0:00:01.1 ETA 0:00:02.1
-  Step 2 of 3. Elapsed 0:00:01.5 ETA 0:00:00.4
\  Step 3 of 3. Elapsed 0:00:02.1 ETA 0:00:00.0

restart

|  Step 3 of 6. Elapsed 0:00:00.0 ETA 0:00:00.1
/  Step 4 of 6. Elapsed 0:00:01.1 ETA 0:00:02.2
-  Step 5 of 6. Elapsed 0:00:01.9 ETA 0:00:00.8
\  Step 6 of 6. Elapsed 0:00:03.4 ETA 0:00:00.0

dschwoerer avatar Mar 18 '24 14:03 dschwoerer

Sounds good to me @dschwoerer! :+1:

johnomotani avatar Mar 18 '24 15:03 johnomotani

@ZedThree The unit tests are failing, but I did not figure out what is wrong. I am thus hesitant to just change the expected values. The real code seems to do the right thing, but the fake ones seem to be going wrong somewhere ...

dschwoerer avatar Mar 18 '24 15:03 dschwoerer

@dschwoerer This is on my to-do list to look at!

ZedThree avatar Mar 26 '24 17:03 ZedThree

I think there's a bug in the current implementation in master and next, or at least behaviour that I find surprising. I've modified the monitor example to print the name of the monitor for clarity, and got this output (on next):

Custom output monitor fast, time = 0.000000e+00, step -1 of 20

Custom output monitor default, time = 0.000000e+00, step -1 of 10
Sim Time  |  RHS evals  | Wall Time |  Calc    Inv   Comm    I/O   SOLVER

0.000e+00          1       4.33e-02    -1.3    0.0    1.6  147.2  -47.4
|  Step 1 of 10. Elapsed 0:00:00.0 ETA 0:00:00.3
Custom output monitor fast, time = 1.000000e+00, step 0 of 10

Custom output monitor fast, time = 2.000000e+00, step 1 of 10

Custom output monitor default, time = 2.000000e+00, step 0 of 5
2.000e+00         92       6.33e-02     5.8    0.0    0.0   50.6   43.6
/  Step 1 of 5. Elapsed 0:00:00.1 ETA 0:00:00.1
Custom output monitor fast, time = 3.000000e+00, step 2 of 10

Custom output monitor fast, time = 4.000000e+00, step 3 of 10

Custom output monitor default, time = 4.000000e+00, step 1 of 5
4.000e+00         41       5.49e-02     4.1    0.0    0.0   45.3   50.7
-  Step 2 of 5. Elapsed 0:00:00.2 ETA 0:00:00.1
Custom output monitor fast, time = 5.000000e+00, step 4 of 10

Custom output monitor fast, time = 6.000000e+00, step 5 of 10

Custom output monitor default, time = 6.000000e+00, step 2 of 5
6.000e+00         44       5.48e-02     3.5    0.0    0.0   46.1   50.3
\  Step 3 of 5. Elapsed 0:00:00.2 ETA 0:00:00.0
Custom output monitor fast, time = 7.000000e+00, step 6 of 10

Custom output monitor fast, time = 8.000000e+00, step 7 of 10

Custom output monitor default, time = 8.000000e+00, step 3 of 5
8.000e+00         41       5.24e-02     3.7    0.0    0.0   45.3   51.0
|  Step 4 of 5. Elapsed 0:00:00.3 ETA 0:00:-0.1
Custom output monitor fast, time = 9.000000e+00, step 8 of 10

Custom output monitor fast, time = 1.000000e+01, step 9 of 10

Custom output monitor default, time = 1.000000e+01, step 4 of 5
1.000e+01         44       5.80e-02     4.2    0.0    0.0   42.8   53.0
/  Step 5 of 5. Elapsed 0:00:00.3 ETA 0:00:-0.1

This is with nout = 10 and timestep = 1. Although the simulation runs to sim time nout * timestep and the relative frequencies of the monitors are all correct, notice that there are only 5 output steps that happen with timestep = 2.

I think the fix is:

modified   src/solver/solver.cxx
@@ -498,6 +498,9 @@ int Solver::solve(int nout, BoutReal timestep) {
 
   finaliseMonitorPeriods(nout, timestep);
 
+  number_output_steps = nout;
+  output_timestep = timestep;
+
   output_progress.write(
       _("Solver running for {:d} outputs with output timestep of {:e}\n"), nout,
       timestep);

This then runs for 10 output steps with timestep = 1, which is what I expect at least

ZedThree avatar Mar 28 '24 10:03 ZedThree

I've got my head around the unit tests too, I'll push a fix for them shortly, along with some improved documentation about how the monitor timesteps work

ZedThree avatar Mar 28 '24 10:03 ZedThree

I've fixed the unit tests by converting them to use a mock monitor to check exactly what's getting called and with what arguments, and writing the tests in terms of nout and nout - 1. I got this working on next and then when porting to this branch, I just checked that removing the -1 made things work.

There's just a few clang-tidy warnings in solver.hxx that need fixing, then I think this is good to go.

ZedThree avatar Mar 28 '24 15:03 ZedThree