Microphysics icon indicating copy to clipboard operation
Microphysics copied to clipboard

allow retry with different Jacobian

Open zingale opened this issue 1 year ago • 10 comments

The analytic Jacobian often works really well, but for some hard burns, the numerical Jacobian is better. In this case, rather than failing and having to have the code (Castro) throw away the entire timestep and retry, it would be useful to allow for the whole burn for a zone to be retried with the numerical Jacobian.

This kinda brings back the retry mechanism we had in VODE a while back, but in a slightly different fashion.

zingale avatar Jul 11 '22 11:07 zingale

Another angle here would be to retry an individual burning step with the other Jacobian. It is probably easier to implement the first idea because we have already done something similar, but it's worth pursuing the within-burn retries because it's probably significantly cheaper.

maxpkatz avatar Jul 11 '22 12:07 maxpkatz

The prerequisite for either idea is that the interfaces in the burner going down to dvjac need to decide which Jacobian to use via function arguments rather than reading the integrator.jacobian runtime parameter.

maxpkatz avatar Jul 11 '22 12:07 maxpkatz

ah indeed, so maybe the first step is to just do that refactor

zingale avatar Jul 11 '22 12:07 zingale

Should be straightforward because we can just put the Jacobian choice into the dvode_t.

There may be an even easier way though: to the extent that the comments in vode_dvnlsd.H and vode_dvjac.H are correct, then vstate.ICF > 0 means that we had some sort of error in solving the nonlinear system, which is what prompts the cached Jacobian to be re-evaluated when Jacobian caching is used. Maybe as a first step we can just use the same conditions to use the other Jacobian in this call to dvjac than the one specified by integrator.jacobian?

maxpkatz avatar Jul 11 '22 12:07 maxpkatz

my read of this is that is NFLAG < 0, then we had a failure (singular, or convergence failure), and we should redo the Jacobian, perhaps switching to numerical from analytic.

but perhaps these say the same thing, since ICF and NFLAG seem to be related.

zingale avatar Jul 13 '22 00:07 zingale

Ah, okay, I understand now. Only ICF makes it out of dvnlsd, so indeed, if we force a change of Jacobian type in dvjac when ICF > 0, then I think that will accomplish our goals -- that will only change the type if there was a failure, and not if the Jacobian is "stale" because of the caching.

zingale avatar Jul 13 '22 00:07 zingale

One possible version of this is:

$ git diff
diff --git a/integration/VODE/vode_dvjac.H b/integration/VODE/vode_dvjac.H
index 04345a94..89f7a3dc 100644
--- a/integration/VODE/vode_dvjac.H
+++ b/integration/VODE/vode_dvjac.H
@@ -67,7 +67,7 @@ void dvjac (IArray1D& pivot, int& IERPJ, burn_t& state, dvode_t& vstate)
         // We want to evaluate the Jacobian -- now the path depends on
         // whether we're using the numerical or analytic Jacobian.

-        if (vstate.jacobian_type != 2) {
+        if ((vstate.jacobian_type != 2 && vstate.ICF == 0) || (vstate.jacobian_type == 2 && vstate.ICF > 0)) {

This has a negligible effect on the number of RHS calls in test_react for aprox13; but I think to really test this out for the intended use case we need to have an example which reliably fails to integrate, and by construction we don't have that in the test suite.

maxpkatz avatar Jul 14 '22 18:07 maxpkatz

The above change makes things slightly worse for test_react with subch_full. Before:

$ ./main3d.gnu.ex inputs_aprox21 integrator.use_jacobian_caching=0 integrator.jacobian=1
AMReX (22.03-26-ge213d767baab) initialized
reading extern runtime parameters ...
reading in network electron-capture / beta-decay tables...
Run time = 4.9962261
min number of rhs calls: 6
avg number of rhs calls: 113
max number of rhs calls: 3590

After:

$ ./main3d.gnu.ex inputs_aprox21 integrator.use_jacobian_caching=0 integrator.jacobian=1
AMReX (22.03-26-ge213d767baab) initialized
reading extern runtime parameters ...
reading in network electron-capture / beta-decay tables...
Run time = 5.0947613
min number of rhs calls: 6
avg number of rhs calls: 122
max number of rhs calls: 5743
AMReX (22.03-26-ge213d767baab) finalized

But we should really try this out on something that has burn failures, like a run of the Castro subchandra problem, or a wdmerger run.

maxpkatz avatar Jul 14 '22 18:07 maxpkatz

if you drop this on a branch I'll run a test with and without it this evening.

zingale avatar Jul 14 '22 18:07 zingale

we should also consider the idea of not integrating T on retry

zingale avatar Oct 15 '22 16:10 zingale