"SOLVE kinetic STEADYSTATE sparse" is not solving for steady state.
It appears that the NMODL implementation contains terms involving `nt->dt`, whereas nocmodl makes use of `_ss_sparse_thread`, which iteratively solves a nonlinear system using `auto const ss_dt = 1e9;`.
(Ideally the solution should use the conservation constraints in place of an equal number of ODEs, to avoid trying to solve a singular matrix, but that is not directly relevant to this issue.) This was observed when comparing NMODL and nocmodl results for nrn/share/examples/nrniv/nmodl/capmp.mod when launching nrngui capmp.hoc.
The left column is for NMODL, the right is for nocmodl. (note: I'm using NMODL branch 1uc/fix-cadifpmp-shift).
The top row shows calcium concentration as a function of time for three cell diameters. In every case the intention was to initialize the concentration to 0.01 mM (single compartment). The NMODL 100 diam (black line) case calculated an initial steady state of approximately 0.0076 instead of the correct 0.01.
Ignore the bottom row. I'm showing it because the missing beginning and ending of the steady-state calcium current vs. constant calcium concentration curve for the NMODL simulation is a puzzle (but likely not relevant to the initialization issue for the large diameter case).
I believe the steady-state initialization issue is due to using the value of `nt->dt` in the steady-state equations of the NMODL-generated capmp.cpp file. Evidence for this is:

```
oc>diam = 100
oc>init()
oc>cai
	0.0076215222
oc>dt
	0.025
oc>dt=10
oc>init()
oc>cai
	0.0099921471
```
but my naive expectations are dashed with

```
oc>dt=1e9
oc>init()
special: /home/hines/neuron/temp/share/examples/nrniv/nmodl/x86_64/capmp.cpp:824: void neuron::nrn_init_capr(const {anonymous}::_nrn_model_sorted_token&, NrnThread*, Memb_list*, int): Assertion `false && "Newton solver did not converge!"' failed.
Aborted (core dumped)
```
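The dt dependence above is exactly what one would expect if the "steady state" is really a single implicit-Euler step of size dt from the initial values: the result only approaches the true steady state as dt grows. Here is a toy scalar sketch, not the capmp equations; the rate constant `k`, target `x_ss`, and starting value are hypothetical:

```python
# Sketch: if "steady state" is computed as one implicit-Euler step from x0
# with the ambient dt, the answer depends on dt and only approaches the
# true steady state x_ss as dt -> infinity.
def implicit_euler_step(x0, dt, k=1.0, x_ss=0.01):
    # Solve (x - x0)/dt = -k*(x - x_ss) for x.
    return (x0 + dt * k * x_ss) / (1.0 + dt * k)

x0 = 0.02                                 # hypothetical initial value
print(implicit_euler_step(x0, dt=0.025))  # far from x_ss
print(implicit_euler_step(x0, dt=10.0))   # much closer
print(implicit_euler_step(x0, dt=1e9))    # essentially x_ss = 0.01
```

This mirrors the transcript: a small dt leaves the result well off the true steady state, dt=10 gets close, and only a huge dt recovers it.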
Thank you, I can reproduce the issue and I'll start debugging now. STEADYSTATE is a very good next keyword to work on.
I'm using the following simplified MOD file:
```
NEURON {
    SUFFIX minipump
}

PARAMETER {
    volA = 1e9
    volB = 1e9
    volC = 13.0

    kf = 3.0
    kb = 4.0
}

STATE {
    X
    Y
    Z
}

INITIAL {
    X = 40.0
    Y = 2.0
    Z = 0.0

    SOLVE state STEADYSTATE sparse
}

: Required b/c of a bug in NMODL.
BREAKPOINT {
    SOLVE state METHOD sparse
}

KINETIC state {
    COMPARTMENT volA {X}
    COMPARTMENT volB {Y}
    COMPARTMENT volC {Z}

    ~ X + Y <-> Z (kf, kb)

    CONSERVE Y + Z = 2.0
}
```
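For reference, the steady state of this system can be computed directly by keeping one rate equation and replacing the redundant ones with conservation relations (the approach suggested above for avoiding a singular matrix). This is a hypothetical standalone check, not generated code; the third row is the conservation for X implied by the compartment volumes:

```python
import numpy as np

# Parameters and initial values from the MOD file above.
volA, volB, volC = 1e9, 1e9, 13.0
kf, kb = 3.0, 4.0
X0, Y0, Z0 = 40.0, 2.0, 0.0

def F(v):
    X, Y, Z = v
    return np.array([
        kf * X * Y - kb * Z,                            # reaction in equilibrium
        volB * Y + volC * Z - (volB * Y0 + volC * Z0),  # CONSERVE row for Y, Z
        volA * X + volC * Z - (volA * X0 + volC * Z0),  # implied conservation for X
    ])

def J(v):
    X, Y, Z = v
    return np.array([
        [kf * Y, kf * X, -kb],
        [0.0, volB, volC],
        [volA, 0.0, volC],
    ])

v = np.array([X0, Y0, Z0])
for _ in range(50):
    v = v - np.linalg.solve(J(v), F(v))
print(v)  # roughly X ~ 40, Y ~ 2, Z ~ 60, since kf*40*2 == kb*60
```

Because the ODE rows are replaced by (linear) conservation rows, the Jacobian is nonsingular and Newton converges without any dt regularization.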
The equation for X in NOCMODL is:
```
volA * (X - X_old)/dt = -X*Y * kf + Z * kb
```
In NMODL we divide the above by volA. For small values of vol? the difference doesn't matter, but for volA = 1e9 we suddenly start seeing failures to converge (to the correct solution).
I'll create a PR that adopts the battle-tested scaling factors used in NOCMODL.
However, it's not enough to make capmp.mod work as expected.
I think it comes down to these three things:
- scaling of the equations,
- the tolerances,
- one case of cancellation magic.
Scaling ODE terms:
```
volA * (X - X_old)/dt = -X*Y * kf + Z * kb    # NOCMODL
(X - X_old)/dt = (-X*Y * kf + Z * kb)/volA    # NMODL
```
Scaling CONSERVE terms:
```
volB*Y + volC*Z = const          # NOCMODL
Z = (volB*Y - const)/volC        # NMODL
```
The tolerances involved are 1e-5 with NOCMODL vs. 1e-12 for NMODL. (But since the equations are scaled differently it's hard to compare the two numbers.) Note that NOCMODL uses an l1 error while NMODL uses an l2 error.
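To make the "hard to compare" point concrete: evaluating the X equation at the same slightly-off point under the two scalings gives residuals that differ by a factor of volA. The numbers below are toy values reusing the minipump parameters:

```python
# Residual of the X equation at X = X_old with Z slightly off its
# equilibrium value of 60, under both scalings.
volA, kf, kb = 1e9, 3.0, 4.0
X = X_old = 40.0
Y, Z, dt = 2.0, 59.9, 1e9

rate = -X * Y * kf + Z * kb                # ~ -0.4

r_nocmodl = volA * (X - X_old) / dt - rate # ~ 0.4
r_nmodl = (X - X_old) / dt - rate / volA   # ~ 4e-10

print(r_nocmodl, r_nmodl)
```

So a 1e-5 tolerance on the NOCMODL-scaled residual corresponds to roughly a 1e-14 tolerance on the NMODL-scaled one; the two tolerance constants are not measuring the same thing.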
Finally, there's the issue that the CONSERVE term has very favourable cancellation properties in NOCMODL:

```
volB*Y + volC*Z - const
```

Assume volB >> volC, Y ~= Z, and const = volB*Y(t=0) + volC*Z(t=0). Then, in floating-point arithmetic, we get that:

```
volB*Y + volC*Z == volB*Y
```

Hence,

```
(volB*Y + volC*Z) - const == 0.0     # NOCMODL
(volB*Y - const) + volC*Z == volC*Z  # NMODL
```

Then, depending on the size of volC*Z, this leftover can be large compared to the tolerance of 1e-5 (simply because volC*Z is big, even though volB*Y is much bigger).
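The absorption step can be demonstrated in a few lines. With the actual minipump values in double precision, volC*Z (= 26) still sits above the ulp of volB*Y (~128 only in single precision at 2e9), so this sketch exaggerates the effect with float32, where volC/volB is below the machine epsilon and the absorption happens exactly as described:

```python
import numpy as np

f32 = np.float32
volB, volC = f32(1e9), f32(13.0)
Y, Z = f32(2.0), f32(2.0)          # assume Y ~= Z as in the text
const = volB * Y + volC * Z        # 26 < ulp(2e9)/2 = 64, rounds to volB*Y

r_nocmodl = (volB * Y + volC * Z) - const  # the small term cancels: 0.0
r_nmodl = (volB * Y - const) + volC * Z    # the small term survives: 26.0

print(r_nocmodl, r_nmodl)
```

The same mechanism operates in double precision whenever volC*Z falls below half an ulp of volB*Y; only the threshold ratio changes.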
> Note that NOCMODL uses an l1 error while NMODL uses an l2 error.
Not familiar with the l1 and l2 error terminology. What do those mean? Very sorry that you needed to get so deeply into these numerical error analysis weeds.
Using numpy notation:
```
l1_err = np.sum(np.abs(x))
l2_err = np.sum(x**2)**0.5
```
(it's unlikely a really important part of the discussion.)
Might as well make it a complete answer: mathematicians like to unify; here the exponent can be replaced with a symbol (p is the favourite):
```
lp_error = np.sum(np.abs(x)**p)**(1.0/p)
```
With a little abuse of notation one can allow p = inf:
```
linf_error = np.max(np.abs(x))
```
Since we anyway have access to the Jacobian, we can (somewhat) inexpensively do Newton error propagation: https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Simplification
The idea is that it should give us an estimate of how big we'd expect round-off errors to be. Once we know that we can pick a tolerance, e.g. 100.0 "newton errors". The good thing is that the estimate is per element of F. We simply require each element of F to have converged.
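One way to read this (a sketch, not the exact implementation that landed): propagate a relative roundoff of about machine epsilon in each unknown through the Jacobian to get a per-element noise floor for F, and declare convergence once every |F_i| is within a safety factor, the "newton errors", of that floor. All names below are hypothetical:

```python
import numpy as np

EPS = np.finfo(float).eps

def roundoff_in_F(J, x):
    # First-order error propagation (cf. the linked Wikipedia section):
    # if each x_j carries an absolute roundoff of ~EPS*|x_j|, the induced
    # error in F_i is roughly sum_j |dF_i/dx_j| * EPS * |x_j|.
    return EPS * np.abs(J) @ np.abs(x)

def newton(F, J, x, safety=100.0, max_iter=50):
    for _ in range(max_iter):
        f, jac = F(x), J(x)
        # Per-element test: every component of F must be within
        # `safety` "newton errors" of zero.
        if np.all(np.abs(f) <= safety * roundoff_in_F(jac, x)):
            return x
        x = x - np.linalg.solve(jac, f)
    raise RuntimeError("Newton solver did not converge!")

# Toy example: F(x) = x^2 - 2, root sqrt(2).
root = newton(lambda x: np.array([x[0] ** 2 - 2.0]),
              lambda x: np.array([[2.0 * x[0]]]),
              np.array([1.0]))
print(root)
```

The appeal is that the stopping criterion adapts to the scaling of each row of F, so badly scaled systems (volA = 1e9) don't need a hand-tuned absolute tolerance.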
It requires two fixes:

- Set `dt = 1e9` before computing the steady state.
- Use Newton error propagation to estimate how large roundoff errors in `F` are.
If we do both we get the plot below. (I have only tested this and the minipump.mod posted above.)
Looks good. (I wonder where my above Assertion `false && "Newton solver did not converge!"' came from in response to oc>dt=1e9? But that's just idle curiosity.)
I don't see a segfault, only a very long error message leading to an assert followed by a core dump:
```
special: /home/hines/neuron/temp/share/examples/nrniv/nmodl/x86_64/capmp.cpp:824: void neuron::nrn_init_capr(const {anonymous}::_nrn_model_sorted_token&, NrnThread*, Memb_list*, int): Assertion `false && "Newton solver did not converge!"' failed.
```
I'm happy to follow up on the SEGFAULT, if it doesn't go away after this has been made clean and merged.
Sorry. Changed the comment. There was no segfault.
That's plausibly explained by something I didn't mention in the summary. One more difference is how F scales with dt:

```
(X - X_old)/dt = dX    # NOCMODL
X = X_old + dt*dX      # NMODL
```
meaning the residual would grow linearly with dt. Hence, it would make sense that it starts failing if one pushes dt to very large values.
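The linear growth is easy to check with a toy linear rate dX = -(X - 0.5): evaluate both residual forms at a point perturbed by 1e-10 from the exact implicit-Euler solution (all numbers below are hypothetical, chosen only to show the scaling):

```python
dt = 1e9
X_old, d = 1.0, 1e-10  # initial value and perturbation

def dX(X):
    # toy rate function with steady state at 0.5
    return -(X - 0.5)

# Exact implicit-Euler solution of X = X_old + dt*dX(X), then perturb it.
X_star = (X_old + 0.5 * dt) / (1.0 + dt)
X = X_star + d

r_nocmodl = (X - X_old) / dt - dX(X)  # ~ d*(1 + dt)/dt ~ 1e-10
r_nmodl = X - X_old - dt * dX(X)      # ~ d*(1 + dt)   ~ 0.1

print(r_nocmodl, r_nmodl)
```

For the same perturbation, the NMODL-style residual is about dt times larger, so a fixed tolerance that works at dt = 0.025 can plausibly fail at dt = 1e9.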
The changes have been implemented here: #1403.