subsequent `setup_transfer -> stdinit` produces different results
Seems like after #1051 refactoring we are hitting the following error in the BBP simulation stack.
Failing test: `neocortex-scx-v5-gapjunctions`:
62 special: ParallelContext.setup_transfer() needs to be called.
62 near line 0
62 {Cell[2].soma TPointList[20794].append(0.5)} ^
62 finitialize(-65)
62 init()
62 stdinit()
MPT ERROR: Rank 62(g:62) is aborting with error code -1.
Process ID: 44289, Host: r1i7n10, Program: /jenkins/06/workspace/hpc.SimulationStack/BUILD_HOME/spack/opt/spack/linux-rhel7-x86_64/intel-19.1.2.254/neurodamus-neocortex-develop-5dntmn/bin/special
MPT Version: HPE HMPT 2.22 03/31/20 16:17:35
Experiencing this issue with:
- `master`
- `8.0-cherries` (going into neuron future 8.0 branch)
The simulation passes if I revert #1051 in branch:
- `8.0-bitter-cherries`
Does the test actually call setup_transfer after the gap information (source_var, target_var) has been setup?
`setup_transfer` is called, but a bit later in the Python version of neurodamus, i.e. just before `stdinit`. If I call it right after `source_var`, `target_var` have been set up, then it works.
For my understanding: does `setup_transfer()` need to be called right after the gap info is set up? What are the requirements?
The intention is for it to be required after the last `source_var`/`target_var` and before `finitialize`. If there is inadvertent extra dependence on it, I'd like to know.
@ferdonline can tomorrow confirm if there is any bug on our side in neurodamus but at least the behaviour is changed after #1051.
This is speculative. The error message was generated because `is_setup_` was checked in `void thread_transfer` and was false. On launch, `is_setup_` is initialized to false, and it is reset to false on any call to `source_var` or `target_var`. `is_setup_` is set to true on a call to `setup_transfer`. The thing that is new in this pull request is the activation of
nrnthread_v_transfer_ = thread_transfer; // otherwise can't check is_setup_
whenever `source_var` or `target_var` is called. So my speculation is that neurodamus called something (`finitialize`) that called `nrnthread_v_transfer_` before `setup_transfer` and got away with it prior to the merge of this pull request, but it didn't matter because it called `finitialize` again after `setup_transfer`.
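To make the speculation above concrete, here is a toy Python model of the flag bookkeeping. It is not NEURON code: the class `TransferState`, its `hook_on_registration` switch, and the helper `early_init_then_setup` are hypothetical names made up for illustration, and the assumption that the hook was previously activated only by `setup_transfer` is mine, not something confirmed in this thread.

```python
class TransferState:
    """Toy stand-in for the is_setup_ logic described above (not NEURON code)."""

    def __init__(self, hook_on_registration):
        self.is_setup = False        # initialized to false on launch
        self.hook_active = False     # stands in for nrnthread_v_transfer_ being set
        # True models the behaviour after the pull request, False the assumed old one.
        self.hook_on_registration = hook_on_registration

    def _register(self):
        self.is_setup = False        # any new source/target invalidates the setup
        if self.hook_on_registration:
            self.hook_active = True  # the check in thread_transfer can now fire

    def source_var(self):
        self._register()

    def target_var(self):
        self._register()

    def setup_transfer(self):
        self.is_setup = True
        self.hook_active = True      # assumption: the old code activated the hook here

    def finitialize(self):
        # Models thread_transfer checking is_setup_ when the hook is active.
        if self.hook_active and not self.is_setup:
            raise RuntimeError("ParallelContext.setup_transfer() needs to be called.")
        return "initialized"


def early_init_then_setup(state):
    """Hypothetical call order: an early finitialize before setup_transfer."""
    state.source_var()
    state.target_var()
    state.finitialize()              # too early: setup_transfer not yet called
    state.setup_transfer()
    return state.finitialize()
```

With `hook_on_registration=False` the early `finitialize` goes unnoticed and the second one succeeds; with `hook_on_registration=True` the early call raises, which would match the error in the log if the speculation holds.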
I think I understood the issue with neurodamus-py. Indeed we didn't have a `setup_transfer` before `stdinit` when dumping gid info.
However, and this is a bit worrying, if we do `setup_transfer -> stdinit` followed by another `setup_transfer -> stdinit`, results are different.
Any hint on that/way to fix?
Too ambiguous for me. What results (of a run or of the dumped gid info)?
The run results (reports) are different. I also haven't looked deeply, I am just raising awareness and asking for initial thoughts.
Any extra calls to setup_transfer should not affect results.
Anyway, that shouldn't hit us in neurodamus since we changed it to have a single set of calls to `setup_transfer -> stdinit`.
However, it seems `setup_transfer -> stdinit -> dump info -> setup_transfer -> stdinit -> run sim` produced different results than `setup_transfer -> stdinit -> run sim`.
I will try to isolate the case then.
I'd suspect that a single stdinit() is not sufficient to reach a fixed point for initialization. The setup_transfer may be a red herring. Check first for identical results for one and two calls to stdinit() and then a psolve.
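As a rough illustration of what "not a fixed point" could look like, here is a toy sketch in plain Python. Nothing here is NEURON or neurodamus code: `ToyCoupledModel` and `inits_until_stable` are hypothetical, and the assumption that an initial state depends on a value transferred during the previous initialization is only one possible way such a difference might arise.

```python
class ToyCoupledModel:
    """Hypothetical two-cell model (not NEURON).

    Cell b's initial state depends on a value transferred from cell a,
    and the transfer only happens at the end of init().
    """

    def __init__(self):
        self.v_a = 0.0
        self.v_b = 0.0
        self.transferred = 0.0  # stale until the first init() has run

    def init(self, v_init=-65.0):
        self.v_a = v_init
        # Uses the value transferred by the previous init, not this one.
        self.v_b = v_init + 0.1 * (self.transferred - v_init)
        self.transferred = self.v_a

    def snapshot(self):
        return (self.v_a, self.v_b)


def inits_until_stable(model, max_inits=5):
    """Return how many init() calls are needed before the state stops changing.

    Returns None if the state is still changing after max_inits calls.
    """
    previous = None
    for n in range(1, max_inits + 1):
        model.init()
        current = model.snapshot()
        if current == previous:
            return n - 1  # the previous call had already reached the fixed point
        previous = current
    return None


print(inits_until_stable(ToyCoupledModel()))  # prints 2
```

In this toy case the first init leaves cell b in a state that the second init corrects, so one and two inits give different starting points. The analogous check on the real model would compare the reports from one versus two `stdinit()` calls followed by `psolve`, without touching `setup_transfer` in between.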
Discussed offline with @ferdonline, this is not blocking for 8.0.