Added hybrid MPI+OpenMP test in CI

Open iomaganaris opened this issue 4 years ago • 7 comments

  • Added nrntraub test and run it with 4 ranks and 9 threads on BB5 with the SoA CoreNEURON build
  • Uses https://github.com/iomaganaris/nrntraub/tree/icei which creates the coredat by default in the NEURON run
  • Closes https://github.com/BlueBrain/CoreNeuron/issues/292

@pramodk I still haven't managed to run NEURON with the threading option for nrntraub. If you think this should be tested, maybe we can have a look together at some point. Also, let me know if I should create a PR for my fork of nrntraub.

iomaganaris avatar Apr 29 '20 17:04 iomaganaris

I think we should do that. Can you post the instructions and the error message here, and tag Michael?

pramodk avatar Apr 30 '20 05:04 pramodk

Hello @nrnhines, we were trying to run the nrntraub test from https://github.com/pramodk/nrntraub/tree/icei with threading enabled in NEURON, in order to launch CoreNEURON from NEURON and test OpenMP. After cloning the repo I did the following:

nrnivmodl mod
srun -n 1 ./x86_64/special -c nthread=9 -mpi -c mytstop=100 -c use_coreneuron=0 init.hoc

Note that I am using 1 rank because pc.nthread gets set only if pc.nhost == 1, and I am setting use_coreneuron=0 for debugging in this case; the same issue occurs with use_coreneuron=1. I get the following error:

...
SetupTime: 4.8000002
mytstop  100
/gpfs/bbp.cscs.ch/project/proj16/magkanar/spack/software/install/linux-rhel7-x86_64/intel-19.0.4.243/neuron-develop-3csnze/x86_64/bin/nrniv: usable mindelay is 0 (or less than dt for fixed step method)
 in init.hoc near line 65
 prun()
       ^
        finitialize(-70)
      init()
    stdinit()
  prun()

I figured out that the issue comes from calling stdinit() from prun() in hoc/parlib.hoc. I am using NEURON master and the Intel compiler. Could you help us with this issue? Thank you very much in advance!

iomaganaris avatar Apr 30 '20 08:04 iomaganaris

If you are using threads you cannot have any NetCon.delay = 0 (or less than dt). Of the 109982 NetCons, 265 have a delay of 0. Just to see if that is the problem, try again with

diff --git a/hoc/parlib2.hoc b/hoc/parlib2.hoc
index d9eb164..1fbdee3 100755
--- a/hoc/parlib2.hoc
+++ b/hoc/parlib2.hoc
@@ -50,7 +50,7 @@ proc par_netstim_create() {local gid  localobj cell, syn, nc, ns, r
                netstims.append(ns)
                nc = new NetCon(ns.pp, syn)
                netstim_netcons.append(nc)
-               nc.delay = 0
+               nc.delay = 1
                r = new Random()
                r.negexp(1)
 //             r.Isaac64(netstim_random_seedoffset + netstim_base_)

For MPI and nthread=1 it is generally OK to have NetCon.delay=0, but only for NetCons that are not interprocessor (i.e. source and target must be on the same process).
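
To see how many connections would trip this check, one can count the delays that fall below dt. The following is a minimal Python sketch, not code from nrntraub: the helper name `short_delays` is hypothetical and it operates on plain numbers so that it runs without NEURON. In a NEURON Python session the delays could presumably be gathered by iterating over `h.List("NetCon")` and reading each `.delay`, but that part is an assumption and is left out here.

```python
def short_delays(delays, dt):
    """Return the indices of all delays that are smaller than dt."""
    return [i for i, d in enumerate(delays) if d < dt]


# Illustrative values only (not taken from the model).
delays = [1.0, 0.0, 0.5, 0.0, 0.025]
dt = 0.025

bad = short_delays(delays, dt)
print(f"{len(bad)} of {len(delays)} delays are below dt={dt}: indices {bad}")
```

A delay exactly equal to dt is accepted here, matching the "0 (or less than dt)" wording of the error message.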

nrnhines avatar Apr 30 '20 12:04 nrnhines

By the way, I noticed another problem when launching python from within the nrntraub repository.

hines@hines-T7500:~/models/nrntraub-icei$ python
Python 3.7.6 (default, Feb 17 2020, 15:09:28) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import neuron
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hines/neuron/nrncmake/build/install/lib/python/neuron/__init__.py", line 132, in <module>
    import nrn
ModuleNotFoundError: No module named 'nrn'
>>> 

This seems to be an artifact of having a 'hoc' folder in the repository.

nrnhines avatar Apr 30 '20 12:04 nrnhines

I got some time to work on this test again. Thank you very much, @nrnhines, for your suggestion to set nc.delay = 1. NEURON and CoreNEURON with threading worked with this change. However, I see the following issues with threading enabled. First, NEURON generates different spikes when the simulation runs with more than one thread and more than one MPI rank than when it runs with 1 MPI rank and multiple threads, or with multiple MPI ranks and no threading. For example:

bash-4.2$ srun -n 1 ./x86_64/special -mpi -c use_coreneuron=0 -c nthread=36 -c mytstop=100 init.hoc
bash-4.2$ srun -n 4 ./x86_64/special -mpi -c use_coreneuron=0 -c nthread=9 -c mytstop=100 init.hoc
bash-4.2$ sort -n -k'1,1' -k2 < out1.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out1.sorted
bash-4.2$ sort -n -k'1,1' -k2 < out4.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out4.sorted
bash-4.2$ sdiff -s out1.sorted out4.sorted
10.375  186                                                   <
                                                              > 10.400  186
                                                              > 11.125  199
11.150  199                                                   <
                                                              > 12.950  220
12.975  220                                                   <
                                                              > 13.000  188
13.025  188                                                   <
13.025  264                                                   | 13.050  264
                                                              > 13.525  102
13.550  102                                                   <
13.675  288                                                   <
                                                              > 13.700  288
                                                              > 13.925  323
13.950  323                                                   <
14.275  312                                                   <
                                                              > 14.300  312
                                                              > 14.300  318
14.325  318                                                   | 14.350  87
14.350  192                                                   <
14.375  87                                                    <
...
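
The shifts in the diff above are easier to quantify when spikes are matched per gid instead of line by line. The following is a minimal Python sketch under stated assumptions: each line holds a spike time and a gid separated by whitespace (as in the sorted files), dt = 0.025 ms is inferred from the spacing of the spike times shown, and the function names `parse_spikes` and `spike_shifts` are hypothetical. The sample values are the first few differing lines from the diff.

```python
from collections import defaultdict


def parse_spikes(lines):
    """Group spike times by gid from 'time gid' lines; skip anything else."""
    spikes = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # e.g. a header line
        try:
            t, gid = float(parts[0]), int(parts[1])
        except ValueError:
            continue
        spikes[gid].append(t)
    return {gid: sorted(ts) for gid, ts in spikes.items()}


def spike_shifts(a, b, dt=0.025):
    """For gids with equal spike counts, return nonzero shifts in timesteps."""
    shifts = {}
    for gid in sorted(set(a) & set(b)):
        if len(a[gid]) != len(b[gid]):
            continue  # a missing or extra spike needs a separate look
        steps = [round((tb - ta) / dt) for ta, tb in zip(a[gid], b[gid])]
        if any(steps):
            shifts[gid] = steps
    return shifts


run1 = parse_spikes(["10.375\t186", "11.150\t199", "13.025\t264"])
run4 = parse_spikes(["10.400\t186", "11.125\t199", "13.050\t264"])
print(spike_shifts(run1, run4))
```

For real runs, the lines would come from the two sorted files, e.g. `parse_spikes(open("out1.sorted"))`.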

During the first timesteps the spikes are the same, but then the times at which the spikes are generated start to differ. In most cases the spike times differ by 1 timestep. Running NEURON with 36 MPI ranks and 1 thread generates the same spikes as running with 1 MPI rank and 36 threads. The other issue is with the spikes generated by CoreNEURON. In all of the above cases CoreNEURON generates the same spikes as NEURON in the beginning, but after a certain timestep the spikes start to shift in time. For example:

bash-4.2$ srun -n 4 ./x86_64/special -mpi -c use_coreneuron=1 -c nthread=9 -c mytstop=100 init.hoc
bash-4.2$ sort -n -k'1,1' -k2 < out.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out4.cn.sorted
bash-4.2$ sdiff -s out4.sorted out4.cn.sorted
bash-4.2$ sdiff -s out4.sorted out4.cn.sorted | more
                                                              > 5.900   160
                                                              > 6.050   176
                                                              > 6.050   180
6.750   160                                                   <
6.750   176                                                   <
6.825   180                                                   <
6.925   188                                                   <
                                                              > 6.950   188
6.975   168                                                   <
                                                              > 7.000   168
                                                              > 7.375   287
7.400   287                                                   <
                                                              > 7.550   290
7.575   290                                                   <
...

I am using the icei branch of my fork of nrntraub (https://github.com/iomaganaris/nrntraub/tree/icei), which includes the change in the delay and allows selecting the number of threads when more than 1 MPI rank is used. Are the issues mentioned above related to the thread implementation, or is there something going on with the test? Any help would be greatly appreciated.

Thank you very much, Ioannis

iomaganaris avatar May 19 '20 15:05 iomaganaris

@nrnhines: Similar to the olfactory bulb model, do you think the issue described above might be with the model itself? In that case I will go ahead and use whatever baseline the model provides with X MPI ranks and Y threads per MPI rank.

pramodk avatar Aug 16 '20 18:08 pramodk

Discrepancies between NEURON and CoreNEURON in this situation are presumptively bugs. I assume there are no intra-NEURON or intra-CoreNEURON differences on this time scale with different nhost and nthread.

nrnhines avatar Aug 16 '20 18:08 nrnhines