CoreNeuron

MPI Simulation fails when the --multisend flag is enabled

Open HolyLow opened this issue 4 years ago • 33 comments

Describe the issue

I am trying to enable the multisend option for spike exchange. However, when I run the simulation as:

mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu --multisend

The program fails, whereas the same command without "--multisend" runs smoothly. I am not sure whether something is wrong with my environment, whether I failed to enable some option, or whether there is a bug in the code.

To Reproduce, and the corresponding Logs

Steps to reproduce the behavior:

  1. If I don't enable the "--multisend" option, the program runs smoothly, and the "normal" log is as below:
$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[36744,1],7]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
[dgxone:76177] [[36744,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 216.2031, Min 215.6484, Avg 215.9121
 Memory (MBs) :            After MPI_Init : Max 216.2031, Min 215.6484, Avg 215.9395
 Memory (MBs) :          Before nrn_setup : Max 217.0234, Min 216.5938, Avg 216.8501
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[dgxone:76177] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:76177] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:76177] 6 more processes have sent help message help-btl-vader.txt / cma-permission-denied
 Setup Done   : 8.70 seconds
 Memory (MBs) :          After nrn_setup  : Max 1550.6797, Min 1519.4180, Avg 1538.5410
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000

GPU
--nwarp=0
--cell-permute=0

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/10000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 1550.6797, Min 1519.4180, Avg 1538.5410
 Memory (MBs) :     After nrn_finitialize : Max 1551.0352, Min 1519.8086, Avg 1538.8823

 psolve |========================================================| t: 1000.00 ETA: 0h02m44s

Solver Time : 164.163


 Simulation Statistics
 Number of cells: 10000
 Number of compartments: 2494952
 Number of presyns: 4689782
 Number of input presyns: 69968
 Number of synapses: 11968052
 Number of point processes: 16710070
 Number of transfer (gap) targets: 0
 Number of spikes: 220340
 Number of spikes with non negative gid-s: 220340

However, if the "--multisend" flag is appended, the program fails quickly with the log below:

$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[35160,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 216.1602, Min 215.7266, Avg 215.9482
 Memory (MBs) :            After MPI_Init : Max 216.1602, Min 215.7266, Avg 215.9771
 Memory (MBs) :          Before nrn_setup : Max 217.0703, Min 216.7070, Avg 216.8882
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=17523 total=371.438 time=0.000624772
all2allv_int gidout space=2532 total=371.438 time=0.000354822
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[35160,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[dgxone:12096] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:12096] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:12096] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied

Expected behavior

Since the non-multisend command runs well, I think there is nothing wrong with my environment. However, when the "--multisend" flag is appended, the program exits with a non-zero exit code and no error message of its own, which makes the failure hard to understand or debug.
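The logs above are dominated by repeated libibverbs warnings and separator lines, so the few lines around the failure are easy to miss. Below is a minimal sketch of a filter that hides that noise from a captured log. The names `strip_noise` and `NOISE_PATTERNS` are hypothetical, and which lines count as noise is an assumption made here for illustration, not something CoreNEURON or Open MPI defines.

```python
import re

# Assumption: these line shapes recur in the logs above and say nothing
# about the failure itself, so they can be hidden while debugging.
NOISE_PATTERNS = [
    re.compile(r"^libibverbs: Warning: "),  # repeated InfiniBand driver warnings
    re.compile(r"^-{20,}$"),                # Open MPI help-message separators
]


def strip_noise(log_text):
    """Return the lines of log_text that match none of NOISE_PATTERNS."""
    kept = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in NOISE_PATTERNS):
            continue
        kept.append(line)
    return kept


# Small demonstration on lines copied from the failing log.
sample = "\n".join([
    "libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.",
    "all2allv_int gidout space=2532 total=371.438 time=0.000354822",
    "-" * 74,
    "  Process name: [[35160,1],2]",
    "  Exit code:    1",
])
for line in strip_noise(sample):
    print(line)
```

To apply it to a real run, the combined stdout and stderr of the mpiexec command can be saved to a file and its contents passed to `strip_noise`.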

System (please complete the following information)

  • OS: Ubuntu 20.04
  • Compiler: PGI 20.7
  • Version: master branch
  • Backend: CPU, GPU

HolyLow, Jan 02 '21 11:01

Does the simulation work with --multisend but without --gpu?

nrnhines, Jan 02 '21 13:01
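The question above asks about one cell of a two-by-two matrix: "--gpu" on or off, and "--multisend" on or off. A minimal sketch for generating all four command lines follows. It assumes the binary path, data path, and flags from the original report. `build_command` and `flag_matrix` are hypothetical helper names, not part of CoreNEURON.

```python
from itertools import product


def build_command(nranks, datpath, gpu, multisend, tstop=1000):
    """Build the mpiexec argument list used in the report for one flag combination."""
    cmd = [
        "mpiexec", "-np", str(nranks),
        "./x86_64/special-core",        # binary path as given in the report
        "--tstop", str(tstop),
        "--datpath", datpath,
        "--mpi",
    ]
    if gpu:
        cmd.append("--gpu")
    if multisend:
        cmd.append("--multisend")
    return cmd


def flag_matrix(nranks, datpath):
    """Return the four commands covering every --gpu / --multisend combination."""
    return [
        build_command(nranks, datpath, gpu, multisend)
        for gpu, multisend in product([False, True], repeat=2)
    ]


# Print the commands so they can be run one at a time and compared.
for cmd in flag_matrix(8, "./networks/10000Sim/RoundRobin-core-8"):
    print(" ".join(cmd))
```

Running the four commands in turn shows whether the failure follows "--multisend" alone or depends on the GPU backend as well.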

@nrnhines No, the simulation also fails with --multisend and without --gpu. The log is quite similar:

$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[38544,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
 num_mpi=8
 num_omp_thread=1


 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 216.0508, Min 215.5312, Avg 215.7192
 Memory (MBs) :            After MPI_Init : Max 216.0625, Min 215.5312, Avg 215.8867
 Memory (MBs) :          Before nrn_setup : Max 217.3984, Min 216.7070, Avg 217.1641
all2allv_int gidin to intermediate space=17523 total=371.051 time=0.000610346
all2allv_int gidout space=2532 total=371.402 time=0.00050868
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[38544,1],6]
  Exit code:    1
--------------------------------------------------------------------------
[dgxone:12424] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:12424] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:12424] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied

HolyLow, Jan 03 '21 01:01

@HolyLow: is your model, or a smaller test example, available somewhere so that we can use it to reproduce the issue?

pramodk, Jan 03 '21 05:01

@pramodk All the networks I use are generated by Snudda and exported with NEURON's nrnbbcore_write API. Following your suggestion, I tested a smaller Snudda-generated network called tinySim, which consists of 100 neurons. When I ran this network with -np 8, the program failed:

$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/tinySim/RoundRobin-core-8 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[33977,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 216.1992, Min 215.5781, Avg 215.8984
 Memory (MBs) :            After MPI_Init : Max 216.2734, Min 215.5781, Avg 215.9272
 Memory (MBs) :          Before nrn_setup : Max 217.0977, Min 216.5352, Avg 216.8506
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=192 total=218.223 time=0.000168855
all2allv_int gidout space=58 total=218.223 time=7.7529e-05
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[33977,1],0]
  Exit code:    1
--------------------------------------------------------------------------
[dgxone:08865] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:08865] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:08865] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied

However, when I ran the same small network with -np 2, the program succeeded:

$ mpiexec -np 2 ./x86_64/special-core --tstop 1000 --datpath ./networks/tinySim/RoundRobin-core-2 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[53236,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
 num_mpi=2
 num_omp_thread=1

 Info : 8 GPUs shared by 2 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 215.1992, Min 215.0039, Avg 215.1016
 Memory (MBs) :            After MPI_Init : Max 215.1992, Min 215.0039, Avg 215.1016
 Memory (MBs) :          Before nrn_setup : Max 216.1953, Min 216.0273, Avg 216.1113
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=108 total=220.668 time=1.94621e-05
all2allv_int gidout space=108 total=220.668 time=1.5568e-05
all2allv_int lists space=408 total=220.668 time=1.94809e-05
 Setup Done   : 0.31 seconds
 Memory (MBs) :          After nrn_setup  : Max 266.2656, Min 262.9492, Avg 264.6074
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000

GPU
--nwarp=0
--cell-permute=0

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/tinySim/RoundRobin-core-2
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.01375
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 266.2656, Min 262.9492, Avg 264.6074
 Memory (MBs) :     After nrn_finitialize : Max 266.5938, Min 263.4727, Avg 265.0332

[dgxone:27116] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:27116] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:27116] 1 more process has sent help message help-btl-vader.txt / cma-permission-denied
 psolve |========================================================| t: 1000.00 ETA: 0h09m55s

Solver Time : 595.167


 Simulation Statistics
 Number of cells: 100
 Number of compartments: 25006
 Number of presyns: 47647
 Number of input presyns: 100
 Number of synapses: 55873
 Number of point processes: 103510
 Number of transfer (gap) targets: 0
 Number of spikes: 4162
 Number of spikes with non negative gid-s: 4162

I ran more experiments, and it turned out that regardless of the network size, np=2 and np=4 work fine, but np=8 fails. From the logs above, I suspect the problem lies in the all2allv_int initialization: the failing run never reaches the line that the successful run prints as "all2allv_int lists space=408 total=220.668 time=1.94809e-05". But I am not able to find out anything more, as I am not familiar with the code.

HolyLow avatar Jan 03 '21 05:01 HolyLow

Ok thanks!

I am not familiar with Snudda, but I will take a look. If you already have commands or scripts to export such a model, please post them here. (That would make it a bit easier to reproduce.)

pramodk avatar Jan 03 '21 06:01 pramodk

@pramodk The original Snudda can't export a network to CoreNeuron, and my modified version is too messy to share. As a workaround, I uploaded my generated tinySim data so you can download it. The additional mod files are required to run the generated tinySim, so don't forget to compile them into CoreNeuron. Thanks a lot!! Looking forward to your further advice.

HolyLow avatar Jan 03 '21 07:01 HolyLow

if the "--multisend" flag is appended, the program fails quickly

Just a data point: my desktop takes 20 s to simulate 20 ms on 8 cores when configured with -DCMAKE_BUILD_TYPE=Debug. Without --gpu, I find that

hines@hines-T7500:~/Downloads/tinySim$ mpiexec -np 8 ./x86_64/special-core --tstop 20 --datpath ./original-RoundRobin-core-8 --mpi --multisend

runs successfully, generating 4 spikes

hines@hines-T7500:~/Downloads/tinySim$ cat out.dat
2.8	52
5.6	27
7.425	66
13.675	47

And I see the output

...
all2allv_int gidin to intermediate space=192 total=15.4336 time=0.000116663
all2allv_int gidout space=58 total=15.4336 time=0.000155916
all2allv_int lists space=394 total=15.4336 time=4.424e-05
 Setup Done   : 0.21 seconds 
...

If you are experiencing an error under these conditions, then it seems that we will need to reproduce your hardware/software environment more accurately.

nrnhines avatar Jan 03 '21 13:01 nrnhines

@nrnhines I tried your datapoint, and without --multisend it worked fine on my environment as well. The outputs are also identical to what you've shown. But if I add the --multisend flag, the program fails. Logs are as below:

$ mpiexec -np 8 ./x86_64/special-core --tstop 20 --datpath ./networks/tinySim/RoundRobin-core-8 --mpi --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[16699,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: dgxone

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: dgxone
--------------------------------------------------------------------------
 num_mpi=8
 num_omp_thread=1


 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 215.6602, Min 215.3125, Avg 215.5264
 Memory (MBs) :            After MPI_Init : Max 215.9609, Min 215.3125, Avg 215.6260
 Memory (MBs) :          Before nrn_setup : Max 216.6797, Min 216.3984, Avg 216.5669
all2allv_int gidin to intermediate space=192 total=217.977 time=0.000178425
all2allv_int gidout space=58 total=217.977 time=7.27859e-05
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[16699,1],5]
  Exit code:    1
--------------------------------------------------------------------------
[dgxone:59171] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:59171] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:59171] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied

HolyLow avatar Jan 03 '21 14:01 HolyLow

@nrnhines @pramodk Hello, I tried to dig into the source code, and the error seems to come from the use_phase2_ procedure in nrnmultisend_setup.cpp. But I can hardly understand what use_phase2_ is doing... When I disabled use_phase2_ with the "--ms-phases 1" option on the command line, things worked fine again. I wonder whether the use_phase2_ related code is buggy.

HolyLow avatar Jan 11 '21 03:01 HolyLow

@HolyLow : thanks for the update.

Just FYI : I haven't used the multisend option often. It helps spike exchange in specific scenarios. Are you testing this option for any specific use case?

I haven't had time to look into the details yet. I will try to debug with your provided dataset this week.

pramodk avatar Jan 11 '21 14:01 pramodk

@pramodk Thanks for your attention. I was just trying to compare the behavior and performance when --multisend option was (or not) enabled. And I ran into the crash accidentally.

While digging into the code, I found another interesting performance issue, possibly unrelated to this one. In the deliver_net_events related code, the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) seems to be executed on the CPU side, which forces update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) to transfer data from CPU to GPU. Is there any chance to move this procedure to the GPU side to eliminate the redundant data transfer? It seems that mod2c generates OpenACC code for the _net_buffer_receive function, but the _net_receive function is never parallelized by OpenACC. (Actually I am not sure about this, because in _net_receive, the realloc_net_receive_buffer function seems to cooperate with OpenACC. But if that is true, why should the data be transferred again later in NetCvode::deliver_net_events?) I want to know how to put the _net_receive procedure on the GPU side and, if possible, eliminate the update_net_receive_buffer(nt) call in NetCvode::deliver_net_events.

HolyLow avatar Jan 12 '21 11:01 HolyLow

Perhaps this could be improved. I don't know. The multisend method is an interprocessor spike exchange method which, under some circumstances, performs better than MPI_Allgather (followed by an MPI_Allgatherv if there are more spikes than can fit into the allgather source buffer). These spike exchange methods are not directly involved in delivery. When spikes arrive at the cpu, they are placed in an event queue to await delivery. Interprocessor spike exchange and enqueuing/dequeuing are done on the cpu. At the time of dequeuing, the spike is placed in a buffer specific to a mod file, and at each time step that buffer is copied to the gpu. The gpu then calls the proper NET_RECEIVE block instance for the spike.

I didn't know if it was possible to efficiently manage a priority queue on the GPU.
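
As a rough illustration of the flow described above (events wait in a CPU queue, are appended to a per-mod-file buffer when dequeued, and each buffer goes to the GPU as one array per time step), here is a minimal C++ sketch. The names Event, EventQueue, Buffers and step are hypothetical and are not CoreNEURON's API. The "device" map only stands in for GPU memory, and a plain vector assignment stands in for the host-to-device copy.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Hypothetical event record; CoreNEURON's real types differ.
struct Event {
    double t_deliver;  // time at which the event is due
    int mech_type;     // mechanism (mod file) that receives it
    int instance;      // target instance of that mechanism
};

// Orders the priority queue so the earliest delivery time is on top.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.t_deliver > b.t_deliver;
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterFirst>;
using Buffers = std::map<int, std::vector<Event>>;  // one buffer per mechanism type

// One time step: move every due event from the queue into its host buffer,
// then copy each non-empty host buffer to the "device" in one bulk transfer.
// Returns the number of bulk transfers performed.
std::size_t step(EventQueue& q, double t_now, Buffers& host, Buffers& device) {
    for (auto& kv : device) kv.second.clear();  // previous step already delivered
    while (!q.empty() && q.top().t_deliver <= t_now) {
        const Event e = q.top();
        q.pop();
        host[e.mech_type].push_back(e);
    }
    std::size_t transfers = 0;
    for (auto& kv : host) {
        if (kv.second.empty()) continue;
        device[kv.first] = kv.second;  // stands in for a host-to-device array copy
        kv.second.clear();
        ++transfers;
    }
    return transfers;
}
```

With three events due at the same step and spread over two mechanism types, step performs two bulk copies rather than three single-event copies, which is the trade-off the buffering is meant to exploit.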

nrnhines avatar Jan 14 '21 13:01 nrnhines

@nrnhines Thanks for your explanation!

At the time of dequeuing, the spike is placed in a buffer specific to a mod file and at each time step that buffer is copied to the gpu.

Do you mean that the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) only places the spike in a buffer specific to a mod file, and if we directly change the buffer on the gpu, the update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) which transfers data from CPU to GPU can be eliminated?

I didn't know if it was possible to efficiently manage a priority queue on the GPU.

I think if the deliver procedure is to be moved to the GPU side, then maybe we won't choose to use a priority queue. Possibly we can record all the spikes according to their arrival times, and check if the fired spikes can trigger some netcon at each dt.

HolyLow avatar Jan 19 '21 10:01 HolyLow

Do you mean

Yes. I would only disambiguate, or stress a bit more, that on dequeuing from the destination cpu's priority queue, the cpu copies the spike to a modfile specific buffer on the cpu. That buffer is copied every time step to the gpu. The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step. The presumption was that cpu->gpu transfer performs better as an array than one spike at a time. If that presumption is vitiated by the performance improvement of eliminating the

update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) which transfers data from CPU to GPU can be eliminated

Then it makes sense to do so.

Possibly we can record all the spikes according to their arrival times

I'm not sure I know exactly what you mean. There is a spike generation time ts, an arrival time on the destination node, and a delivery time, ts + NetCon.delay. The latter is what goes onto the priority queue and when that time occurs the spike is sent to the (buffered) destination for immediate (that time step) delivery.

check if the fired spikes can trigger some netcon at each dt.

That seems to imply some sort of queue or insertable, removable list on the gpu which is either global, per mod type, or per mod instance. Generation order is not correlated with arrival-on-node order, which is not correlated with delivery order (the latter due to different NetCon.delay), although in practice delivery is in the same order as generation due to a globally constant NetCon.delay.
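
To make the ordering point concrete, here is a small C++ sketch that queues events by delivery time ts + delay and reads them back in delivery order. Spike, Conn, Pending and delivery_order are made-up names for illustration only; the real queue holds richer event objects.

```cpp
#include <queue>
#include <vector>

struct Spike { double ts; int gid; };                // generation time, source gid
struct Conn { int gid; double delay; int target; };  // one NetCon-like connection

struct Pending { double td; int target; };           // td = ts + delay
struct LaterFirst {
    bool operator()(const Pending& a, const Pending& b) const { return a.td > b.td; }
};

// Returns the targets in the order in which their events would be delivered.
std::vector<int> delivery_order(const std::vector<Spike>& spikes,
                                const std::vector<Conn>& conns) {
    std::priority_queue<Pending, std::vector<Pending>, LaterFirst> q;
    for (const Spike& s : spikes) {
        for (const Conn& c : conns) {
            if (c.gid == s.gid) q.push(Pending{s.ts + c.delay, c.target});
        }
    }
    std::vector<int> order;
    while (!q.empty()) {
        order.push_back(q.top().target);
        q.pop();
    }
    return order;
}
```

For example, a spike generated at ts = 1.0 on a connection with delay 5.0 is delivered after a spike generated at ts = 2.0 on a connection with delay 1.0, so delivery order differs from generation order whenever delays differ enough.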

nrnhines avatar Jan 19 '21 12:01 nrnhines

@nrnhines I really appreciate your patience!! I have some more questions:

the cpu copies the spike to a modfile specific buffer on the cpu.

Do you refer to the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) ? I failed to find out the data structure of the "modfile specific buffer".

That buffer is copied every time step to the gpu.

Is the copy carried out in update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) ? As I am not able to find the modfile specific buffer on CPU, I failed to understand this procedure as well...

The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step.

Where does this procedure happen? In update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) ?

I'm not sure I know exactly what you mean. There is a spike generation time ts, an arrival time on the destination node, and a delivery time, ts + NetCon.delay.

I mean the spike generation time ts, which is transferred in the MPI_Spike structure. I think maybe we could record all the spikes in a table according to their generation time ts, and then check the table each time we want to "pop" netcons. We could directly check every spike's netcons and calculate spike.ts + netcon.delay to see if the netcon should be activated. As the netcons have a maximum delay, a spike is outdated once the maximum delay has passed and can be safely discarded, which keeps the number of tracked spikes small. A round-robin queue might be enough.
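
A minimal C++ sketch of this idea, assuming spikes are recorded in nondecreasing ts order and that every connection's delay is at most max_delay. SpikeTable, Conn and due are hypothetical names; this is not how CoreNEURON currently delivers events, and the loop is written serially for clarity.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Spike { double ts; int gid; };
struct Conn { int gid; double delay; int target; };

class SpikeTable {
  public:
    explicit SpikeTable(double max_delay) : max_delay_(max_delay) {}

    // Assumes spikes are recorded in nondecreasing ts order.
    void record(const Spike& s) { spikes_.push_back(s); }

    // Targets whose delivery time ts + delay falls in the window (t - dt, t].
    std::vector<int> due(double t, double dt, const std::vector<Conn>& conns) {
        std::vector<int> targets;
        for (const Spike& s : spikes_) {
            for (const Conn& c : conns) {
                const double td = s.ts + c.delay;
                if (c.gid == s.gid && td > t - dt && td <= t) targets.push_back(c.target);
            }
        }
        // A spike whose latest possible delivery time has passed can be dropped.
        while (!spikes_.empty() && spikes_.front().ts + max_delay_ <= t) spikes_.pop_front();
        return targets;
    }

    std::size_t size() const { return spikes_.size(); }

  private:
    double max_delay_;
    std::deque<Spike> spikes_;
};
```

Each call scans every tracked spike against every connection, so the work per step grows with both counts. Whether that scan beats a queue once it is parallelized on a GPU would have to be measured.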

Actually, I have another question about the existing priority queue. It seems that all kinds of DiscreteEvents are inserted into the queue, such as NetCons, SelfEvents, ConditionEvents, NetParEvents, etc. Currently I only know that in the deliver_net_events related code, the netcons are inserted into the queue. But what about other kinds of events, including SelfEvents, ConditionEvents and NetParEvents? Where are they produced and inserted into the priority queue?

HolyLow avatar Jan 20 '21 10:01 HolyLow

the cpu copies the spike to a modfile specific buffer on the cpu. That buffer is copied every time step to the gpu. The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step.

void NetCvode::deliver_net_events(NrnThread* nt) {  // for default method
    ...
    // Though interthread events also need to be dealt with, the principal call
    // here is for each event to call deliver_event which calls (for our
    // purposes) NetCon.deliver. That calls (*corenrn.get_pnt_receive()[typ]),
    // which is in the nmodl translated cxx file (e.g. in
    // build/x86_64/corenrn/mod2c/expsyn.cpp) with the function name
    // _net_buf_receive, and which appends the spike to the membrane_list
    // specific _net_receive_buffer (on the cpu).
    deliver_events(tm, nt);
    ...
    /* before executing on gpu, we have to update the NetReceiveBuffer_t on GPU */
    update_net_receive_buffer(nt);

    for (auto& net_buf_receive : corenrn.get_net_buf_receive()) {
        // This calls the _net_buf_receive function in the proper cxx translated
        // mod file where, on the GPU, the events in the
        // _ml->_net_receive_buffer are looped over and net_receive_kernel is
        // called for each of them.
        (*net_buf_receive.first)(nt);
    }
    ...
}

I think maybe we could record all the spikes in a table according to their generation time ts, and then we check the table each time we want to "pop" netcons.

That is very similar to the binqueue for NetCon events except td is in the binqueue so that spike.ts+netcon.delay is calculated only once.
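
For comparison, a bin queue along these lines might look roughly as follows. This is a simplified C++ sketch, not CoreNEURON's actual binqueue implementation: BinQueue, enqueue and advance are illustrative names, time is assumed to start at 0, and the ring is assumed to have enough bins to cover the largest delay. The point it shows is that td = ts + delay is computed once at enqueue time, after which only a bin index is needed.

```cpp
#include <cstddef>
#include <vector>

class BinQueue {
  public:
    BinQueue(double dt, std::size_t nbins) : dt_(dt), bins_(nbins) {}

    // td (= ts + delay) is computed once by the caller. Returns false if td
    // lies in the past or beyond the window the ring of bins can represent.
    bool enqueue(double td, int target) {
        if (td < t0_) return false;
        // Small epsilon guards against td landing just below a bin boundary.
        const std::size_t offset = static_cast<std::size_t>((td - t0_) / dt_ + 1e-10);
        if (offset >= bins_.size()) return false;
        bins_[(head_ + offset) % bins_.size()].push_back(target);
        return true;
    }

    // Hand out everything in the current bin and advance by one time step.
    std::vector<int> advance() {
        std::vector<int> out;
        out.swap(bins_[head_]);
        head_ = (head_ + 1) % bins_.size();
        t0_ += dt_;
        return out;
    }

  private:
    double dt_;
    double t0_ = 0.0;       // start time of the current bin
    std::size_t head_ = 0;  // index of the current bin in the ring
    std::vector<std::vector<int>> bins_;
};
```

Unlike the table scan, advance touches only the events due in the current step, at the cost of rounding delivery times to the time-step grid.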

what about other kinds of events

They are never interprocessor or interthread events. They come from a mod file instance and get sent back to the same instance. See netcvode.cpp:: void net_send(...

nrnhines avatar Jan 20 '21 13:01 nrnhines

@nrnhines

That is very similar to the binqueue for NetCon events except td is in the binqueue so that spike.ts+netcon.delay is calculated only once.

Wow, I haven't read the code about binqueue, and I think this query method would be good for GPU parallelization rather than CPU single thread. So if the query method is moved to GPU, the performance should increase greatly.

They are never interprocessor or interthread events. They come from a mod file instance and get sent back to the same instance. See netcvode.cpp:: void net_send(...

Do you mean that in the priority queue there are only netcons, and other events such as SelfEvents, ConditionEvents, and NetParEvents won't appear in the queue?

Besides, if I want to let the translated cxx file's _net_buf_receive directly modify the gpu version _net_receive_buffer to avoid the update_net_receive_buffer(nt) (which is strangely slow in my profiling...), what should I do? Currently, the _net_receive_buffer is pinned between CPU and GPU because of acc_copyin, and I guess if the size of the buffer components changes, during the acc_update_device the gpu buffer might be freed, reallocated and transferred again, which greatly hurts performance.

HolyLow avatar Jan 20 '21 13:01 HolyLow

...binqueue... if the query method is moved to GPU, the performance should increase greatly.

I agree it could be on the GPU. Whether you can get better performance (at least for interprocessor spikes) is an experimental question. But I wouldn't be surprised if a single cpu thread model on the GPU could greatly benefit from a spike staying on the GPU from send to receive. That was beyond the scope of my GPU understanding since spikes are generated and delivered randomly without regard to ordering related to the otherwise "structure of array" memory organization.

SelfEvents, ConditionEvents, and NetParEvents won't appear in the queue?

They don't appear in the binqueue. They go into the heap or splay tree queue for exact delivery time. These really are candidates for staying on the GPU (at least SelfEvents and ConditionEvents that get sent and delivered to the same object). NetParEvents are just for synchronization at "minimum NetCon (interprocessor/interthread) delay integration intervals" to ensure every event has arrived at its thread destination in time for delivery. The NEURON version has a flag for "SelfEvent not on queue" that takes advantage of this but it did not make it into CoreNEURON.

(which is strangely slow in my profiling...), what should I do? ... guess if the size of the buffer components changes

You would have to establish whether buffer size changes are a significant performance issue. Presently the ml->_net_receive_buffer is doubled in size every reallocation and starts out with size equal to the number of instances of the type. The usual pattern is that the number of NetCon events destined for an instance within a time step is usually 0 and never > 1. Full spike synchrony would just be enough to fill it up. We should consult with @pramodk about your performance results.
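
The growth policy described here can be sketched in C++ as follows. ReceiveBuffer is a hypothetical stand-in, not the real NetReceiveBuffer_t; it only counts how often the capacity has to double so that the effect of the initial size can be checked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-mechanism receive buffer.
class ReceiveBuffer {
  public:
    // Initial capacity equals the number of instances of the mechanism type.
    explicit ReceiveBuffer(std::size_t n_instances)
        : capacity_(n_instances > 0 ? n_instances : 1) {
        events_.reserve(capacity_);
    }

    // Append one event; double the capacity when the buffer is full.
    void append(int instance) {
        if (events_.size() == capacity_) {
            capacity_ *= 2;
            events_.reserve(capacity_);
            ++reallocations_;
        }
        events_.push_back(instance);
    }

    // Called once per time step after delivery; the capacity is kept.
    void clear() { events_.clear(); }

    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }

  private:
    std::size_t capacity_;
    std::size_t reallocations_ = 0;
    std::vector<int> events_;
};
```

As long as no more than one event per instance arrives within a time step, the initial capacity is never exceeded and reallocations() stays at 0, which matches the usual pattern described above.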

nrnhines avatar Jan 20 '21 15:01 nrnhines

@nrnhines I've done some more detailed profiling, and it suggests that you are right about the GPU buffer allocations. The GPU buffers are not reallocated, so the performance is not influenced by it.

The more detailed profiling suggests that the (*corenrn.get_pnt_receive()[typ]) calls made by the delivered netcons and the update_net_receive_buffer(nt) call consume a large share of the time as the simulation size increases.

I think that if we could collect all the netcons that are going to fire in a timestep, let them fire (that is, call *corenrn.get_pnt_receive()[typ]) in parallel on the GPU, and eliminate the data transfer of update_net_receive_buffer(nt), then the performance should improve. But I have no idea how to do it, as the mod files are auto-generated. Do I have to modify the generated mod C code myself? Could you give me some advice on how to realize it?

I have another question on the deliver event functions. It appears that there are two functions, deliver_net_events and nrn_deliver_events, that are related to event delivery. But when I dig into the code, only the deliver_net_events function calls (input)presyn->send and inserts the netcons into the priority queue. So what does nrn_deliver_events do? Why is it separated from deliver_net_events?

HolyLow avatar Jan 22 '21 07:01 HolyLow

@nrnhines : could you copy / attach one of your profile outputs and also coreneuron's stdout? I am curious because in other models I haven't seen event related parts being a bottleneck.

pramodk avatar Jan 22 '21 08:01 pramodk

@pramodk I ran a huge network with 100000 neurons on 8 top-tier A100 GPUs. Here is my output log:

$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu --multisend --ms-phases 1
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[20140,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: onebrain-dgx-a100-01

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 202.8984, Min 202.4102, Avg 202.5913
 Memory (MBs) :            After MPI_Init : Max 202.8984, Min 202.4102, Avg 202.5913
 Memory (MBs) :          Before nrn_setup : Max 204.4688, Min 204.0898, Avg 204.2632
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2475650] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2475650] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2475650] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
all2allv_int gidin to intermediate space=175014 total=1833.57 time=0.00667806
all2allv_int gidout space=25034 total=1836.1 time=0.000639373
all2allv_int lists space=225026 total=1836.9 time=0.00040563
 Setup Done   : 52.94 seconds
 Memory (MBs) :          After nrn_setup  : Max 13454.4922, Min 13310.5234, Avg 13362.8823
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000

GPU
--nwarp=0
--cell-permute=0

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=1
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 13454.4922, Min 13310.5234, Avg 13362.8823
 Memory (MBs) :     After nrn_finitialize : Max 13454.8281, Min 13310.8945, Avg 13363.2485

 psolve |========================================================| t: 1000.00 ETA: 0h14m08s

Solver Time : 848.615


 Simulation Statistics
 Number of cells: 100001
 Number of compartments: 24948985
 Number of presyns: 46885564
 Number of input presyns: 699903
 Number of synapses: 142227989
 Number of point processes: 189819298
 Number of transfer (gap) targets: 0
 Number of spikes: 1705882
 Number of spikes with non negative gid-s: 1705882
Path                                            Min time/rank Max time/rank Avg time/rank Time %
main                                               910.918379    910.971648    910.943178 99.999924
  checkpoint                                         0.000014      0.000017      0.000016  0.000002
  output-spike                                       0.181329      0.181383      0.181364  0.019909
  simulation                                       848.614906    848.614913    848.614910 93.157761
    nrn_multisend_receive                           48.211981     77.066952     65.366869  7.175730
      nrnmpi_multisend_advance                       9.009972      9.801592      9.344053  1.025755
    timestep                                       771.164518    799.889700    782.842262 85.937486
      state-update                                  21.478632     23.407606     22.235478  2.440927
        state-tmGlut                                 0.600395      0.657703      0.621035  0.068175
        state-tmGabaA                                0.603162      0.670587      0.633518  0.069545
        state-sk_ms                                  0.630907      0.675192      0.649857  0.071339
        state-gGapPar                                0.198618      0.209459      0.204231  0.022420
        state-naf_ms                                 0.624028      0.681327      0.646556  0.070977
        state-naf_fs                                 0.621885      0.693116      0.653311  0.071718
        state-kir_ms                                 0.615571      0.683735      0.639699  0.070224
        state-kir_fs                                 0.621746      0.702599      0.645035  0.070810
        state-kdr_ms                                 0.605290      0.691195      0.639563  0.070209
        state-kdr_fs                                 0.609506      0.685924      0.640477  0.070309
        state-kas_ms                                 0.612295      0.683557      0.642188  0.070497
        state-kas_fs                                 0.621509      0.713599      0.654592  0.071859
        state-kaf_ms                                 0.617341      0.693093      0.644476  0.070748
        state-kaf_fs                                 0.624762      0.703548      0.653654  0.071756
        state-cat33_ms                               0.622820      0.670690      0.647910  0.071125
        state-cat32_ms                               0.623263      0.677978      0.644940  0.070799
        state-car_ms                                 0.620437      0.680921      0.649741  0.071326
        state-caq_fs                                 0.624777      0.685835      0.651047  0.071470
        state-can_ms                                 0.641906      0.691721      0.663833  0.072873
        state-cal13_ms                               0.630679      0.674601      0.650541  0.071414
        state-cal12_ms                               0.642529      0.690433      0.662547  0.072732
        state-bk_ms                                  0.634718      0.687286      0.658555  0.072294
        state-bk_fs                                  0.641710      0.709077      0.662374  0.072713
        state-cadyn_fs                               0.624075      0.685532      0.655934  0.072006
        state-cadyn_ms                               0.635891      0.710540      0.666689  0.073187
        state-caldyn_ms                              0.670242      0.715727      0.693189  0.076096
        state-pas                                    0.230345      0.246662      0.236413  0.025953
      update                                         0.761722      0.831632      0.800757  0.087904
      second_order_cur                               0.206082      0.237223      0.217908  0.023921
      matrix-solver                                260.285747    263.816372    262.734427 28.842000
      setup_tree_matrix                             27.783332     31.105136     29.111301  3.195729
        cur-tmGlut                                   0.698753      0.778214      0.734702  0.080653
        cur-tmGabaA                                  0.701016      0.771259      0.731793  0.080334
        cur-sk_ms                                    0.640352      0.731186      0.676640  0.074279
        cur-gGapPar                                  0.712011      0.801755      0.743627  0.081633
        cur-naf_ms                                   0.642256      0.736338      0.681916  0.074858
        cur-naf_fs                                   0.651287      0.712772      0.683288  0.075009
        cur-kir_ms                                   0.647686      0.754648      0.679597  0.074604
        cur-kir_fs                                   0.665318      0.731789      0.693071  0.076083
        cur-kdr_ms                                   0.636183      0.731089      0.673716  0.073958
        cur-kdr_fs                                   0.642821      0.718732      0.678598  0.074494
        cur-kas_ms                                   0.669113      0.766549      0.703603  0.077239
        cur-kas_fs                                   0.654087      0.745909      0.696810  0.076493
        cur-kaf_ms                                   0.653428      0.722170      0.686797  0.075394
        cur-kaf_fs                                   0.670647      0.759065      0.707676  0.077686
        cur-cat33_ms                                 0.660693      0.763976      0.701601  0.077019
        cur-cat32_ms                                 0.668792      0.768218      0.705791  0.077479
        cur-car_ms                                   0.665881      0.754181      0.700573  0.076906
        cur-caq_fs                                   0.670234      0.766199      0.715233  0.078516
        cur-can_ms                                   0.672516      0.767212      0.710777  0.078026
        cur-cal13_ms                                 0.665043      0.752573      0.706560  0.077564
        cur-cal12_ms                                 0.671335      0.782261      0.713954  0.078375
        cur-bk_ms                                    0.653320      0.735494      0.688944  0.075630
        cur-bk_fs                                    0.658014      0.748753      0.699011  0.076735
        cur-cadyn_fs                                 0.670311      0.765523      0.717275  0.078740
        cur-cadyn_ms                                 0.653098      0.745702      0.694606  0.076251
        cur-caldyn_ms                                0.658266      0.750415      0.696119  0.076417
        cur-cal_ion                                  0.556998      0.647780      0.587886  0.064536
        cur-ca_ion                                   0.563171      0.632140      0.595445  0.065366
        cur-k_ion                                    0.574322      0.640588      0.602244  0.066112
        cur-na_ion                                   0.648642      0.746462      0.688948  0.075630
        cur-pas                                      0.678330      0.787698      0.719481  0.078982
      deliver_events                               450.160572    473.903018    460.721194 50.576244
        nrn_deliver_events                         132.393860    135.454351    134.115292 14.722674
          netbuf_receive_device                      2.765391      2.927384      2.830566  0.310729
          transfer_netbuf_host2device               50.994765     58.777755     55.119894  6.050855
            update_net_receive_buffer               50.215810     58.042484     54.365246  5.968013
              acc_update_device                     24.804827     33.915664     29.601895  3.249585
              net_receive_buffer_order_refactor     22.872733     24.352687     23.540262  2.584162
          cvode_instance_deliver_events             72.225182     77.753242     74.663350  8.196262
        deliver_net_events                         313.929017    337.217062    325.070261 35.684993
          netbuf_receive_device                      2.750386      2.912784      2.820341  0.309607
          transfer_netbuf_host2device               34.304229     36.817370     35.225450  3.866918
            update_net_receive_buffer               33.573392     35.991656     34.462581  3.783173
              acc_update_device                      9.373999      9.934178      9.533084  1.046506
              net_receive_buffer_order_refactor     23.000350     24.758430     23.701044  2.601812
          deque_deliver_host                        72.178038     77.727124     74.609715  8.190375
          nrn_multisend_advance_host               167.421934    179.408870    173.218302 19.015255
            nrnmpi_multisend_advance                 0.931094      1.120856      1.029664  0.113033
          get_watch_host                             0.198831      0.225281      0.209353  0.022982
          send_presyn_host                          31.480995     35.580274     33.327281  3.658544
            nrnmpi_multisend                         3.652442      4.098563      3.879833  0.425914
          transfer_spike_deveice2host                0.948560      1.023150      0.972604  0.106769
          collect_spike_device                       1.901445      2.134007      2.011170  0.220779
  finitialize                                        6.514536      6.514597      6.514570  0.715145
    nrn_multisend_receive                            0.000184      0.437879      0.237697  0.026093
      nrnmpi_multisend_advance                       0.000047      0.000101      0.000067  0.000007
    cur-tmGlut                                       0.000021      0.000028      0.000023  0.000003
    cur-tmGabaA                                      0.000020      0.000035      0.000024  0.000003
    cur-sk_ms                                        0.000018      0.000025      0.000021  0.000002
    cur-gGapPar                                      0.000019      0.000035      0.000023  0.000002
    cur-naf_ms                                       0.000019      0.000024      0.000021  0.000002
    cur-naf_fs                                       0.000019      0.000025      0.000021  0.000002
    cur-kir_ms                                       0.000019      0.000025      0.000021  0.000002
    cur-kir_fs                                       0.000019      0.000026      0.000022  0.000002
    cur-kdr_ms                                       0.000017      0.000025      0.000021  0.000002
    cur-kdr_fs                                       0.000018      0.000025      0.000021  0.000002
    cur-kas_ms                                       0.000019      0.000027      0.000022  0.000002
    cur-kas_fs                                       0.000019      0.000026      0.000022  0.000002
    cur-kaf_ms                                       0.000019      0.000024      0.000021  0.000002
    cur-kaf_fs                                       0.000020      0.000026      0.000022  0.000002
    cur-cat33_ms                                     0.000019      0.000024      0.000021  0.000002
    cur-cat32_ms                                     0.000020      0.000025      0.000022  0.000002
    cur-car_ms                                       0.000019      0.000025      0.000022  0.000002
    cur-caq_fs                                       0.000020      0.000027      0.000023  0.000002
    cur-can_ms                                       0.000019      0.000026      0.000022  0.000002
    cur-cal13_ms                                     0.000019      0.000024      0.000021  0.000002
    cur-cal12_ms                                     0.000020      0.000026      0.000022  0.000002
    cur-bk_ms                                        0.000018      0.000024      0.000020  0.000002
    cur-bk_fs                                        0.000020      0.000025      0.000021  0.000002
    cur-cadyn_fs                                     0.000018      0.000029      0.000021  0.000002
    cur-cadyn_ms                                     0.000017      0.000027      0.000020  0.000002
    cur-caldyn_ms                                    0.000018      0.000030      0.000021  0.000002
    cur-cal_ion                                      0.000013      0.000022      0.000015  0.000002
    cur-ca_ion                                       0.000013      0.000019      0.000015  0.000002
    cur-k_ion                                        0.000014      0.000018      0.000015  0.000002
    cur-na_ion                                       0.000019      0.000024      0.000021  0.000002
    cur-pas                                          0.000025      0.000036      0.000031  0.000003
    nrn_deliver_events                               0.006263      0.014525      0.013337  0.001464
      netbuf_receive_device                          0.005979      0.014208      0.013023  0.001430
      transfer_netbuf_host2device                    0.000091      0.000114      0.000099  0.000011
        update_net_receive_buffer                    0.000033      0.000044      0.000038  0.000004
      cvode_instance_deliver_events                  0.000036      0.000047      0.000041  0.000005
  load-model                                        55.412495     55.606147     55.501959  6.092797

The profiler is Caliper; note that I have added many fine-grained profiling regions inside deliver_events. In the profile above, deliver_events accounts for about 50% of the total time. "transfer_netbuf_host2device" covers the data transfer in update_net_receive_buffer(nt). "deque_deliver_host" and "cvode_instance_deliver_events" cover the dequeue and netcon send procedure (i.e. the *corenrn.get_pnt_receive()[typ] call).
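To make the 50% figure easy to check, here is a minimal sketch that recomputes the shares from the "Avg time/rank" column of the table above. The region names and numbers are copied from that table; the `AVG_TIME` dictionary and the `share` helper are hypothetical names introduced only for this illustration and are not part of Caliper or CoreNeuron.

```python
# Average time per rank in seconds, copied from the Caliper table above.
AVG_TIME = {
    "main": 910.943178,
    "simulation": 848.614910,
    "timestep": 782.842262,
    "matrix-solver": 262.734427,
    "deliver_events": 460.721194,
    "nrn_deliver_events": 134.115292,
    "deliver_net_events": 325.070261,
    "nrn_multisend_advance_host": 173.218302,
}


def share(region, parent="main"):
    """Return the time spent in `region` as a percentage of `parent`."""
    return 100.0 * AVG_TIME[region] / AVG_TIME[parent]


if __name__ == "__main__":
    # Each pair is (region, enclosing region it is compared against).
    pairs = [
        ("deliver_events", "main"),
        ("deliver_events", "timestep"),
        ("deliver_net_events", "deliver_events"),
        ("nrn_multisend_advance_host", "deliver_net_events"),
        ("matrix-solver", "timestep"),
    ]
    for region, parent in pairs:
        print(f"{region:28s} {share(region, parent):6.2f}% of {parent}")
```

Comparing a region against its enclosing region, rather than only against main, shows which sub-step dominates inside deliver_events.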

HolyLow avatar Jan 22 '21 08:01 HolyLow

@HolyLow: if this is running on GPUs, did you set export PGI_ACC_SYNCHRONOUS=1 for profiling? If not, could you set it and run again (with and without multisend)?

I mention PGI_ACC_SYNCHRONOUS because kernels are launched asynchronously on the GPU, and I want to verify that the elapsed times are correct.
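A minimal sketch of the two profiling runs being asked for, assuming the binary and data paths used in the commands in this thread. The `profiling_run` helper and `BASE_CMD` are hypothetical names introduced here; the script only builds and prints the commands, and launching them is left to whoever runs it on the GPU node.

```python
import os
import shlex

# Command line assembled from the invocations shown in this thread;
# adjust the binary and data paths to the local installation.
BASE_CMD = [
    "mpiexec", "-np", "8", "./profile_gpu_install/bin/special-core",
    "--tstop", "1000",
    "--datpath", "./networks/100000Sim/RoundRobin-core-8",
    "--mpi", "--gpu",
]


def profiling_run(multisend):
    """Return (command, environment) for one profiling run.

    PGI_ACC_SYNCHRONOUS=1 asks the PGI OpenACC runtime to run kernels and
    data transfers synchronously, so timers around a region include the
    GPU work launched inside it.
    """
    env = dict(os.environ, PGI_ACC_SYNCHRONOUS="1")
    cmd = BASE_CMD + (["--multisend"] if multisend else [])
    return cmd, env


if __name__ == "__main__":
    for multisend in (False, True):
        cmd, env = profiling_run(multisend)
        # Print the equivalent shell line; pass cmd and env to
        # subprocess.run(cmd, env=env) to actually launch it.
        print("PGI_ACC_SYNCHRONOUS=1", shlex.join(cmd))
```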

pramodk avatar Jan 22 '21 08:01 pramodk

@pramodk I added export PGI_ACC_SYNCHRONOUS=1 and reran the program. The log is below:

$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu --multisend --ms-phases 1
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[3083,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: onebrain-dgx-a100-01

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 202.8594, Min 202.4688, Avg 202.6294
 Memory (MBs) :            After MPI_Init : Max 202.9570, Min 202.4688, Avg 202.6680
 Memory (MBs) :          Before nrn_setup : Max 204.6328, Min 204.1875, Avg 204.3760
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2655272] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2655272] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2655272] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
all2allv_int gidin to intermediate space=175014 total=1833.37 time=0.00298926
all2allv_int gidout space=25034 total=1836.03 time=0.00412042
all2allv_int lists space=225026 total=1836.77 time=0.000453119
 Setup Done   : 57.16 seconds
 Memory (MBs) :          After nrn_setup  : Max 13454.6875, Min 13310.2070, Avg 13362.8667
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000

GPU
--nwarp=0
--cell-permute=0

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=1
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 13454.6875, Min 13310.2070, Avg 13362.8667
 Memory (MBs) :     After nrn_finitialize : Max 13455.0352, Min 13310.6016, Avg 13363.2354

 psolve |========================================================| t: 1000.00 ETA: 0h17m08s

Solver Time : 1027.55


 Simulation Statistics
 Number of cells: 100001
 Number of compartments: 24948985
 Number of presyns: 46885564
 Number of input presyns: 699903
 Number of synapses: 142227989
 Number of point processes: 189819298
 Number of transfer (gap) targets: 0
 Number of spikes: 1706101
 Number of spikes with non negative gid-s: 1706101
Path                                            Min time/rank Max time/rank Avg time/rank Time %
main                                              1094.953983   1095.032779   1095.000754 99.999943
  checkpoint                                         0.000014      0.000017      0.000016  0.000001
  output-spike                                       0.189638      0.189893      0.189854  0.017338
  simulation                                      1027.548506   1027.548533   1027.548517 93.839929
    nrn_multisend_receive                           43.115893     84.618265     71.113138  6.494342
      nrnmpi_multisend_advance                       8.981463      9.662669      9.190748  0.839337
    timestep                                       942.551377    984.021549    956.042807 87.309735
      state-update                                 163.277901    169.919139    167.520333 15.298641
        state-tmGlut                                17.891402     20.962417     19.721948  1.801089
        state-tmGabaA                               18.806481     20.901478     19.983781  1.825000
        state-sk_ms                                  5.966395      6.327325      6.155395  0.562136
        state-gGapPar                                0.199146      0.215570      0.205517  0.018769
        state-naf_ms                                 7.553335      8.192543      7.905761  0.721986
        state-naf_fs                                 1.203253      1.267370      1.229883  0.112318
        state-kir_ms                                 5.832329      6.282179      6.091267  0.556279
        state-kir_fs                                 1.507556      1.561963      1.530904  0.139808
        state-kdr_ms                                 5.204399      5.630903      5.339970  0.487668
        state-kdr_fs                                 1.136648      1.190377      1.158424  0.105792
        state-kas_ms                                 7.536179      8.299321      8.022001  0.732602
        state-kas_fs                                 1.222709      1.280508      1.251373  0.114281
        state-kaf_ms                                 8.352539      8.974221      8.735237  0.797738
        state-kaf_fs                                 1.187475      1.266548      1.223957  0.111777
        state-cat33_ms                               8.821202      9.457032      9.085241  0.829701
        state-cat32_ms                               8.739273      9.472221      9.003772  0.822261
        state-car_ms                                 8.707610      9.329355      8.915207  0.814173
        state-caq_fs                                 1.121425      1.217344      1.151527  0.105162
        state-can_ms                                 1.254100      1.315166      1.280084  0.116902
        state-cal13_ms                               8.868501      9.517126      9.264925  0.846111
        state-cal12_ms                               8.727495      9.499393      9.158444  0.836386
        state-bk_ms                                  6.540231      6.934413      6.782218  0.619380
        state-bk_fs                                  1.211832      1.284328      1.240724  0.113308
        state-cadyn_fs                               1.191211      1.283060      1.225622  0.111929
        state-cadyn_ms                               7.900847      8.232364      8.043134  0.734532
        state-caldyn_ms                              7.898526      8.171343      7.996948  0.730314
        state-pas                                    0.230333      0.237973      0.234125  0.021381
      update                                         5.616285      5.869693      5.696664  0.520243
      second_order_cur                               0.202088      0.229119      0.213957  0.019539
      matrix-solver                                 46.581775     48.050873     46.960193  4.288597
      setup_tree_matrix                            282.201459    291.735860    284.711288 26.000998
        cur-tmGlut                                  47.820663     50.219490     49.467232  4.517550
        cur-tmGabaA                                 26.867740     27.689633     27.253236  2.488877
        cur-sk_ms                                   11.517667     11.878265     11.617802  1.060985
        cur-gGapPar                                  1.500699      1.602815      1.552103  0.141744
        cur-naf_ms                                  12.219218     13.171423     12.764611  1.165716
        cur-naf_fs                                   1.223981      1.327254      1.265339  0.115556
        cur-kir_ms                                  12.067042     12.362962     12.158961  1.110406
        cur-kir_fs                                   1.941612      2.076152      1.995893  0.182273
        cur-kdr_ms                                  10.773797     11.167160     10.854743  0.991299
        cur-kdr_fs                                   1.216900      1.307927      1.250850  0.114233
        cur-kas_ms                                  12.918723     13.412306     13.028015  1.189772
        cur-kas_fs                                   1.236897      1.341445      1.274981  0.116436
        cur-kaf_ms                                  12.817183     13.118215     12.940085  1.181741
        cur-kaf_fs                                   1.244348      1.331345      1.281097  0.116995
        cur-cat33_ms                                12.337161     12.736887     12.490677  1.140700
        cur-cat32_ms                                12.362707     12.890519     12.497404  1.141314
        cur-car_ms                                  13.216458     14.410276     13.887857  1.268296
        cur-caq_fs                                   1.301755      1.397693      1.331208  0.121571
        cur-can_ms                                   1.367863      1.483208      1.405410  0.128348
        cur-cal13_ms                                13.998148     14.284209     14.088838  1.286650
        cur-cal12_ms                                14.028517     14.450303     14.146985  1.291960
        cur-bk_ms                                   11.640999     12.098735     11.755976  1.073604
        cur-bk_fs                                    1.228959      1.315720      1.266282  0.115642
        cur-cadyn_fs                                 1.058706      1.205225      1.100279  0.100482
        cur-cadyn_ms                                 1.539039      1.666634      1.570395  0.143415
        cur-caldyn_ms                                1.555177      1.648164      1.586266  0.144864
        cur-cal_ion                                  3.846431      4.095737      3.908022  0.356896
        cur-ca_ion                                   3.827879      4.015865      3.889074  0.355166
        cur-k_ion                                    1.965004      2.066879      1.993873  0.182089
        cur-na_ion                                   1.900328      2.052253      1.946381  0.177752
        cur-pas                                      6.512316      6.742299      6.570815  0.600074
      deliver_events                               437.413407    464.659278    447.529065 40.870183
        nrn_deliver_events                         113.639760    120.781768    116.346337 10.625223
          netbuf_receive_device                      3.304050      3.514258      3.367622  0.307545
          transfer_netbuf_host2device               34.147801     36.281306     34.893598  3.186626
            update_net_receive_buffer               33.417942     35.513446     34.147335  3.118474
              acc_update_device                      9.170103      9.595447      9.305124  0.849782
              net_receive_buffer_order_refactor     23.036433     24.625675     23.633974  2.158351
          cvode_instance_deliver_events             74.710494     79.397623     76.592057  6.994700
        deliver_net_events                         322.175748    342.262361    329.654370 30.105384
          netbuf_receive_device                      3.313029      3.599303      3.401578  0.310646
          transfer_netbuf_host2device               34.341575     36.602362     35.124844  3.207744
            update_net_receive_buffer               33.607144     35.808834     34.370025  3.138811
              acc_update_device                      9.299829      9.847114      9.466439  0.864514
              net_receive_buffer_order_refactor     23.087757     24.635328     23.676065  2.162195
          deque_deliver_host                        74.477037     79.028204     76.292833  6.967373
          nrn_multisend_advance_host               170.145610    180.988931    174.454839 15.931929
            nrnmpi_multisend_advance                 0.921635      1.179529      1.016580  0.092838
          get_watch_host                             0.198887      0.239930      0.206792  0.018885
          send_presyn_host                          32.147317     35.482287     33.679884  3.075785
            nrnmpi_multisend                         3.508115      3.931272      3.735521  0.341143
          transfer_spike_deveice2host                1.053940      1.135249      1.074997  0.098173
          collect_spike_device                       2.621060      2.969821      2.750464  0.251184
  finitialize                                        6.892855      6.892969      6.892919  0.629490
    nrn_multisend_receive                            0.000319      0.593271      0.447595  0.040876
      nrnmpi_multisend_advance                       0.000046      0.000139      0.000071  0.000007
    cur-tmGlut                                       0.001347      0.001420      0.001402  0.000128
    cur-tmGabaA                                      0.000745      0.000762      0.000753  0.000069
    cur-sk_ms                                        0.000295      0.000306      0.000299  0.000027
    cur-gGapPar                                      0.000041      0.000047      0.000042  0.000004
    cur-naf_ms                                       0.000311      0.000336      0.000327  0.000030
    cur-naf_fs                                       0.000034      0.000040      0.000036  0.000003
    cur-kir_ms                                       0.000310      0.000319      0.000314  0.000029
    cur-kir_fs                                       0.000052      0.000059      0.000054  0.000005
    cur-kdr_ms                                       0.000278      0.000285      0.000280  0.000026
    cur-kdr_fs                                       0.000033      0.000042      0.000036  0.000003
    cur-kas_ms                                       0.000333      0.000345      0.000337  0.000031
    cur-kas_fs                                       0.000035      0.000044      0.000037  0.000003
    cur-kaf_ms                                       0.000331      0.000338      0.000334  0.000031
    cur-kaf_fs                                       0.000034      0.000041      0.000037  0.000003
    cur-cat33_ms                                     0.000318      0.000331      0.000325  0.000030
    cur-cat32_ms                                     0.000319      0.000329      0.000325  0.000030
    cur-car_ms                                       0.000342      0.000366      0.000361  0.000033
    cur-caq_fs                                       0.000036      0.000041      0.000038  0.000003
    cur-can_ms                                       0.000038      0.000044      0.000040  0.000004
    cur-cal13_ms                                     0.000364      0.000373      0.000368  0.000034
    cur-cal12_ms                                     0.000366      0.000373      0.000369  0.000034
    cur-bk_ms                                        0.000298      0.000303      0.000300  0.000027
    cur-bk_fs                                        0.000034      0.000040      0.000035  0.000003
    cur-cadyn_fs                                     0.000028      0.000034      0.000030  0.000003
    cur-cadyn_ms                                     0.000042      0.000048      0.000045  0.000004
    cur-caldyn_ms                                    0.000043      0.000049      0.000045  0.000004
    cur-cal_ion                                      0.000100      0.000119      0.000104  0.000009
    cur-ca_ion                                       0.000098      0.000101      0.000099  0.000009
    cur-k_ion                                        0.000049      0.000054      0.000051  0.000005
    cur-na_ion                                       0.000051      0.000057      0.000053  0.000005
    cur-pas                                          0.000174      0.000183      0.000179  0.000016
    nrn_deliver_events                               0.000565      0.000636      0.000597  0.000055
      netbuf_receive_device                          0.000283      0.000316      0.000298  0.000027
      transfer_netbuf_host2device                    0.000090      0.000102      0.000094  0.000009
        update_net_receive_buffer                    0.000034      0.000041      0.000036  0.000003
      cvode_instance_deliver_events                  0.000037      0.000043      0.000040  0.000004
  load-model                                        60.102146     60.321097     60.198128  5.497539

In the profile above, deliver_events accounts for about 40% of the total run time (40.87 in the Time % column).
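To make it easier to check figures like this one, here is a minimal Python sketch that parses rows of the profile table and reports the share of a region. It is not part of CoreNeuron; the helper name `parse_profile` and the regular expression are assumptions for illustration, and only a handful of rows from the table above are pasted in.

```python
import re

# A few rows copied verbatim from the profile above; the full table has the same layout.
PROFILE = """\
main                                              1094.953983   1095.032779   1095.000754 99.999943
  simulation                                      1027.548506   1027.548533   1027.548517 93.839929
    timestep                                       942.551377    984.021549    956.042807 87.309735
      state-update                                 163.277901    169.919139    167.520333 15.298641
      setup_tree_matrix                            282.201459    291.735860    284.711288 26.000998
      deliver_events                               437.413407    464.659278    447.529065 40.870183
"""

# Columns: indented region name, min time/rank, max time/rank, avg time/rank, time %.
ROW = re.compile(r"^(\s*)(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_profile(text):
    """Hypothetical helper: map region name -> timings.

    Region names can repeat at different nesting levels in the full table;
    this sketch keeps only the first occurrence of each name.
    """
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skips the header line and anything that is not a table row
        indent, name, t_min, t_max, t_avg, pct = match.groups()
        rows.setdefault(name, {
            "depth": len(indent),  # indentation width encodes the nesting level
            "min": float(t_min),
            "max": float(t_max),
            "avg": float(t_avg),
            "pct": float(pct),
        })
    return rows


if __name__ == "__main__":
    rows = parse_profile(PROFILE)
    events = rows["deliver_events"]
    share_of_timestep = events["avg"] / rows["timestep"]["avg"]
    print(f"deliver_events: {events['pct']:.2f}% of total run time")
    print(f"deliver_events: {share_of_timestep:.1%} of the timestep region")
```

Running the same parser on the profile from a run without `--multisend` would allow a region-by-region comparison of the two runs.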

HolyLow, Jan 22 '21 09:01

Thanks! Could you also attach the profile without multisend, please?

pramodk, Jan 22 '21 09:01

@pramodk The log without multisend is below:

$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[21864,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: onebrain-dgx-a100-01

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.

The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.

  Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
 num_mpi=8
 num_omp_thread=1

 Info : 8 GPUs shared by 8 ranks per node

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)

 Additional mechanisms from files
 bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod

 Memory (MBs) :             After mk_mech : Max 202.8984, Min 202.6172, Avg 202.7500
 Memory (MBs) :            After MPI_Init : Max 202.9766, Min 202.6602, Avg 202.8291
 Memory (MBs) :          Before nrn_setup : Max 204.6641, Min 204.2695, Avg 204.4805
 WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2743626] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2743626] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2743626] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
 Setup Done   : 57.05 seconds
 Memory (MBs) :          After nrn_setup  : Max 13452.1523, Min 13308.1914, Avg 13360.6455
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000

GPU
--nwarp=0
--cell-permute=0

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 13452.1523, Min 13308.1914, Avg 13360.6455
 Memory (MBs) :     After nrn_finitialize : Max 13452.5234, Min 13308.5508, Avg 13361.0044

 psolve |========================================================| t: 1000.00 ETA: 0h18m40s

Solver Time : 1119.6


 Simulation Statistics
 Number of cells: 100001
 Number of compartments: 24948985
 Number of presyns: 46885564
 Number of input presyns: 699903
 Number of synapses: 142227989
 Number of point processes: 189819298
 Number of transfer (gap) targets: 0
 Number of spikes: 1706357
 Number of spikes with non negative gid-s: 1706357
Path                                            Min time/rank Max time/rank Avg time/rank Time %
main                                              1187.057142   1187.094615   1187.074318 99.999948
  checkpoint                                         0.000017      0.000022      0.000020  0.000002
  output-spike                                       0.301216      0.301703      0.301620  0.025409
  simulation                                      1119.597563   1119.597584   1119.597576 94.315661
    nrn_spike_exchange_send                        218.428292    237.298587    225.377526 18.985956
    spike-exchange                                  59.924715    112.155343     93.236865  7.854337
      communication                                  0.556400      0.707461      0.645768  0.054400
      imbalance                                     59.274358    111.467468     92.570782  7.798226
    timestep                                       788.600757    821.932399    800.550250 67.438898
      state-update                                 164.618212    175.163768    170.006737 14.321483
        state-tmGlut                                17.880351     21.164150     19.842612  1.671555
        state-tmGabaA                               18.690228     21.053429     20.069056  1.690631
        state-sk_ms                                  6.000227      6.400353      6.222362  0.524176
        state-gGapPar                                0.211716      0.231105      0.219498  0.018491
        state-naf_ms                                 7.638330      8.361350      7.988473  0.672954
        state-naf_fs                                 1.256643      1.364567      1.293139  0.108935
        state-kir_ms                                 5.833877      6.311499      6.148401  0.517945
        state-kir_fs                                 1.566789      1.683058      1.618802  0.136369
        state-kdr_ms                                 5.287438      5.585141      5.397896  0.454722
        state-kdr_fs                                 1.181379      1.311535      1.230932  0.103695
        state-kas_ms                                 7.616615      8.452772      8.098884  0.682256
        state-kas_fs                                 1.282462      1.412272      1.322515  0.111410
        state-kaf_ms                                 8.431140      9.091935      8.805247  0.741760
        state-kaf_fs                                 1.244079      1.366700      1.289862  0.108659
        state-cat33_ms                               8.925963      9.501701      9.145406  0.770415
        state-cat32_ms                               8.852230      9.546560      9.114142  0.767782
        state-car_ms                                 8.744904      9.364068      9.000629  0.758219
        state-caq_fs                                 1.167949      1.264045      1.204005  0.101426
        state-can_ms                                 1.313321      1.425241      1.342329  0.113079
        state-cal13_ms                               8.926063      9.651054      9.359504  0.788451
        state-cal12_ms                               8.776668      9.629176      9.255009  0.779648
        state-bk_ms                                  6.559225      7.040436      6.863340  0.578172
        state-bk_fs                                  1.253371      1.371389      1.297992  0.109344
        state-cadyn_fs                               1.244077      1.360497      1.289854  0.108658
        state-cadyn_ms                               7.993786      8.283918      8.156555  0.687114
        state-caldyn_ms                              7.974183      8.223518      8.089551  0.681469
        state-pas                                    0.242036      0.264529      0.250038  0.021063
      update                                         5.672399      5.888646      5.769049  0.485989
      second_order_cur                               0.215864      0.239019      0.224698  0.018929
      matrix-solver                                 46.665604     48.344990     47.195290  3.975763
      setup_tree_matrix                            284.373747    292.540055    288.379657 24.293298
        cur-tmGlut                                  47.796660     50.439045     49.717282  4.188218
        cur-tmGabaA                                 27.026846     27.746684     27.412058  2.309210
        cur-sk_ms                                   11.592638     11.858541     11.693581  0.985075
        cur-gGapPar                                  1.595394      1.747551      1.645170  0.138590
        cur-naf_ms                                  12.283355     13.308238     12.901631  1.086842
        cur-naf_fs                                   1.289727      1.408634      1.323691  0.111509
        cur-kir_ms                                  12.141881     12.471281     12.275892  1.034129
        cur-kir_fs                                   2.031832      2.172042      2.070867  0.174451
        cur-kdr_ms                                  10.819164     11.101391     10.921307  0.920018
        cur-kdr_fs                                   1.269751      1.400391      1.318768  0.111094
        cur-kas_ms                                  12.963466     13.412060     13.127236  1.105847
        cur-kas_fs                                   1.303103      1.438943      1.344530  0.113264
        cur-kaf_ms                                  12.907275     13.207011     13.036136  1.098173
        cur-kaf_fs                                   1.300288      1.434272      1.351428  0.113845
        cur-cat33_ms                                12.352017     12.797276     12.598672  1.061321
        cur-cat32_ms                                12.374102     12.901329     12.588384  1.060454
        cur-car_ms                                  13.336423     14.319290     13.998113  1.179211
        cur-caq_fs                                   1.357181      1.463727      1.390305  0.117120
        cur-can_ms                                   1.421563      1.556975      1.471210  0.123936
        cur-cal13_ms                                14.101366     14.347143     14.205350  1.196668
        cur-cal12_ms                                14.098088     14.430896     14.266691  1.201836
        cur-bk_ms                                   11.719685     12.056659     11.847698  0.998058
        cur-bk_fs                                    1.305406      1.419977      1.358138  0.114410
        cur-cadyn_fs                                 1.113127      1.209888      1.149567  0.096840
        cur-cadyn_ms                                 1.584393      1.720349      1.639903  0.138147
        cur-caldyn_ms                                1.608654      1.721550      1.659429  0.139791
        cur-cal_ion                                  3.873672      4.067739      3.970980  0.334518
        cur-ca_ion                                   3.870369      4.084463      3.968295  0.334292
        cur-k_ion                                    2.011889      2.141438      2.067910  0.174202
        cur-na_ion                                   1.967243      2.098818      2.027355  0.170786
        cur-pas                                      6.587077      6.816442      6.676041  0.562394
      deliver_events                               278.642203    297.546088    285.308842 24.034611
        nrn_deliver_events                         121.953930    130.628961    124.658322 10.501302
          netbuf_receive_device                      3.493318      3.726540      3.576826  0.301314
          transfer_netbuf_host2device               37.214977     39.304264     37.771478  3.181895
            update_net_receive_buffer               36.434771     38.457027     36.964274  3.113896
              acc_update_device                      9.505350     10.022741      9.699914  0.817127
              net_receive_buffer_order_refactor     25.575383     27.018933     25.938048  2.185039
          cvode_instance_deliver_events             79.686780     85.826019     81.679789  6.880761
        deliver_net_events                         155.102212    165.233591    159.007060 13.394863
          netbuf_receive_device                      3.403401      3.615408      3.513958  0.296018
          transfer_netbuf_host2device               36.874451     39.099152     37.558387  3.163944
            update_net_receive_buffer               36.089584     38.245614     36.745719  3.095484
              acc_update_device                      9.482481     10.057157      9.717243  0.818587
              net_receive_buffer_order_refactor     25.269030     26.754816     25.696415  2.164683
          deque_deliver_host                        78.568418     84.096959     80.689731  6.797358
          nrn_multisend_advance_host                 0.424282      0.446943      0.431616  0.036360
          get_watch_host                             0.212828      0.230211      0.218959  0.018445
          send_presyn_host                          28.367523     30.502133     29.631212  2.496153
          transfer_spike_deveice2host                1.087129      1.202950      1.137972  0.095864
          collect_spike_device                       2.784141      3.025850      2.944873  0.248078
  finitialize                                        6.687069      6.687169      6.687120  0.563328
    spike-exchange                                   0.000130      0.406356      0.203937  0.017180
      communication                                  0.000043      0.000068      0.000054  0.000005
      imbalance                                      0.000058      0.406271      0.203865  0.017174
    cur-tmGlut                                       0.001343      0.001421      0.001399  0.000118
    cur-tmGabaA                                      0.000745      0.000759      0.000752  0.000063
    cur-sk_ms                                        0.000296      0.000299      0.000297  0.000025
    cur-gGapPar                                      0.000041      0.000044      0.000042  0.000004
    cur-naf_ms                                       0.000308      0.000331      0.000326  0.000027
    cur-naf_fs                                       0.000034      0.000036      0.000035  0.000003
    cur-kir_ms                                       0.000310      0.000318      0.000313  0.000026
    cur-kir_fs                                       0.000051      0.000055      0.000053  0.000004
    cur-kdr_ms                                       0.000278      0.000282      0.000279  0.000024
    cur-kdr_fs                                       0.000032      0.000037      0.000035  0.000003
    cur-kas_ms                                       0.000331      0.000340      0.000335  0.000028
    cur-kas_fs                                       0.000034      0.000036      0.000035  0.000003
    cur-kaf_ms                                       0.000329      0.000337      0.000333  0.000028
    cur-kaf_fs                                       0.000034      0.000036      0.000035  0.000003
    cur-cat33_ms                                     0.000319      0.000330      0.000324  0.000027
    cur-cat32_ms                                     0.000319      0.000329      0.000324  0.000027
    cur-car_ms                                       0.000341      0.000368      0.000361  0.000030
    cur-caq_fs                                       0.000036      0.000039      0.000037  0.000003
    cur-can_ms                                       0.000037      0.000039      0.000038  0.000003
    cur-cal13_ms                                     0.000364      0.000370      0.000366  0.000031
    cur-cal12_ms                                     0.000365      0.000373      0.000368  0.000031
    cur-bk_ms                                        0.000298      0.000301      0.000299  0.000025
    cur-bk_fs                                        0.000034      0.000036      0.000035  0.000003
    cur-cadyn_fs                                     0.000027      0.000031      0.000029  0.000002
    cur-cadyn_ms                                     0.000042      0.000044      0.000044  0.000004
    cur-caldyn_ms                                    0.000043      0.000047      0.000045  0.000004
    cur-cal_ion                                      0.000098      0.000103      0.000101  0.000008
    cur-ca_ion                                       0.000097      0.000102      0.000098  0.000008
    cur-k_ion                                        0.000049      0.000051      0.000050  0.000004
    cur-na_ion                                       0.000051      0.000054      0.000052  0.000004
    cur-pas                                          0.000175      0.000179      0.000176  0.000015
    nrn_deliver_events                               0.000562      0.000625      0.000597  0.000050
      netbuf_receive_device                          0.000283      0.000323      0.000305  0.000026
      transfer_netbuf_host2device                    0.000090      0.000093      0.000092  0.000008
        update_net_receive_buffer                    0.000033      0.000036      0.000035  0.000003
      cvode_instance_deliver_events                  0.000036      0.000043      0.000039  0.000003
  load-model                                        58.910579     60.469316     59.792065  5.036924

Note that "transfer_netbuf_host2device" refers to the data transfer in update_net_receive_buffer(nt). "deque_deliver_host", "cvode_instance_deliver_events" and "nrn_spike_exchange_send" refer to the deque and netcon send procedure (aka the call to *corenrn.get_pnt_receive()[typ]). Together, nrn_spike_exchange_send and deliver_events consume about 43% of the total time.
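
The 43% figure can be checked directly from the "Time %" column of the table above. A minimal sketch in Python (the dictionary is only a convenience for this check, not part of CoreNEURON):

```python
# "Time %" values copied from the profile table above.
time_percent = {
    "nrn_spike_exchange_send": 18.985956,
    "deliver_events": 24.034611,
}

# Both entries are direct children of "simulation", so their shares add up.
combined = sum(time_percent.values())
print(f"combined share: {combined:.2f}%")  # prints "combined share: 43.02%"
```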

HolyLow avatar Jan 22 '21 10:01 HolyLow

@pramodk @nrnhines Could you please give me some hints on the following questions?

  1. If I want the netcons that are going to fire in a timestep to fire in parallel on the GPU (i.e. by calling *corenrn.get_pnt_receive()[typ]), and to eliminate the data transfer in update_net_receive_buffer(nt), what should I do? The code for the mod files is auto-generated; do I have to modify the generated mod C code myself? Could you give me some advice on how to achieve this?
  2. It appears that there are two functions, deliver_net_events and nrn_deliver_events, that are related to event delivery. But when I dug into the code, only deliver_net_events calls (input)presyn->send and inserts the netcons into the priority queue. So what does nrn_deliver_events do? Why is it separated from deliver_net_events?
  3. In update_net_receive_buffer, what does net_receive_buffer_order_refactor do? The comment says "instance order to avoid race". What does that mean?

HolyLow avatar Jan 28 '21 09:01 HolyLow

I plan to work on this issue tomorrow. I will take the dataset you provided, run it on our machine, and respond to the questions above. Sorry for the delay.

pramodk avatar Jan 28 '21 10:01 pramodk

  1. Note that deliver_net_events(nth) is called on entry to nrn_fixed_step_thread(NrnThread* nth) to check thresholds and deliver all events (including binqueue events) up to tentry + dt/2, whereas nrn_deliver_events(nth) is called on exit to deliver all events except binqueue events up to but not past texit (tentry + dt). The problem this solves is the case of events generated in a time step that need to be delivered during that same time step. These are generally SelfEvents from NET_RECEIVE net_send(...) calls (often 0 delay) and not NetCon events, which can only be 0 delay if src and target are in the same thread.
  2. It is going to take some study on my part to comment that code adequately. My working hypothesis (because of the priority queue in net_receive_buffer_order) is that without it there were assertion errors in the NET_RECEIVE block due to events arriving with a delivery time earlier than the previous event. As @pramodk was also involved with this code, he may also be able to join the discussion.
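
The two delivery windows described in item 1 can be pictured with a small toy model. This is only a sketch in Python, not CoreNEURON code: the `Event` class, the `split` helper and `fixed_step` are hypothetical names, and the "up to" limits are modelled as inclusive comparisons, which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str
    t: float                # delivery time
    binqueue: bool = False  # True if the event sits in the bin queue

def split(events, keep):
    """Partition events into (taken, rest) according to the predicate keep."""
    taken = [e for e in events if keep(e)]
    rest = [e for e in events if not keep(e)]
    return taken, rest

def fixed_step(pending, generated_during_step, t_entry, dt):
    # Entry call (deliver_net_events in the thread): everything, binqueue
    # events included, that is due by t_entry + dt/2.
    on_entry, pending = split(pending, lambda e: e.t <= t_entry + dt / 2)
    # Events created while the step runs, e.g. zero-delay SelfEvents.
    pending = pending + generated_during_step
    # Exit call (nrn_deliver_events in the thread): non-binqueue events due
    # by t_exit = t_entry + dt.
    on_exit, pending = split(
        pending, lambda e: not e.binqueue and e.t <= t_entry + dt)
    return on_entry, on_exit, pending

dt = 0.025
pending = [Event("netcon", 0.01), Event("binned", 0.02, binqueue=True)]
generated = [Event("self_event", 0.02)]
on_entry, on_exit, left = fixed_step(pending, generated, t_entry=0.0, dt=dt)
print([e.label for e in on_entry], [e.label for e in on_exit],
      [e.label for e in left])
```

In this model the event due at 0.01 is delivered on entry, the SelfEvent generated during the step is picked up on exit, and the binqueue event stays pending for a later entry call.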

nrnhines avatar Jan 29 '21 22:01 nrnhines

@nrnhines

These are generally SelfEvents from NET_RECEIVE net_send(...) calls (often 0 delay) and not NetCon events which can only be 0 delay if src and target are in same thread.

So could I safely assume that no netcons will be delivered in nrn_deliver_events(nth), and all the netcons are delivered in deliver_net_events(nth)?

Besides, could you give me some hints on question 1? Could I achieve that without rewriting too much code?

HolyLow avatar Feb 04 '21 07:02 HolyLow

@HolyLow : I forgot to update this ticket last week:

@nrnhines and I went through this issue last week and looked at the profile numbers you posted.

If I am not mistaken, you have added some extra instrumentation to the functions. I can guess what instrumentation you added, but just to be sure, could you also paste the git diff (or a link to the code / fork if you have it on GitHub)?

We looked into update_net_receive_buffer and net_receive_buffer_order. There are some performance fixes we would like to experiment with. Instead of running the full model (which is time consuming), Michael proposed creating a standalone test that reproduces this performance issue, so that fixes can be tested easily.
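
For readers following along, the reordering step under discussion can be pictured with a toy model. This is a sketch in Python under stated assumptions, not the CoreNEURON implementation: `order_by_instance` and its arguments are hypothetical names, and sorting by delivery time inside each group reflects the hypothesis raised earlier in this thread rather than confirmed behaviour. The idea is that buffered events are grouped by target instance, so that different instances can be processed in parallel while events for the same instance are handled one after another.

```python
def order_by_instance(instance_index, delivery_time):
    """Group buffer slots by target instance.

    Returns (order, displ): order lists the buffer slots sorted by instance
    (then by delivery time, then by slot), and order[displ[k]:displ[k + 1]]
    holds the slots of the k-th group.
    """
    order = sorted(range(len(instance_index)),
                   key=lambda i: (instance_index[i], delivery_time[i], i))
    displ = [0]
    for pos in range(1, len(order)):
        # A new group starts whenever the target instance changes.
        if instance_index[order[pos]] != instance_index[order[pos - 1]]:
            displ.append(pos)
    if order:
        displ.append(len(order))
    return order, displ

# Five buffered events targeting three instances (0, 1 and 2).
instance_index = [2, 0, 2, 1, 0]
delivery_time = [0.3, 0.2, 0.1, 0.5, 0.1]
order, displ = order_by_instance(instance_index, delivery_time)
print(order)  # [4, 1, 3, 2, 0]
print(displ)  # [0, 2, 3, 5]
```

Each group could then be assigned to one parallel worker, which walks its slots sequentially. The sort itself runs on the host for every buffer update, which is consistent with it showing up as a separate entry in the profile.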

We will try this next week and will update this ticket.

pramodk avatar Feb 05 '21 08:02 pramodk