CoreNeuron
MPI Simulation fails when the --multisend flag is enabled
Describe the issue
I am trying to enable the multisend option for spike exchange. However, when I run the simulation as:
mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu --multisend
the program fails, whereas the same command without "--multisend" runs smoothly. I am not sure whether something is wrong with my environment, whether I failed to enable some option, or whether the code has a bug.
To Reproduce, and the corresponding Logs
Steps to reproduce the behavior:
- If I don't enable the "--multisend" option, the program runs smoothly, and the "normal" log is as below:
$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[36744,1],7]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
[dgxone:76177] [[36744,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file ../../orte/util/show_help.c at line 513
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 216.2031, Min 215.6484, Avg 215.9121
Memory (MBs) : After MPI_Init : Max 216.2031, Min 215.6484, Avg 215.9395
Memory (MBs) : Before nrn_setup : Max 217.0234, Min 216.5938, Avg 216.8501
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[dgxone:76177] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:76177] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:76177] 6 more processes have sent help message help-btl-vader.txt / cma-permission-denied
Setup Done : 8.70 seconds
Memory (MBs) : After nrn_setup : Max 1550.6797, Min 1519.4180, Avg 1538.5410
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000
GPU
--nwarp=0
--cell-permute=0
INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/10000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false
SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 1550.6797, Min 1519.4180, Avg 1538.5410
Memory (MBs) : After nrn_finitialize : Max 1551.0352, Min 1519.8086, Avg 1538.8823
psolve |========================================================| t: 1000.00 ETA: 0h02m44s
Solver Time : 164.163
Simulation Statistics
Number of cells: 10000
Number of compartments: 2494952
Number of presyns: 4689782
Number of input presyns: 69968
Number of synapses: 11968052
Number of point processes: 16710070
Number of transfer (gap) targets: 0
Number of spikes: 220340
Number of spikes with non negative gid-s: 220340
However, if the "--multisend" flag is appended, the program fails quickly with the log below:
$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[35160,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 216.1602, Min 215.7266, Avg 215.9482
Memory (MBs) : After MPI_Init : Max 216.1602, Min 215.7266, Avg 215.9771
Memory (MBs) : Before nrn_setup : Max 217.0703, Min 216.7070, Avg 216.8882
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=17523 total=371.438 time=0.000624772
all2allv_int gidout space=2532 total=371.438 time=0.000354822
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[35160,1],2]
Exit code: 1
--------------------------------------------------------------------------
[dgxone:12096] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:12096] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:12096] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
Expected behavior
Since the command without --multisend runs well, I do not think anything is wrong with my environment. However, when the --multisend flag is appended, the program fails, and the failure is hard to understand or debug.
System (please complete the following information)
- OS: Ubuntu 20.04
- Compiler: PGI 20.7
- Version: master branch
- Backend: CPU, GPU
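Most of the output above consists of repeated libibverbs warnings that also appear in the successful run, so they are probably unrelated to the failure. A minimal sketch for stripping them from a captured log follows. The function name `filter_mpi_noise` and the file name `run.log` are made up for illustration, and the pattern is taken only from the logs in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoreNEURON or Open MPI): remove the
# repeated libibverbs warnings seen in the logs of this issue, so that
# the remaining CoreNEURON and mpiexec messages are easier to read.
filter_mpi_noise() {
    # -v inverts the match: print only the lines that do NOT start with
    # the libibverbs warning prefix.
    grep -v '^libibverbs: Warning:'
}

# Example usage, assuming the run was captured beforehand with
#   mpiexec -np 8 ... --multisend > run.log 2>&1
# (run.log is an assumed file name):
#   filter_mpi_noise < run.log
```

The Open MPI help text in the log also names the MCA parameter btl_base_warn_component_unused, which can be set to 0 to silence the openib notice at the source.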
Does the simulation work with --multisend but without --gpu?
@nrnhines No, the simulation also fails with --multisend but without --gpu. The log is quite similar:
$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/10000Sim/RoundRobin-core-8 --mpi --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[38544,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
num_mpi=8
num_omp_thread=1
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 216.0508, Min 215.5312, Avg 215.7192
Memory (MBs) : After MPI_Init : Max 216.0625, Min 215.5312, Avg 215.8867
Memory (MBs) : Before nrn_setup : Max 217.3984, Min 216.7070, Avg 217.1641
all2allv_int gidin to intermediate space=17523 total=371.051 time=0.000610346
all2allv_int gidout space=2532 total=371.402 time=0.00050868
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[38544,1],6]
Exit code: 1
--------------------------------------------------------------------------
[dgxone:12424] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:12424] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:12424] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
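In the failing logs, mpiexec names the first process that exited with a non-zero status (for example `[[38544,1],6]`). Assuming the last number in that name is the MPI rank, which is how Open MPI process names are usually read, a small sketch like the following pulls it out of a captured log. `first_failed_rank` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the rank from the first "Process name:" line
# that mpiexec writes when a process exits with a non-zero status.
# Assumption: the name has the form [[<job family>,<job>],<rank>], as in
# the logs of this issue.
first_failed_rank() {
    # -n suppresses default output; the trailing p prints only the lines
    # where the substitution matched. head keeps the first match.
    sed -n 's/^Process name: \[\[[0-9]*,[0-9]*\],\([0-9]*\)\][[:space:]]*$/\1/p' | head -n 1
}

# Example usage (run.log is an assumed file name for the captured output):
#   first_failed_rank < run.log
```

In the two failing runs above the reported processes are `[[35160,1],2]` and `[[38544,1],6]`, so the first rank to exit is not the same each time.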
@HolyLow: is your model, or a smaller test example, available somewhere so that we can reproduce the issue?
@pramodk All the networks I use are generated by Snudda and exported with NEURON's nrnbbcore_write API. Following your suggestion, I tested with a smaller network generated by Snudda, called tinySim, which consists of 100 neurons. When I ran this network with -np 8, the program failed:
$ mpiexec -np 8 ./x86_64/special-core --tstop 1000 --datpath ./networks/tinySim/RoundRobin-core-8 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[33977,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 216.1992, Min 215.5781, Avg 215.8984
Memory (MBs) : After MPI_Init : Max 216.2734, Min 215.5781, Avg 215.9272
Memory (MBs) : Before nrn_setup : Max 217.0977, Min 216.5352, Avg 216.8506
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=192 total=218.223 time=0.000168855
all2allv_int gidout space=58 total=218.223 time=7.7529e-05
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[33977,1],0]
Exit code: 1
--------------------------------------------------------------------------
[dgxone:08865] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:08865] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:08865] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
However, when I tested the small network with -np 2, the program succeeded:
$ mpiexec -np 2 ./x86_64/special-core --tstop 1000 --datpath ./networks/tinySim/RoundRobin-core-2 --mpi --gpu --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[53236,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
num_mpi=2
num_omp_thread=1
Info : 8 GPUs shared by 2 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 215.1992, Min 215.0039, Avg 215.1016
Memory (MBs) : After MPI_Init : Max 215.1992, Min 215.0039, Avg 215.1016
Memory (MBs) : Before nrn_setup : Max 216.1953, Min 216.0273, Avg 216.1113
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
all2allv_int gidin to intermediate space=108 total=220.668 time=1.94621e-05
all2allv_int gidout space=108 total=220.668 time=1.5568e-05
all2allv_int lists space=408 total=220.668 time=1.94809e-05
Setup Done : 0.31 seconds
Memory (MBs) : After nrn_setup : Max 266.2656, Min 262.9492, Avg 264.6074
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000
GPU
--nwarp=0
--cell-permute=0
INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/tinySim/RoundRobin-core-2
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false
SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.01375
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 266.2656, Min 262.9492, Avg 264.6074
Memory (MBs) : After nrn_finitialize : Max 266.5938, Min 263.4727, Avg 265.0332
[dgxone:27116] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:27116] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:27116] 1 more process has sent help message help-btl-vader.txt / cma-permission-denied
psolve |========================================================| t: 1000.00 ETA: 0h09m55s
Solver Time : 595.167
Simulation Statistics
Number of cells: 100
Number of compartments: 25006
Number of presyns: 47647
Number of input presyns: 100
Number of synapses: 55873
Number of point processes: 103510
Number of transfer (gap) targets: 0
Number of spikes: 4162
Number of spikes with non negative gid-s: 4162
I did more experiments, and it turned out that regardless of the network size, np=2 or np=4 would work fine, but np=8 would fail. From the logs above, I suspect that the problem lies in the all2allv_int initialization step "all2allv_int lists space=408 total=220.668 time=1.94809e-05". But I am not able to find out anything more, as I am not familiar with the code.
Ok thanks!
I am not familiar with Snudda; I will take a look. But if you already have commands or scripts to export such a model, you can provide them here. (That would make it a bit easier to reproduce.)
@pramodk The original Snudda doesn't have the functionality of exporting network to CoreNeuron, and my modified version is too dirty to share. As a workaround, I uploaded my generated tinySim data so you could download it. The additional mod files are required to run the generated tinySim, so don't forget to compile the additional mods into CoreNeuron. @pramodk Thanks a lot!! Looking forward to your further advice.
if the "--multisend" flag is appended, the program fails quickly
Just a datapoint: my desktop takes 20s to simulate 20ms on 8 cores when configured with -DCMAKE_BUILD_TYPE=Debug. Without --gpu, I find that
hines@hines-T7500:~/Downloads/tinySim$ mpiexec -np 8 ./x86_64/special-core --tstop 20 --datpath ./original-RoundRobin-core-8 --mpi --multisend
runs successfully, generating 4 spikes:
hines@hines-T7500:~/Downloads/tinySim$ cat out.dat
2.8 52
5.6 27
7.425 66
13.675 47
And I see the output
...
all2allv_int gidin to intermediate space=192 total=15.4336 time=0.000116663
all2allv_int gidout space=58 total=15.4336 time=0.000155916
all2allv_int lists space=394 total=15.4336 time=4.424e-05
Setup Done : 0.21 seconds
...
If you are experiencing an error under these conditions, then it seems that we will need to reproduce more accurately your hardware/software environment.
@nrnhines I tried your datapoint, and without --multisend it worked fine on my environment as well. The outputs are also identical to what you've shown. But if I add the --multisend flag, the program fails. Logs are as below:
$ mpiexec -np 8 ./x86_64/special-core --tstop 20 --datpath ./networks/tinySim/RoundRobin-core-8 --mpi --multisend
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
--------------------------------------------------------------------------
[[16699,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: dgxone
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: dgxone
--------------------------------------------------------------------------
num_mpi=8
num_omp_thread=1
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 c40b39f (2020-12-30 09:29:10 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 215.6602, Min 215.3125, Avg 215.5264
Memory (MBs) : After MPI_Init : Max 215.9609, Min 215.3125, Avg 215.6260
Memory (MBs) : Before nrn_setup : Max 216.6797, Min 216.3984, Avg 216.5669
all2allv_int gidin to intermediate space=192 total=217.977 time=0.000178425
all2allv_int gidout space=58 total=217.977 time=7.27859e-05
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[16699,1],5]
Exit code: 1
--------------------------------------------------------------------------
[dgxone:59171] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[dgxone:59171] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[dgxone:59171] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
@nrnhines @pramodk Hello, I tried to dig into the source code, and the error seemed to come from the use_phase2_ procedure in nrnmultisend_setup.cpp. But I could hardly understand what the use_phase2_ was doing... When I disabled the use_phase2_ with "--ms-phases 1" option in the cmd, things worked fine again. I was wondering if the use_phase2_ related code was buggy.
@HolyLow : thanks for an update.
Just FYI: I haven't used the multisend option often. It helps spike exchange in specific scenarios. Are you testing this option for a specific use case?
I didn't get time to look into details yet. Will try to debug with your provided dataset during this week.
@pramodk Thanks for your attention. I was just trying to compare the behavior and performance with and without the --multisend option, and I ran into the crash by accident.
While digging into the code, I found another interesting performance issue, possibly unrelated to this one. In the deliver_net_events related code, I noticed that the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) seems to be executed on the CPU side, which forces update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) to transfer data from CPU to GPU. Is there any chance to move this procedure to the GPU side to eliminate the redundant data transfer? It seems that mod2c generates OpenACC code for the _net_buf_receive function, but the _net_receive function is never parallelized by OpenACC. (Actually I am not sure about this, because in _net_receive, the realloc_net_receive_buffer function seems to cooperate with OpenACC. But if that is true, why should the data be transferred again later in NetCvode::deliver_net_events?) I want to know how to put the _net_receive procedure on the GPU side and eliminate the update_net_receive_buffer(nt) call in NetCvode::deliver_net_events, if possible.
Perhaps this could be improved. I don't know. The multisend method is an interprocessor spike exchange method which, under some circumstances, has better performance than MPI_Allgather (followed by an MPI_Allgatherv if there are more spikes than can fit into the allgather source buffer). These spike exchange methods are not directly involved in delivery. When spikes arrive at the cpu, those spikes are placed in an event queue to await delivery. Interprocessor spike exchange and enqueuing/dequeuing are done on the cpu. At the time of dequeuing, the spike is placed in a buffer specific to a mod file, and at each time step that buffer is copied to the gpu. The gpu then calls the proper NET_RECEIVE block instance for the spike.
I didn't know if it was possible to efficiently manage a priority queue on the GPU.
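To make the flow described above concrete, here is a minimal CPU-only sketch of the "dequeue into a per-mod-file buffer" step. It is not CoreNEURON code: the names Event, EventQueue and drain_due_events are hypothetical, the queue is a plain std::priority_queue ordered by delivery time, and the copy of each buffer to the GPU is left out entirely.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical event record; field names are illustrative only.
struct Event {
    double t_deliver;  // time at which the event must be delivered
    int mech_type;     // index of the mechanism (mod file) that receives it
    int instance;      // target instance within that mechanism
};

// Comparator that turns std::priority_queue into a min-heap on t_deliver.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.t_deliver > b.t_deliver;
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterFirst>;

// Move every event due at or before t from the queue into the buffer of its
// mechanism type. Returns the number of events moved. In a GPU build, each
// non-empty buffer would then be copied to the device once per time step.
std::size_t drain_due_events(EventQueue& q, double t,
                             std::vector<std::vector<Event>>& buffers) {
    std::size_t moved = 0;
    while (!q.empty() && q.top().t_deliver <= t) {
        const Event e = q.top();
        q.pop();
        buffers[e.mech_type].push_back(e);
        ++moved;
    }
    return moved;
}
```

The point of the per-type buffers is that one array transfer per mechanism per time step replaces many single-event transfers.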
@nrnhines Thanks for your explanation!
At the time of dequeuing, the spike is placed in a buffer specific to a mod file, and at each time step that buffer is copied to the gpu.
Do you mean that the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) only places the spike in a buffer specific to a mod file, and if we directly change the buffer on the gpu, the update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) which transfers data from CPU to GPU can be eliminated?
I didn't know if it was possible to efficiently manage a priority queue on the GPU.
I think if the deliver procedure is to be moved to the GPU side, then maybe we won't choose to use a priority queue. Possibly we can record all the spikes according to their arrival times, and check if the fired spikes can trigger some netcon at each dt.
Do you mean
Yes. I would only disambiguate, or stress a bit more, that on dequeuing from the destination cpu's priority queue, the cpu copies the spike to a modfile specific buffer on the cpu. That buffer is copied every time step to the gpu. The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step. The presumption was that cpu->gpu transfer performs better as an array than one event at a time. If that presumption is vitiated by the performance improvement of eliminating the
update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) which transfers data from CPU to GPU can be eliminated
Then it makes sense to do so.
Possibly we can record all the spikes according to their arrival times
I'm not sure I know exactly what you mean. There is a spike generation time ts, an arrival time on the destination node, and a delivery time, ts + NetCon.delay. The latter is what goes onto the priority queue and when that time occurs the spike is sent to the (buffered) destination for immediate (that time step) delivery.
check if the fired spikes can trigger some netcon at each dt.
That seems to imply some sort of queue or insertable, removable list on the gpu which is either global, per mod type, or per mod instance. Generation order is not correlated with arrival-on-node order, which is not correlated with delivery order (the latter due to different NetCon.delay), although in practice delivery is in the same order as generation due to globally constant NetCon.delay.
@nrnhines I really appreciate your patience!! I have some more questions:
the cpu copies the spike to a modfile specific buffer on the cpu.
Do you refer to the function (*corenrn.get_pnt_receive()[typ])(target_, u.weight_index_, 0) (called in NetCon::deliver) ? I failed to find out the data structure of the "modfile specific buffer".
That buffer is copied every time step to the gpu.
Is the copy carried out in update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) ? As I am not able to find the modfile specific buffer on CPU, I failed to understand this procedure as well...
The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step.
Where does this procedure happen? In update_net_receive_buffer(nt) (called in NetCvode::deliver_net_events) ?
I'm not sure I know exactly what you mean. There is a spike generation time ts, an arrival time on the destination node, and a delivery time, ts + NetCon.delay.
I mean the spike generation time ts, which is transferred in MPI_Spike structure. I think maybe we could record all the spikes in a table according to their generation time ts, and then we check the table each time we want to "pop" netcons. We could directly check all the spike's netcons, calculate spike.ts + netcon.delay to see if the netcon should be activated. As the netcons have a maximum delay, the spikes would be outdated after the maximum delay and can be safely discarded, which keeps the number of tracked spikes small enough. A round-robin queue might be just ok.
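A rough sketch of this idea, to make the proposal easier to discuss. Everything here is hypothetical (Spike, Conn and scan_spikes are not CoreNEURON names), it runs on the CPU, and it scans all connections linearly rather than using a per-gid lookup.

```cpp
#include <algorithm>
#include <vector>

struct Spike { double ts; int gid; };  // generation time and source gid
struct Conn { int src_gid; double delay; int target; };  // stand-in for a NetCon

// Returns the targets whose delivery time ts + delay falls in (t - dt, t],
// then drops spikes that are older than max_delay and can no longer fire.
std::vector<int> scan_spikes(std::vector<Spike>& spikes,
                             const std::vector<Conn>& conns,
                             double t, double dt, double max_delay) {
    std::vector<int> fired;
    for (const Spike& s : spikes) {
        for (const Conn& c : conns) {
            if (c.src_gid != s.gid) continue;
            const double td = s.ts + c.delay;  // recomputed on every scan
            if (td > t - dt && td <= t) fired.push_back(c.target);
        }
    }
    spikes.erase(std::remove_if(spikes.begin(), spikes.end(),
                                [&](const Spike& s) { return s.ts + max_delay <= t; }),
                 spikes.end());
    return fired;
}
```

Note that ts + delay is recomputed for every tracked spike at every step, which is the cost this scheme trades for not keeping an ordered queue.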
Actually, I have another question about the existing priority queue. It seems that all kinds of DiscreteEvents are inserted into the queue, such as NetCons, SelfEvents, ConditionEvents, NetParEvents, etc. Currently I only know that in the deliver_net_events related code, the netcons are inserted into the queue. But what about the other kinds of events, including SelfEvents, ConditionEvents and NetParEvents? Where are they produced and inserted into the priority queue?
the cpu copies the spike to a modfile specific buffer on the cpu. That buffer is copied every time step to the gpu. The gpu delivers the contents of the buffer to the NET_RECEIVE block instances every time step.
void NetCvode::deliver_net_events(NrnThread* nt) { // for default method
    ...
    deliver_events(tm, nt); // Though it also needs to deal with interthread events, the principal call here is,
                            // for each event, to call deliver_event, which calls (for our purposes) NetCon.deliver.
                            // That calls (*corenrn.get_pnt_receive()[typ]), which is in the nmodl translated cxx file
                            // with the function name (e.g. in build/x86_64/corenrn/mod2c/expsyn.cpp) _net_buf_receive,
                            // which appends the spike to the membrane_list specific _net_receive_buffer (on the cpu).
    ...
    /* before executing on gpu, we have to update the NetReceiveBuffer_t on GPU */
    update_net_receive_buffer(nt);
    for (auto& net_buf_receive : corenrn.get_net_buf_receive()) {
        (*net_buf_receive.first)(nt); // This calls the _net_buf_receive function in the proper cxx translated mod file where,
                                      // on the GPU, the events in the _ml->_net_receive_buffer are looped over and
                                      // net_receive_kernel is called for each of them.
    }
    ...
}
I think maybe we could record all the spikes in a table according to their generation time ts, and then we check the table each time we want to "pop" netcons.
That is very similar to the binqueue for NetCon events except td is in the binqueue so that spike.ts+netcon.delay is calculated only once.
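For readers who have not seen a bin queue, here is a simplified sketch of the general technique, assuming a fixed time step and a bounded maximum delay. It is not CoreNEURON's binqueue implementation; the class and method names are made up for illustration. The delivery time td is computed once at enqueue time and only selects a bin.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Circular array of bins, one bin per time step.
class BinQueue {
  public:
    BinQueue(double dt, std::size_t nbins) : dt_(dt), bins_(nbins) {}

    // Precondition (not checked): td lies within nbins time steps of the
    // current time, so that bins are not reused before they are popped.
    void enqueue(double td, int target) {
        bins_[bin_index(td)].push_back(target);
    }

    // Returns and clears the bin that belongs to the time step nearest to t.
    std::vector<int> pop_step(double t) {
        std::vector<int> due;
        due.swap(bins_[bin_index(t)]);
        return due;
    }

  private:
    std::size_t bin_index(double t) const {
        const std::size_t step = static_cast<std::size_t>(std::llround(t / dt_));
        return step % bins_.size();
    }

    double dt_;
    std::vector<std::vector<int>> bins_;
};
```

Enqueue and pop are both constant time per event, at the price of rounding delivery times to the time step grid.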
what about other kinds of events
They are never interprocessor or interthread events. They come from a mod file instance and get sent back to the same instance. See netcvode.cpp:: void net_send(...
@nrnhines
That is very similar to the binqueue for NetCon events except td is in the binqueue so that spike.ts+netcon.delay is calculated only once.
Wow, I haven't read the code about binqueue, and I think this query method would be good for GPU parallelization rather than CPU single thread. So if the query method is moved to GPU, the performance should increase greatly.
They are never interprocessor or interthread events. They come from a mod file instance and get sent back to the same instance. See netcvode.cpp:: void net_send(...
Do you mean that in the priority queue there are only netcons, and other events such as SelfEvents, ConditionEvents, and NetParEvents won't appear in the queue?
Besides, if I want to let the translated cxx file's _net_buf_receive directly modify the gpu version _net_receive_buffer to avoid the update_net_receive_buffer(nt) (which is strangely slow in my profiling...), what should I do? Currently, the _net_receive_buffer is pinned between CPU and GPU because of acc_copyin, and I guess if the size of the buffer components changes, during the acc_update_device the gpu buffer might be freed, reallocated and transferred again, which greatly hurts performance.
...binqueue... if the query method is moved to GPU, the performance should increase greatly.
I agree it could be on the GPU. Whether you can get better performance (at least for interprocessor spikes) is an experimental question. But I wouldn't be surprised if a single cpu thread model on the GPU could greatly benefit from a spike staying on the GPU from send to receive. That was beyond the scope of my GPU understanding, since spikes are generated and delivered randomly without regard to ordering related to the otherwise "structure of array" memory organization.
SelfEvents, ConditionEvents, and NetParEvents won't appear in the queue?
They don't appear in the binqueue. They go into the heap or splay tree queue for exact delivery time. These really are candidates for staying on the GPU (at least SelfEvents and ConditionEvents, which get sent and delivered to the same object). NetParEvents are just for synchronization at "minimum NetCon (interprocessor/interthread) delay integration intervals" to ensure every event has arrived at its thread destination in time for delivery. The NEURON version has a flag for "SelfEvent not on queue" that takes advantage of this, but it did not make it into CoreNEURON.
(which is strangely slow in my profiling...), what should I do? ... guess if the size of the buffer components changes
You would have to establish whether buffer size changes are a significant performance issue. Presently the ml->_net_receive_buffer is doubled in size on every reallocation and starts out with size equal to the number of instances of the type. The usual pattern is that the number of NetCon events destined for an instance within a time step is usually 0 and never > 1. Full spike synchrony would be just enough to fill it up. We should consult with @pramodk about your performance results.
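The growth policy described above can be sketched as follows. This is only an illustration with hypothetical names (ReceiveBuffer, append, reallocations); it stores a single index per event and ignores the device-side copy.

```cpp
#include <cstddef>
#include <vector>

struct ReceiveBuffer {
    std::vector<int> instance_index;  // one entry per buffered event
    std::size_t capacity;             // current allocated size, in events
    std::size_t reallocations = 0;    // how many times the buffer has grown

    // Start with room for one event per instance (at least one slot).
    explicit ReceiveBuffer(std::size_t n_instances)
        : capacity(n_instances ? n_instances : 1) {
        instance_index.reserve(capacity);
    }

    // Append an event; double the capacity only when the buffer is full.
    void append(int instance) {
        if (instance_index.size() == capacity) {
            capacity *= 2;
            instance_index.reserve(capacity);
            ++reallocations;
        }
        instance_index.push_back(instance);
    }

    // Called once per time step after delivery; the capacity is kept.
    void clear() { instance_index.clear(); }
};
```

With at most one event per instance per time step, the initial capacity is never exceeded and no reallocation happens.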
@nrnhines I've done some more detailed profiling, and it suggests that you are right about the GPU buffer allocations. The GPU buffers are not reallocated, so the performance is not influenced by it.
The more detailed profiling suggests that the (*corenrn.get_pnt_receive()[typ]) calls made by the delivered netcons and update_net_receive_buffer(nt) consume a large portion of the time as the simulation size increases.
I think that if we could collect all the netcons that are going to fire in a timestep, let them fire (i.e. call *corenrn.get_pnt_receive()[typ]) in parallel on the GPU, and eliminate the data transfer of update_net_receive_buffer(nt), then the performance should improve. But I have no idea how to do it, as the C code for the mod files is auto-generated. Do I have to modify the generated mod C code myself? Could you give me some advice on how to realize it?
I have another question on the deliver event functions. It appears that there are two functions, deliver_net_events and nrn_deliver_events, that are related to event delivery. But when I dig into the code, only the deliver_net_events function calls (input)presyn->send and inserts the netcons into the priority queue. So what does nrn_deliver_events do? Why is it separated from deliver_net_events?
@nrnhines : could you copy / attach one of your profile outputs and also coreneuron's stdout? I am curious because in other models I haven't seen the event related parts being a bottleneck.
@pramodk I ran a huge network with 100000 neurons on 8 top-tier A100 GPUs. Here is my output log:
$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu --multisend --ms-phases 1
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[20140,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: onebrain-dgx-a100-01
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 202.8984, Min 202.4102, Avg 202.5913
Memory (MBs) : After MPI_Init : Max 202.8984, Min 202.4102, Avg 202.5913
Memory (MBs) : Before nrn_setup : Max 204.4688, Min 204.0898, Avg 204.2632
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2475650] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2475650] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2475650] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
all2allv_int gidin to intermediate space=175014 total=1833.57 time=0.00667806
all2allv_int gidout space=25034 total=1836.1 time=0.000639373
all2allv_int lists space=225026 total=1836.9 time=0.00040563
Setup Done : 52.94 seconds
Memory (MBs) : After nrn_setup : Max 13454.4922, Min 13310.5234, Avg 13362.8823
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000
GPU
--nwarp=0
--cell-permute=0
INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false
SPIKE EXCHANGE
--ms_phases=1
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 13454.4922, Min 13310.5234, Avg 13362.8823
Memory (MBs) : After nrn_finitialize : Max 13454.8281, Min 13310.8945, Avg 13363.2485
psolve |========================================================| t: 1000.00 ETA: 0h14m08s
Solver Time : 848.615
Simulation Statistics
Number of cells: 100001
Number of compartments: 24948985
Number of presyns: 46885564
Number of input presyns: 699903
Number of synapses: 142227989
Number of point processes: 189819298
Number of transfer (gap) targets: 0
Number of spikes: 1705882
Number of spikes with non negative gid-s: 1705882
Path Min time/rank Max time/rank Avg time/rank Time %
main 910.918379 910.971648 910.943178 99.999924
checkpoint 0.000014 0.000017 0.000016 0.000002
output-spike 0.181329 0.181383 0.181364 0.019909
simulation 848.614906 848.614913 848.614910 93.157761
nrn_multisend_receive 48.211981 77.066952 65.366869 7.175730
nrnmpi_multisend_advance 9.009972 9.801592 9.344053 1.025755
timestep 771.164518 799.889700 782.842262 85.937486
state-update 21.478632 23.407606 22.235478 2.440927
state-tmGlut 0.600395 0.657703 0.621035 0.068175
state-tmGabaA 0.603162 0.670587 0.633518 0.069545
state-sk_ms 0.630907 0.675192 0.649857 0.071339
state-gGapPar 0.198618 0.209459 0.204231 0.022420
state-naf_ms 0.624028 0.681327 0.646556 0.070977
state-naf_fs 0.621885 0.693116 0.653311 0.071718
state-kir_ms 0.615571 0.683735 0.639699 0.070224
state-kir_fs 0.621746 0.702599 0.645035 0.070810
state-kdr_ms 0.605290 0.691195 0.639563 0.070209
state-kdr_fs 0.609506 0.685924 0.640477 0.070309
state-kas_ms 0.612295 0.683557 0.642188 0.070497
state-kas_fs 0.621509 0.713599 0.654592 0.071859
state-kaf_ms 0.617341 0.693093 0.644476 0.070748
state-kaf_fs 0.624762 0.703548 0.653654 0.071756
state-cat33_ms 0.622820 0.670690 0.647910 0.071125
state-cat32_ms 0.623263 0.677978 0.644940 0.070799
state-car_ms 0.620437 0.680921 0.649741 0.071326
state-caq_fs 0.624777 0.685835 0.651047 0.071470
state-can_ms 0.641906 0.691721 0.663833 0.072873
state-cal13_ms 0.630679 0.674601 0.650541 0.071414
state-cal12_ms 0.642529 0.690433 0.662547 0.072732
state-bk_ms 0.634718 0.687286 0.658555 0.072294
state-bk_fs 0.641710 0.709077 0.662374 0.072713
state-cadyn_fs 0.624075 0.685532 0.655934 0.072006
state-cadyn_ms 0.635891 0.710540 0.666689 0.073187
state-caldyn_ms 0.670242 0.715727 0.693189 0.076096
state-pas 0.230345 0.246662 0.236413 0.025953
update 0.761722 0.831632 0.800757 0.087904
second_order_cur 0.206082 0.237223 0.217908 0.023921
matrix-solver 260.285747 263.816372 262.734427 28.842000
setup_tree_matrix 27.783332 31.105136 29.111301 3.195729
cur-tmGlut 0.698753 0.778214 0.734702 0.080653
cur-tmGabaA 0.701016 0.771259 0.731793 0.080334
cur-sk_ms 0.640352 0.731186 0.676640 0.074279
cur-gGapPar 0.712011 0.801755 0.743627 0.081633
cur-naf_ms 0.642256 0.736338 0.681916 0.074858
cur-naf_fs 0.651287 0.712772 0.683288 0.075009
cur-kir_ms 0.647686 0.754648 0.679597 0.074604
cur-kir_fs 0.665318 0.731789 0.693071 0.076083
cur-kdr_ms 0.636183 0.731089 0.673716 0.073958
cur-kdr_fs 0.642821 0.718732 0.678598 0.074494
cur-kas_ms 0.669113 0.766549 0.703603 0.077239
cur-kas_fs 0.654087 0.745909 0.696810 0.076493
cur-kaf_ms 0.653428 0.722170 0.686797 0.075394
cur-kaf_fs 0.670647 0.759065 0.707676 0.077686
cur-cat33_ms 0.660693 0.763976 0.701601 0.077019
cur-cat32_ms 0.668792 0.768218 0.705791 0.077479
cur-car_ms 0.665881 0.754181 0.700573 0.076906
cur-caq_fs 0.670234 0.766199 0.715233 0.078516
cur-can_ms 0.672516 0.767212 0.710777 0.078026
cur-cal13_ms 0.665043 0.752573 0.706560 0.077564
cur-cal12_ms 0.671335 0.782261 0.713954 0.078375
cur-bk_ms 0.653320 0.735494 0.688944 0.075630
cur-bk_fs 0.658014 0.748753 0.699011 0.076735
cur-cadyn_fs 0.670311 0.765523 0.717275 0.078740
cur-cadyn_ms 0.653098 0.745702 0.694606 0.076251
cur-caldyn_ms 0.658266 0.750415 0.696119 0.076417
cur-cal_ion 0.556998 0.647780 0.587886 0.064536
cur-ca_ion 0.563171 0.632140 0.595445 0.065366
cur-k_ion 0.574322 0.640588 0.602244 0.066112
cur-na_ion 0.648642 0.746462 0.688948 0.075630
cur-pas 0.678330 0.787698 0.719481 0.078982
deliver_events 450.160572 473.903018 460.721194 50.576244
nrn_deliver_events 132.393860 135.454351 134.115292 14.722674
netbuf_receive_device 2.765391 2.927384 2.830566 0.310729
transfer_netbuf_host2device 50.994765 58.777755 55.119894 6.050855
update_net_receive_buffer 50.215810 58.042484 54.365246 5.968013
acc_update_device 24.804827 33.915664 29.601895 3.249585
net_receive_buffer_order_refactor 22.872733 24.352687 23.540262 2.584162
cvode_instance_deliver_events 72.225182 77.753242 74.663350 8.196262
deliver_net_events 313.929017 337.217062 325.070261 35.684993
netbuf_receive_device 2.750386 2.912784 2.820341 0.309607
transfer_netbuf_host2device 34.304229 36.817370 35.225450 3.866918
update_net_receive_buffer 33.573392 35.991656 34.462581 3.783173
acc_update_device 9.373999 9.934178 9.533084 1.046506
net_receive_buffer_order_refactor 23.000350 24.758430 23.701044 2.601812
deque_deliver_host 72.178038 77.727124 74.609715 8.190375
nrn_multisend_advance_host 167.421934 179.408870 173.218302 19.015255
nrnmpi_multisend_advance 0.931094 1.120856 1.029664 0.113033
get_watch_host 0.198831 0.225281 0.209353 0.022982
send_presyn_host 31.480995 35.580274 33.327281 3.658544
nrnmpi_multisend 3.652442 4.098563 3.879833 0.425914
transfer_spike_deveice2host 0.948560 1.023150 0.972604 0.106769
collect_spike_device 1.901445 2.134007 2.011170 0.220779
finitialize 6.514536 6.514597 6.514570 0.715145
nrn_multisend_receive 0.000184 0.437879 0.237697 0.026093
nrnmpi_multisend_advance 0.000047 0.000101 0.000067 0.000007
cur-tmGlut 0.000021 0.000028 0.000023 0.000003
cur-tmGabaA 0.000020 0.000035 0.000024 0.000003
cur-sk_ms 0.000018 0.000025 0.000021 0.000002
cur-gGapPar 0.000019 0.000035 0.000023 0.000002
cur-naf_ms 0.000019 0.000024 0.000021 0.000002
cur-naf_fs 0.000019 0.000025 0.000021 0.000002
cur-kir_ms 0.000019 0.000025 0.000021 0.000002
cur-kir_fs 0.000019 0.000026 0.000022 0.000002
cur-kdr_ms 0.000017 0.000025 0.000021 0.000002
cur-kdr_fs 0.000018 0.000025 0.000021 0.000002
cur-kas_ms 0.000019 0.000027 0.000022 0.000002
cur-kas_fs 0.000019 0.000026 0.000022 0.000002
cur-kaf_ms 0.000019 0.000024 0.000021 0.000002
cur-kaf_fs 0.000020 0.000026 0.000022 0.000002
cur-cat33_ms 0.000019 0.000024 0.000021 0.000002
cur-cat32_ms 0.000020 0.000025 0.000022 0.000002
cur-car_ms 0.000019 0.000025 0.000022 0.000002
cur-caq_fs 0.000020 0.000027 0.000023 0.000002
cur-can_ms 0.000019 0.000026 0.000022 0.000002
cur-cal13_ms 0.000019 0.000024 0.000021 0.000002
cur-cal12_ms 0.000020 0.000026 0.000022 0.000002
cur-bk_ms 0.000018 0.000024 0.000020 0.000002
cur-bk_fs 0.000020 0.000025 0.000021 0.000002
cur-cadyn_fs 0.000018 0.000029 0.000021 0.000002
cur-cadyn_ms 0.000017 0.000027 0.000020 0.000002
cur-caldyn_ms 0.000018 0.000030 0.000021 0.000002
cur-cal_ion 0.000013 0.000022 0.000015 0.000002
cur-ca_ion 0.000013 0.000019 0.000015 0.000002
cur-k_ion 0.000014 0.000018 0.000015 0.000002
cur-na_ion 0.000019 0.000024 0.000021 0.000002
cur-pas 0.000025 0.000036 0.000031 0.000003
nrn_deliver_events 0.006263 0.014525 0.013337 0.001464
netbuf_receive_device 0.005979 0.014208 0.013023 0.001430
transfer_netbuf_host2device 0.000091 0.000114 0.000099 0.000011
update_net_receive_buffer 0.000033 0.000044 0.000038 0.000004
cvode_instance_deliver_events 0.000036 0.000047 0.000041 0.000005
load-model 55.412495 55.606147 55.501959 6.092797
The profiler is Caliper; note that I've added many detailed profiling regions inside deliver_events. In the profile above, deliver_events consumes about 50% of the time. "transfer_netbuf_host2device" covers the data transfer in update_net_receive_buffer(nt). "deque_deliver_host" and "cvode_instance_deliver_events" cover the deque and NetCon send (i.e. the *corenrn.get_pnt_receive()[typ] call) procedures.
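To compare runs like the ones in this thread without reading the whole table by eye, the flat "Path / Min / Max / Avg / Time %" rows can be parsed and ranked. The following is only a minimal sketch: the helper names `parse_profile` and `top_regions` are hypothetical (they are not part of Caliper or CoreNEURON), and the sample rows are copied from the profile above. Because the pasted table has lost its indentation, the same region name can appear more than once, so rows are kept as a list rather than a dictionary. Nested regions appear to report inclusive time, so the percentages overlap and should not be summed.

```python
def parse_profile(text):
    """Parse 'Path Min Max Avg Time%' rows into (path, min, max, avg, pct) tuples."""
    rows = []
    for line in text.splitlines():
        # Split from the right: the last four fields are always the numbers.
        parts = line.rsplit(None, 4)
        if len(parts) != 5:
            continue
        try:
            numbers = [float(p) for p in parts[1:]]
        except ValueError:
            continue  # header line or any non-table line
        rows.append((parts[0], *numbers))
    return rows


def top_regions(rows, n=3, skip=("main", "simulation", "timestep")):
    """Return the n regions with the largest Time %, ignoring the enclosing regions in `skip`."""
    kept = [r for r in rows if r[0] not in skip]
    return sorted(kept, key=lambda r: r[4], reverse=True)[:n]


# Rows copied from the profile above (a subset, for illustration only).
SAMPLE = """\
Path Min time/rank Max time/rank Avg time/rank Time %
main 910.918379 910.971648 910.943178 99.999924
simulation 848.614906 848.614913 848.614910 93.157761
timestep 771.164518 799.889700 782.842262 85.937486
matrix-solver 260.285747 263.816372 262.734427 28.842000
deliver_events 450.160572 473.903018 460.721194 50.576244
deliver_net_events 313.929017 337.217062 325.070261 35.684993
"""

for path, _min, _max, avg, pct in top_regions(parse_profile(SAMPLE)):
    print(f"{path:24s} avg={avg:10.2f}s  {pct:6.2f}%")
```

Running the same helper on the log of a second run (for example with and without --multisend) and comparing the top entries side by side is one way to see which regions moved.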
@HolyLow : if this is executing on GPUs, have you set export PGI_ACC_SYNCHRONOUS=1 for profiling? If not, could you try and run again (with and without multisend)?
The reason I am mentioning PGI_ACC_SYNCHRONOUS is that kernels are launched asynchronously on the GPU, and I want to verify that the elapsed times are correct.
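The concern about asynchronous launches can be illustrated with a small host-only analogy. This is a sketch under stated assumptions, not OpenACC code: a single-worker thread pool stands in for the GPU, `fake_kernel` and `timed` are hypothetical names, and the 0.2 s sleep stands in for a kernel's run time. It shows why a timer wrapped around an asynchronous launch records almost nothing, while the cost is attributed to whichever later region waits for the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor

WORK_SECONDS = 0.2  # stand-in for one GPU kernel's run time (assumption)


def fake_kernel():
    """Pretend device work; a real kernel would run on the GPU."""
    time.sleep(WORK_SECONDS)


def timed(fn):
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=1) as device:
    # Asynchronous launch: the host returns immediately, so the profiling
    # region wrapped around the launch records almost no time.
    future, launch_time = timed(lambda: device.submit(fake_kernel))
    # The cost shows up in whichever later region happens to wait.
    _, wait_time = timed(future.result)

    # Synchronous launch: launch and wait fall inside the same region,
    # so the time is attributed to the region that started the work.
    _, sync_time = timed(lambda: device.submit(fake_kernel).result())

print(f"async launch region:  {launch_time:.3f}s")
print(f"later waiting region: {wait_time:.3f}s")
print(f"synchronous region:   {sync_time:.3f}s")
```

With synchronous execution forced, per-region times in a profile should reflect where the work was issued rather than where the host happened to block.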
@pramodk I added export PGI_ACC_SYNCHRONOUS=1 and reran the program; the log is below:
$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu --multisend --ms-phases 1
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[3083,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: onebrain-dgx-a100-01
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 202.8594, Min 202.4688, Avg 202.6294
Memory (MBs) : After MPI_Init : Max 202.9570, Min 202.4688, Avg 202.6680
Memory (MBs) : Before nrn_setup : Max 204.6328, Min 204.1875, Avg 204.3760
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2655272] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2655272] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2655272] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
all2allv_int gidin to intermediate space=175014 total=1833.37 time=0.00298926
all2allv_int gidout space=25034 total=1836.03 time=0.00412042
all2allv_int lists space=225026 total=1836.77 time=0.000453119
Setup Done : 57.16 seconds
Memory (MBs) : After nrn_setup : Max 13454.6875, Min 13310.2070, Avg 13362.8667
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000
GPU
--nwarp=0
--cell-permute=0
INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false
SPIKE EXCHANGE
--ms_phases=1
--ms_subintervals=2
--multisend=true
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 13454.6875, Min 13310.2070, Avg 13362.8667
Memory (MBs) : After nrn_finitialize : Max 13455.0352, Min 13310.6016, Avg 13363.2354
psolve |========================================================| t: 1000.00 ETA: 0h17m08s
Solver Time : 1027.55
Simulation Statistics
Number of cells: 100001
Number of compartments: 24948985
Number of presyns: 46885564
Number of input presyns: 699903
Number of synapses: 142227989
Number of point processes: 189819298
Number of transfer (gap) targets: 0
Number of spikes: 1706101
Number of spikes with non negative gid-s: 1706101
Path Min time/rank Max time/rank Avg time/rank Time %
main 1094.953983 1095.032779 1095.000754 99.999943
checkpoint 0.000014 0.000017 0.000016 0.000001
output-spike 0.189638 0.189893 0.189854 0.017338
simulation 1027.548506 1027.548533 1027.548517 93.839929
nrn_multisend_receive 43.115893 84.618265 71.113138 6.494342
nrnmpi_multisend_advance 8.981463 9.662669 9.190748 0.839337
timestep 942.551377 984.021549 956.042807 87.309735
state-update 163.277901 169.919139 167.520333 15.298641
state-tmGlut 17.891402 20.962417 19.721948 1.801089
state-tmGabaA 18.806481 20.901478 19.983781 1.825000
state-sk_ms 5.966395 6.327325 6.155395 0.562136
state-gGapPar 0.199146 0.215570 0.205517 0.018769
state-naf_ms 7.553335 8.192543 7.905761 0.721986
state-naf_fs 1.203253 1.267370 1.229883 0.112318
state-kir_ms 5.832329 6.282179 6.091267 0.556279
state-kir_fs 1.507556 1.561963 1.530904 0.139808
state-kdr_ms 5.204399 5.630903 5.339970 0.487668
state-kdr_fs 1.136648 1.190377 1.158424 0.105792
state-kas_ms 7.536179 8.299321 8.022001 0.732602
state-kas_fs 1.222709 1.280508 1.251373 0.114281
state-kaf_ms 8.352539 8.974221 8.735237 0.797738
state-kaf_fs 1.187475 1.266548 1.223957 0.111777
state-cat33_ms 8.821202 9.457032 9.085241 0.829701
state-cat32_ms 8.739273 9.472221 9.003772 0.822261
state-car_ms 8.707610 9.329355 8.915207 0.814173
state-caq_fs 1.121425 1.217344 1.151527 0.105162
state-can_ms 1.254100 1.315166 1.280084 0.116902
state-cal13_ms 8.868501 9.517126 9.264925 0.846111
state-cal12_ms 8.727495 9.499393 9.158444 0.836386
state-bk_ms 6.540231 6.934413 6.782218 0.619380
state-bk_fs 1.211832 1.284328 1.240724 0.113308
state-cadyn_fs 1.191211 1.283060 1.225622 0.111929
state-cadyn_ms 7.900847 8.232364 8.043134 0.734532
state-caldyn_ms 7.898526 8.171343 7.996948 0.730314
state-pas 0.230333 0.237973 0.234125 0.021381
update 5.616285 5.869693 5.696664 0.520243
second_order_cur 0.202088 0.229119 0.213957 0.019539
matrix-solver 46.581775 48.050873 46.960193 4.288597
setup_tree_matrix 282.201459 291.735860 284.711288 26.000998
cur-tmGlut 47.820663 50.219490 49.467232 4.517550
cur-tmGabaA 26.867740 27.689633 27.253236 2.488877
cur-sk_ms 11.517667 11.878265 11.617802 1.060985
cur-gGapPar 1.500699 1.602815 1.552103 0.141744
cur-naf_ms 12.219218 13.171423 12.764611 1.165716
cur-naf_fs 1.223981 1.327254 1.265339 0.115556
cur-kir_ms 12.067042 12.362962 12.158961 1.110406
cur-kir_fs 1.941612 2.076152 1.995893 0.182273
cur-kdr_ms 10.773797 11.167160 10.854743 0.991299
cur-kdr_fs 1.216900 1.307927 1.250850 0.114233
cur-kas_ms 12.918723 13.412306 13.028015 1.189772
cur-kas_fs 1.236897 1.341445 1.274981 0.116436
cur-kaf_ms 12.817183 13.118215 12.940085 1.181741
cur-kaf_fs 1.244348 1.331345 1.281097 0.116995
cur-cat33_ms 12.337161 12.736887 12.490677 1.140700
cur-cat32_ms 12.362707 12.890519 12.497404 1.141314
cur-car_ms 13.216458 14.410276 13.887857 1.268296
cur-caq_fs 1.301755 1.397693 1.331208 0.121571
cur-can_ms 1.367863 1.483208 1.405410 0.128348
cur-cal13_ms 13.998148 14.284209 14.088838 1.286650
cur-cal12_ms 14.028517 14.450303 14.146985 1.291960
cur-bk_ms 11.640999 12.098735 11.755976 1.073604
cur-bk_fs 1.228959 1.315720 1.266282 0.115642
cur-cadyn_fs 1.058706 1.205225 1.100279 0.100482
cur-cadyn_ms 1.539039 1.666634 1.570395 0.143415
cur-caldyn_ms 1.555177 1.648164 1.586266 0.144864
cur-cal_ion 3.846431 4.095737 3.908022 0.356896
cur-ca_ion 3.827879 4.015865 3.889074 0.355166
cur-k_ion 1.965004 2.066879 1.993873 0.182089
cur-na_ion 1.900328 2.052253 1.946381 0.177752
cur-pas 6.512316 6.742299 6.570815 0.600074
deliver_events 437.413407 464.659278 447.529065 40.870183
nrn_deliver_events 113.639760 120.781768 116.346337 10.625223
netbuf_receive_device 3.304050 3.514258 3.367622 0.307545
transfer_netbuf_host2device 34.147801 36.281306 34.893598 3.186626
update_net_receive_buffer 33.417942 35.513446 34.147335 3.118474
acc_update_device 9.170103 9.595447 9.305124 0.849782
net_receive_buffer_order_refactor 23.036433 24.625675 23.633974 2.158351
cvode_instance_deliver_events 74.710494 79.397623 76.592057 6.994700
deliver_net_events 322.175748 342.262361 329.654370 30.105384
netbuf_receive_device 3.313029 3.599303 3.401578 0.310646
transfer_netbuf_host2device 34.341575 36.602362 35.124844 3.207744
update_net_receive_buffer 33.607144 35.808834 34.370025 3.138811
acc_update_device 9.299829 9.847114 9.466439 0.864514
net_receive_buffer_order_refactor 23.087757 24.635328 23.676065 2.162195
deque_deliver_host 74.477037 79.028204 76.292833 6.967373
nrn_multisend_advance_host 170.145610 180.988931 174.454839 15.931929
nrnmpi_multisend_advance 0.921635 1.179529 1.016580 0.092838
get_watch_host 0.198887 0.239930 0.206792 0.018885
send_presyn_host 32.147317 35.482287 33.679884 3.075785
nrnmpi_multisend 3.508115 3.931272 3.735521 0.341143
transfer_spike_deveice2host 1.053940 1.135249 1.074997 0.098173
collect_spike_device 2.621060 2.969821 2.750464 0.251184
finitialize 6.892855 6.892969 6.892919 0.629490
nrn_multisend_receive 0.000319 0.593271 0.447595 0.040876
nrnmpi_multisend_advance 0.000046 0.000139 0.000071 0.000007
cur-tmGlut 0.001347 0.001420 0.001402 0.000128
cur-tmGabaA 0.000745 0.000762 0.000753 0.000069
cur-sk_ms 0.000295 0.000306 0.000299 0.000027
cur-gGapPar 0.000041 0.000047 0.000042 0.000004
cur-naf_ms 0.000311 0.000336 0.000327 0.000030
cur-naf_fs 0.000034 0.000040 0.000036 0.000003
cur-kir_ms 0.000310 0.000319 0.000314 0.000029
cur-kir_fs 0.000052 0.000059 0.000054 0.000005
cur-kdr_ms 0.000278 0.000285 0.000280 0.000026
cur-kdr_fs 0.000033 0.000042 0.000036 0.000003
cur-kas_ms 0.000333 0.000345 0.000337 0.000031
cur-kas_fs 0.000035 0.000044 0.000037 0.000003
cur-kaf_ms 0.000331 0.000338 0.000334 0.000031
cur-kaf_fs 0.000034 0.000041 0.000037 0.000003
cur-cat33_ms 0.000318 0.000331 0.000325 0.000030
cur-cat32_ms 0.000319 0.000329 0.000325 0.000030
cur-car_ms 0.000342 0.000366 0.000361 0.000033
cur-caq_fs 0.000036 0.000041 0.000038 0.000003
cur-can_ms 0.000038 0.000044 0.000040 0.000004
cur-cal13_ms 0.000364 0.000373 0.000368 0.000034
cur-cal12_ms 0.000366 0.000373 0.000369 0.000034
cur-bk_ms 0.000298 0.000303 0.000300 0.000027
cur-bk_fs 0.000034 0.000040 0.000035 0.000003
cur-cadyn_fs 0.000028 0.000034 0.000030 0.000003
cur-cadyn_ms 0.000042 0.000048 0.000045 0.000004
cur-caldyn_ms 0.000043 0.000049 0.000045 0.000004
cur-cal_ion 0.000100 0.000119 0.000104 0.000009
cur-ca_ion 0.000098 0.000101 0.000099 0.000009
cur-k_ion 0.000049 0.000054 0.000051 0.000005
cur-na_ion 0.000051 0.000057 0.000053 0.000005
cur-pas 0.000174 0.000183 0.000179 0.000016
nrn_deliver_events 0.000565 0.000636 0.000597 0.000055
netbuf_receive_device 0.000283 0.000316 0.000298 0.000027
transfer_netbuf_host2device 0.000090 0.000102 0.000094 0.000009
update_net_receive_buffer 0.000034 0.000041 0.000036 0.000003
cvode_instance_deliver_events 0.000037 0.000043 0.000040 0.000004
load-model 60.102146 60.321097 60.198128 5.497539
In the profile above, deliver_events consumes about 40% of the time (40.87%).
Thanks! Could you also attach the profile without multisend, please?
@pramodk The log without multisend is as below:
$ mpiexec -np 8 ./profile_gpu_install/bin/special-core --tstop 1000 --datpath ./networks/100000Sim/RoundRobin-core-8 --mpi --gpu
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
--------------------------------------------------------------------------
[[21864,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: onebrain-dgx-a100-01
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel CMA support was requested via the
btl_vader_single_copy_mechanism MCA variable, but CMA support is
not available due to restrictive ptrace settings.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: onebrain-dgx-a100-01
--------------------------------------------------------------------------
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
libibverbs: Warning: couldn't open config directory '/usr/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs3
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs5
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs7
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs9
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs4
num_mpi=8
num_omp_thread=1
Info : 8 GPUs shared by 8 ranks per node
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
Version : 0.21.0 43fe5fa (2021-01-21 20:12:20 +0800)
Additional mechanisms from files
bk_fs.mod bk_ms.mod cadyn_fs.mod cadyn_ms.mod cal12_ms.mod cal13_ms.mod caldyn_ms.mod can_fs.mod can_ms.mod caq_fs.mod caq_ms.mod car_fs.mod car_ms.mod cat32_ms.mod cat33_ms.mod exp2syn.mod expsyn.mod h_lts.mod hh.mod im_lts.mod it_lts.mod kaf_fs.mod kaf_ms.mod kas_fs.mod kas_ms.mod kdr_fs.mod kdr_ms.mod kdrbca1_lts.mod kir_fs.mod kir_ms.mod na3n_lts.mod naf_fs.mod naf_lts.mod naf_ms.mod netstim.mod par_ggap.mod passive.mod pattern.mod sk_fs.mod sk_ms.mod stim.mod svclmp.mod tmampa.mod tmgabaa.mod tmglut.mod tmnmda.mod vecevent.mod
Memory (MBs) : After mk_mech : Max 202.8984, Min 202.6172, Avg 202.7500
Memory (MBs) : After MPI_Init : Max 202.9766, Min 202.6602, Avg 202.8291
Memory (MBs) : Before nrn_setup : Max 204.6641, Min 204.2695, Avg 204.4805
WARNING : GPU execution requires --cell-permute type 1 or 2. Setting it to 1.
[onebrain-dgx-a100-01:2743626] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[onebrain-dgx-a100-01:2743626] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[onebrain-dgx-a100-01:2743626] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
Setup Done : 57.05 seconds
Memory (MBs) : After nrn_setup : Max 13452.1523, Min 13308.1914, Avg 13360.6455
GENERAL PARAMETERS
--mpi=true
--gpu=true
--dt=0.025
--tstop=1000
GPU
--nwarp=0
--cell-permute=0
INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=./networks/100000Sim/RoundRobin-core-8
--filesdat=files.dat
--pattern=
--report-conf=
--restore=
PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false
SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false
CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=35
--mindelay=1.00875
--report-buffer-size=4
OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 13452.1523, Min 13308.1914, Avg 13360.6455
Memory (MBs) : After nrn_finitialize : Max 13452.5234, Min 13308.5508, Avg 13361.0044
psolve |========================================================| t: 1000.00 ETA: 0h18m40s
Solver Time : 1119.6
Simulation Statistics
Number of cells: 100001
Number of compartments: 24948985
Number of presyns: 46885564
Number of input presyns: 699903
Number of synapses: 142227989
Number of point processes: 189819298
Number of transfer (gap) targets: 0
Number of spikes: 1706357
Number of spikes with non negative gid-s: 1706357
Path Min time/rank Max time/rank Avg time/rank Time %
main 1187.057142 1187.094615 1187.074318 99.999948
checkpoint 0.000017 0.000022 0.000020 0.000002
output-spike 0.301216 0.301703 0.301620 0.025409
simulation 1119.597563 1119.597584 1119.597576 94.315661
nrn_spike_exchange_send 218.428292 237.298587 225.377526 18.985956
spike-exchange 59.924715 112.155343 93.236865 7.854337
communication 0.556400 0.707461 0.645768 0.054400
imbalance 59.274358 111.467468 92.570782 7.798226
timestep 788.600757 821.932399 800.550250 67.438898
state-update 164.618212 175.163768 170.006737 14.321483
state-tmGlut 17.880351 21.164150 19.842612 1.671555
state-tmGabaA 18.690228 21.053429 20.069056 1.690631
state-sk_ms 6.000227 6.400353 6.222362 0.524176
state-gGapPar 0.211716 0.231105 0.219498 0.018491
state-naf_ms 7.638330 8.361350 7.988473 0.672954
state-naf_fs 1.256643 1.364567 1.293139 0.108935
state-kir_ms 5.833877 6.311499 6.148401 0.517945
state-kir_fs 1.566789 1.683058 1.618802 0.136369
state-kdr_ms 5.287438 5.585141 5.397896 0.454722
state-kdr_fs 1.181379 1.311535 1.230932 0.103695
state-kas_ms 7.616615 8.452772 8.098884 0.682256
state-kas_fs 1.282462 1.412272 1.322515 0.111410
state-kaf_ms 8.431140 9.091935 8.805247 0.741760
state-kaf_fs 1.244079 1.366700 1.289862 0.108659
state-cat33_ms 8.925963 9.501701 9.145406 0.770415
state-cat32_ms 8.852230 9.546560 9.114142 0.767782
state-car_ms 8.744904 9.364068 9.000629 0.758219
state-caq_fs 1.167949 1.264045 1.204005 0.101426
state-can_ms 1.313321 1.425241 1.342329 0.113079
state-cal13_ms 8.926063 9.651054 9.359504 0.788451
state-cal12_ms 8.776668 9.629176 9.255009 0.779648
state-bk_ms 6.559225 7.040436 6.863340 0.578172
state-bk_fs 1.253371 1.371389 1.297992 0.109344
state-cadyn_fs 1.244077 1.360497 1.289854 0.108658
state-cadyn_ms 7.993786 8.283918 8.156555 0.687114
state-caldyn_ms 7.974183 8.223518 8.089551 0.681469
state-pas 0.242036 0.264529 0.250038 0.021063
update 5.672399 5.888646 5.769049 0.485989
second_order_cur 0.215864 0.239019 0.224698 0.018929
matrix-solver 46.665604 48.344990 47.195290 3.975763
setup_tree_matrix 284.373747 292.540055 288.379657 24.293298
cur-tmGlut 47.796660 50.439045 49.717282 4.188218
cur-tmGabaA 27.026846 27.746684 27.412058 2.309210
cur-sk_ms 11.592638 11.858541 11.693581 0.985075
cur-gGapPar 1.595394 1.747551 1.645170 0.138590
cur-naf_ms 12.283355 13.308238 12.901631 1.086842
cur-naf_fs 1.289727 1.408634 1.323691 0.111509
cur-kir_ms 12.141881 12.471281 12.275892 1.034129
cur-kir_fs 2.031832 2.172042 2.070867 0.174451
cur-kdr_ms 10.819164 11.101391 10.921307 0.920018
cur-kdr_fs 1.269751 1.400391 1.318768 0.111094
cur-kas_ms 12.963466 13.412060 13.127236 1.105847
cur-kas_fs 1.303103 1.438943 1.344530 0.113264
cur-kaf_ms 12.907275 13.207011 13.036136 1.098173
cur-kaf_fs 1.300288 1.434272 1.351428 0.113845
cur-cat33_ms 12.352017 12.797276 12.598672 1.061321
cur-cat32_ms 12.374102 12.901329 12.588384 1.060454
cur-car_ms 13.336423 14.319290 13.998113 1.179211
cur-caq_fs 1.357181 1.463727 1.390305 0.117120
cur-can_ms 1.421563 1.556975 1.471210 0.123936
cur-cal13_ms 14.101366 14.347143 14.205350 1.196668
cur-cal12_ms 14.098088 14.430896 14.266691 1.201836
cur-bk_ms 11.719685 12.056659 11.847698 0.998058
cur-bk_fs 1.305406 1.419977 1.358138 0.114410
cur-cadyn_fs 1.113127 1.209888 1.149567 0.096840
cur-cadyn_ms 1.584393 1.720349 1.639903 0.138147
cur-caldyn_ms 1.608654 1.721550 1.659429 0.139791
cur-cal_ion 3.873672 4.067739 3.970980 0.334518
cur-ca_ion 3.870369 4.084463 3.968295 0.334292
cur-k_ion 2.011889 2.141438 2.067910 0.174202
cur-na_ion 1.967243 2.098818 2.027355 0.170786
cur-pas 6.587077 6.816442 6.676041 0.562394
deliver_events 278.642203 297.546088 285.308842 24.034611
nrn_deliver_events 121.953930 130.628961 124.658322 10.501302
netbuf_receive_device 3.493318 3.726540 3.576826 0.301314
transfer_netbuf_host2device 37.214977 39.304264 37.771478 3.181895
update_net_receive_buffer 36.434771 38.457027 36.964274 3.113896
acc_update_device 9.505350 10.022741 9.699914 0.817127
net_receive_buffer_order_refactor 25.575383 27.018933 25.938048 2.185039
cvode_instance_deliver_events 79.686780 85.826019 81.679789 6.880761
deliver_net_events 155.102212 165.233591 159.007060 13.394863
netbuf_receive_device 3.403401 3.615408 3.513958 0.296018
transfer_netbuf_host2device 36.874451 39.099152 37.558387 3.163944
update_net_receive_buffer 36.089584 38.245614 36.745719 3.095484
acc_update_device 9.482481 10.057157 9.717243 0.818587
net_receive_buffer_order_refactor 25.269030 26.754816 25.696415 2.164683
deque_deliver_host 78.568418 84.096959 80.689731 6.797358
nrn_multisend_advance_host 0.424282 0.446943 0.431616 0.036360
get_watch_host 0.212828 0.230211 0.218959 0.018445
send_presyn_host 28.367523 30.502133 29.631212 2.496153
transfer_spike_deveice2host 1.087129 1.202950 1.137972 0.095864
collect_spike_device 2.784141 3.025850 2.944873 0.248078
finitialize 6.687069 6.687169 6.687120 0.563328
spike-exchange 0.000130 0.406356 0.203937 0.017180
communication 0.000043 0.000068 0.000054 0.000005
imbalance 0.000058 0.406271 0.203865 0.017174
cur-tmGlut 0.001343 0.001421 0.001399 0.000118
cur-tmGabaA 0.000745 0.000759 0.000752 0.000063
cur-sk_ms 0.000296 0.000299 0.000297 0.000025
cur-gGapPar 0.000041 0.000044 0.000042 0.000004
cur-naf_ms 0.000308 0.000331 0.000326 0.000027
cur-naf_fs 0.000034 0.000036 0.000035 0.000003
cur-kir_ms 0.000310 0.000318 0.000313 0.000026
cur-kir_fs 0.000051 0.000055 0.000053 0.000004
cur-kdr_ms 0.000278 0.000282 0.000279 0.000024
cur-kdr_fs 0.000032 0.000037 0.000035 0.000003
cur-kas_ms 0.000331 0.000340 0.000335 0.000028
cur-kas_fs 0.000034 0.000036 0.000035 0.000003
cur-kaf_ms 0.000329 0.000337 0.000333 0.000028
cur-kaf_fs 0.000034 0.000036 0.000035 0.000003
cur-cat33_ms 0.000319 0.000330 0.000324 0.000027
cur-cat32_ms 0.000319 0.000329 0.000324 0.000027
cur-car_ms 0.000341 0.000368 0.000361 0.000030
cur-caq_fs 0.000036 0.000039 0.000037 0.000003
cur-can_ms 0.000037 0.000039 0.000038 0.000003
cur-cal13_ms 0.000364 0.000370 0.000366 0.000031
cur-cal12_ms 0.000365 0.000373 0.000368 0.000031
cur-bk_ms 0.000298 0.000301 0.000299 0.000025
cur-bk_fs 0.000034 0.000036 0.000035 0.000003
cur-cadyn_fs 0.000027 0.000031 0.000029 0.000002
cur-cadyn_ms 0.000042 0.000044 0.000044 0.000004
cur-caldyn_ms 0.000043 0.000047 0.000045 0.000004
cur-cal_ion 0.000098 0.000103 0.000101 0.000008
cur-ca_ion 0.000097 0.000102 0.000098 0.000008
cur-k_ion 0.000049 0.000051 0.000050 0.000004
cur-na_ion 0.000051 0.000054 0.000052 0.000004
cur-pas 0.000175 0.000179 0.000176 0.000015
nrn_deliver_events 0.000562 0.000625 0.000597 0.000050
netbuf_receive_device 0.000283 0.000323 0.000305 0.000026
transfer_netbuf_host2device 0.000090 0.000093 0.000092 0.000008
update_net_receive_buffer 0.000033 0.000036 0.000035 0.000003
cvode_instance_deliver_events 0.000036 0.000043 0.000039 0.000003
load-model 58.910579 60.469316 59.792065 5.036924
Note that "transfer_netbuf_host2device" refers to the data transfer in update_net_receive_buffer(nt). "deque_deliver_host", "cvode_instance_deliver_events" and "nrn_spike_exchange_send" refer to the dequeue and netcon send procedure (i.e. the call to *corenrn.get_pnt_receive()[typ]). Together, nrn_spike_exchange_send and deliver_events consume about 43% of the time.
@pramodk @nrnhines Could you please give me some hints on the following questions?
- If I want the netcons that are due to fire in a timestep to fire in parallel on the GPU (i.e. call *corenrn.get_pnt_receive()[typ] there), and eliminate the data transfer in update_net_receive_buffer(nt), what should I do? The C code for the mod files is auto-generated; do I have to modify the generated code myself? Could you give me some advice on how to achieve this?
- There appear to be two functions related to event delivery, deliver_net_events and nrn_deliver_events. But when I dig into the code, only deliver_net_events calls (input)presyn->send and inserts the netcons into the priority queue. So what does nrn_deliver_events do? Why is it separated from deliver_net_events?
- In update_net_receive_buffer, what does net_receive_buffer_order_refactor do? The comment says "instance order to avoid race"; what does that mean?
I plan to work on this issue tomorrow. I will take the dataset you provided, run it on our machine and respond to the questions above. Sorry for the delay.
- Note that deliver_net_events(nth) is called on entry to nrn_fixed_step_thread(NrnThread* nth) to check thresholds and deliver all (including binqueue) events up to tentry + dt/2, whereas nrn_deliver_events(nth) is called on exit to deliver all except binqueue events up to but not past texit (tentry + dt). The issue being resolved is the case of events generated in a time step that need to be delivered during that time step. These are generally SelfEvents from NET_RECEIVE net_send(...) calls (often 0 delay) and not NetCon events which can only be 0 delay if src and target are in same thread.
- It is going to take some study on my part to comment that code adequately. My working hypothesis (because of the priority queue in net_receive_buffer_order) is that without it there were assertion errors in the NET_RECEIVE block due to events arriving with a delivery time earlier than the previous event. As @pramodk was also involved with this code, he may also be able to enter the discussion.
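To make the two delivery points concrete, here is a minimal toy sketch of one fixed step. It is not CoreNEURON code: Event, Queue, deliver_up_to and fixed_step are hypothetical names, the binqueue is ignored, and the "integration" is a stub that simply schedules one self event inside the step. It only illustrates why a second delivery pass on exit is needed for events generated during the step.

```cpp
#include <queue>
#include <string>
#include <vector>

// Toy event: a delivery time plus a label (hypothetical, not a CoreNEURON type).
struct Event {
    double t;
    std::string name;
};

// Comparator that makes the priority queue return the earliest event first.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.t > b.t; }
};

using Queue = std::priority_queue<Event, std::vector<Event>, Later>;

// Pop and record every queued event whose delivery time is <= t_limit.
std::vector<std::string> deliver_up_to(Queue& q, double t_limit) {
    std::vector<std::string> delivered;
    while (!q.empty() && q.top().t <= t_limit) {
        delivered.push_back(q.top().name);
        q.pop();
    }
    return delivered;
}

struct StepLog {
    std::vector<std::string> on_entry;  // delivered up to tentry + dt/2
    std::vector<std::string> on_exit;   // delivered up to texit = tentry + dt
};

// One toy fixed step following the entry/exit pattern described above.
StepLog fixed_step(Queue& q, double t_entry, double dt) {
    StepLog log;
    log.on_entry = deliver_up_to(q, t_entry + dt / 2);
    // Stub for the work of the step: assume a NET_RECEIVE-like handler
    // schedules a self event that falls inside this step.
    q.push(Event{t_entry + 0.75 * dt, "self_event"});
    log.on_exit = deliver_up_to(q, t_entry + dt);
    return log;
}
```

With a queue holding one event at t = 0.01 and one at t = 0.03, calling fixed_step(q, 0.0, 0.025) delivers the first event on entry and the self event on exit, and leaves the t = 0.03 event queued for a later step.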
@nrnhines
These are generally SelfEvents from NET_RECEIVE net_send(...) calls (often 0 delay) and not NetCon events which can only be 0 delay if src and target are in same thread.
So could I safely assume that no netcons will be delivered in nrn_deliver_events(nth), and that all the netcons are delivered in deliver_net_events(nth)?
Besides, could you give me some hints on question 1? Could I achieve that without rewriting too much code?
@HolyLow : I forgot to update this ticket last week:
@nrnhines and I went through this issue last week and looked at the profile numbers you posted.
If I am not mistaken, you have done some additional instrumentation of functions. I can guess what instrumentation you added, but just to be sure, could you also paste the git diff (or a link to the code / fork if you have it on GitHub)?
We looked into update_net_receive_buffer and net_receive_buffer_order. There are some performance fixes we would like to experiment with. Instead of running the full model (which is time consuming), Michael proposed creating a standalone test that reproduces this performance issue, so that fixes can be tested easily.
We will try this next week and will update this ticket.
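As a starting point for such a standalone test, the sketch below shows one possible reading of the ordering step, assuming that "instance order" means grouping buffered events by target instance and sorting each group by delivery time, so that one worker can process each instance's events serially. BufferedEvent, OrderedBuffer and order_by_instance are hypothetical names; the actual net_receive_buffer_order code may work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Toy buffered event: target instance index and delivery time (hypothetical type).
struct BufferedEvent {
    int instance;
    double t;
};

struct OrderedBuffer {
    std::vector<BufferedEvent> events;  // sorted by (instance, t)
    std::vector<std::size_t> displ;     // group k occupies events[displ[k] .. displ[k+1])
};

// Group events by instance, keeping each group in delivery-time order.
// A parallel loop over groups would then touch each instance from one worker only.
OrderedBuffer order_by_instance(std::vector<BufferedEvent> events) {
    std::stable_sort(events.begin(), events.end(),
                     [](const BufferedEvent& a, const BufferedEvent& b) {
                         if (a.instance != b.instance) {
                             return a.instance < b.instance;
                         }
                         return a.t < b.t;
                     });
    OrderedBuffer out;
    for (std::size_t i = 0; i < events.size(); ++i) {
        // A new group starts wherever the instance index changes.
        if (i == 0 || events[i].instance != events[i - 1].instance) {
            out.displ.push_back(i);
        }
    }
    out.displ.push_back(events.size());  // end marker for the last group
    out.events = std::move(events);
    return out;
}
```

Timing this function on synthetic buffers of realistic size could help check whether the sort itself accounts for a meaningful share of the net_receive_buffer_order_refactor time in the profiles above.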