nest.Connect and MPI
Describe the bug
When I connect neurons only within a single MPI rank using nest.Connect, the next call to nest.Simulate blocks.
To Reproduce
Run mpirun -n 2 python3 nest-bug.py
Program: nest-bug.py
import nest

if __name__ == '__main__':
    number_ranks = nest.NumProcesses()
    my_rank_nest = nest.Rank()
    print("I am rank", my_rank_nest, "out of", number_ranks)

    nest.ResetKernel()
    nest.total_num_virtual_procs = 2
    nest.local_num_threads = 1
    nest.set_verbosity("M_DEBUG")

    nodes_e = nest.Create("iaf_psc_alpha", 10)
    local_neurons = [gid.tolist()[0] for gid, status in zip(nodes_e, nest.GetStatus(nodes_e, "local")) if status]
    print("Neurons on rank", my_rank_nest, ": ", local_neurons)

    # Form a synapse between two neurons both on rank 0
    nest.Connect(nest.NodeCollection([4]), nest.NodeCollection([2]),
                 conn_spec="one_to_one", syn_spec={"weight": 15.0, "delay": 1.0})
    # Form a synapse between two neurons both on rank 1
    # nest.Connect([3], [1], syn_spec={"weight": 15.0, "delay": 1.0})

    conns = nest.GetConnections(source=nest.NodeCollection([4]), target=nest.NodeCollection([2]))
    print("Result on rank", my_rank_nest, "for 4 -> 2", conns)
    # conns = nest.GetConnections(source=nest.NodeCollection([3]), target=nest.NodeCollection([1]))
    # print("Result on rank", my_rank_nest, "for 3 -> 1", conns)

    nest.Simulate(1.0)
    print("Simulation completed")
Running mpirun -n 2 python3 nest-bug.py produces the output below. When I uncomment the second nest.Connect line, the program runs to completion.
Output with blocking behavior:
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: 3.8.0
Built: Apr 28 2025 13:11:24
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
I am rank 0 out of 2
I am rank 1 out of 2
Neurons on rank 1 : [1, 3, 5, 7, 9]
Neurons on rank 0 : [2, 4, 6, 8, 10]
Result on rank 1 for 4 -> 2 The synapse collection does not contain any connections.
Apr 29 11:05:09 NodeManager::prepare_nodes [Info]:
Preparing 5 nodes for simulation.
Result on rank 0 for 4 -> 2 source target synapse model weight delay
-------- -------- --------------- -------- -------
4 2 static_synapse 15.00 1.000
Apr 29 11:05:09 NodeManager::prepare_nodes [Info]:
Preparing 5 nodes for simulation.
Apr 29 11:05:09 SimulationManager::start_updating_ [Info]:
Number of local nodes: 5
Simulation time (ms): 1
Number of OpenMP threads: 1
Number of MPI processes: 2
Apr 29 11:05:09 SimulationManager::start_updating_ [Info]:
Number of local nodes: 5
Simulation time (ms): 1
Number of OpenMP threads: 1
Number of MPI processes: 2
Blocks forever
Output with the second nest.Connect uncommented:
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: 3.8.0
Built: Apr 28 2025 13:11:24
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
I am rank 0 out of 2
I am rank 1 out of 2
Neurons on rank 1 : [1, 3, 5, 7, 9]
Neurons on rank 0 : [2, 4, 6, 8, 10]
Result on rank 1 for 4 -> 2 The synapse collection does not contain any connections.
Result on rank 0 for 4 -> 2 source target synapse model weight delay
-------- -------- --------------- -------- -------
4 2 static_synapse 15.00 1.000
Result on rank 1 for 3 -> 1 source target synapse model weight delay
-------- -------- --------------- -------- -------
3 1 static_synapse 15.00 1.000
Result on rank 0 for 3 -> 1 The synapse collection does not contain any connections.
Apr 29 11:06:06 NodeManager::prepare_nodes [Info]:
Preparing 5 nodes for simulation.
Apr 29 11:06:06 NodeManager::prepare_nodes [Info]:
Preparing 5 nodes for simulation.
Apr 29 11:06:06 SimulationManager::start_updating_ [Info]:
Number of local nodes: 5
Simulation time (ms): 1
Number of OpenMP threads: 1
Number of MPI processes: 2
Apr 29 11:06:06 SimulationManager::run [Info]:
Simulation finished.
Apr 29 11:06:06 SimulationManager::start_updating_ [Info]:
Number of local nodes: 5
Simulation time (ms): 1
Number of OpenMP threads: 1
Number of MPI processes: 2
Apr 29 11:06:06 SimulationManager::run [Info]:
Simulation finished.
Simulation completed
Simulation completed
Expected behavior
I would expect the program to finish.
Desktop/Environment:
- OS: Ubuntu 24.04.2
- Shell: zsh
- Python-Version: Python 3.12.3
- NEST-Version: nest-3.8
- Installation: CMake with MPI
CMake output:
cmake -Dwith-mpi=ON -DMPI_CXX_COMPILER=/opt/openmpi/bin/mpicxx -DMPI_C_COMPILER=/opt/openmpi/bin/mpicc -DCMAKE_INSTALL_PREFIX=/home/marvin/dev/nest38_env -DCMAKE_PREFIX_PATH=/opt/openmpi -DMPIEXEC_EXECUTABLE=/opt/openmpi/bin/mpirun -DCMAKE_CXX_COMPILER=mpicxx ../nest-simulator-3.8
-- The CXX compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/openmpi/bin/mpicxx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for include file inttypes.h
-- Looking for include file inttypes.h - found
-- Looking for include file mach-o/dyld.h
-- Looking for include file mach-o/dyld.h - not found
-- Looking for include file mach/mach.h
-- Looking for include file mach/mach.h - not found
-- Looking for include file memory.h
-- Looking for include file memory.h - found
-- Looking for include file stdint.h
-- Looking for include file stdint.h - found
-- Looking for include file sys/types.h
-- Looking for include file sys/types.h - found
-- Looking for C++ include istream
-- Looking for C++ include istream - found
-- Looking for C++ include ostream
-- Looking for C++ include ostream - found
-- Looking for C++ include sstream
-- Looking for C++ include sstream - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of long long
-- Check size of long long - done
-- Check size of u_int16_t
-- Check size of u_int16_t - done
-- Check size of uint16_t
-- Check size of uint16_t - done
-- Check size of u_int64_t
-- Check size of u_int64_t - done
-- Check size of uint64_t
-- Check size of uint64_t - done
-- Looking for NAN
-- Looking for NAN - found
-- Looking for isnan
-- Looking for isnan - found
-- Looking for M_E
-- Looking for M_E - found
-- Looking for M_PI
-- Looking for M_PI - found
-- Looking for expm1
-- Looking for expm1 - not found
-- Info: Host triple: x86_64-pc-linux
-- Info: Target triple: x86_64-pc-linux
-- Found Python: /home/marvin/dev/nest38_env/bin/python3 (found suitable version "3.12.3", minimum required is "3.8") found components: Interpreter Development.Module
-- Found Cython: /home/marvin/.local/bin/cython (found suitable version "3.0.8", minimum required is "0.28.3")
-- Found LTDL: /usr/lib/x86_64-linux-gnu/libltdl.so (found version "2.4.7")
-- Found Readline: /usr/lib/x86_64-linux-gnu/libreadline.so (found version "8.2")
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found GSL: /usr/include (found version "2.7.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found MPI_C: /opt/openmpi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/openmpi/bin/mpicxx (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Could NOT find PY_mpi4py (missing: PY_MPI4PY)
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found suitable version "1.83.0", minimum required is "1.69.0")
-- Info: Check the abort exitcode.
-- Info: Check the abort exitcode. 134
-- Info: Check the segmentation fault exitcode.
-- Info: Check the segmentation fault exitcode. 139
-- Info: Check whether the compiler ignores cmath makros.
-- Info: Check whether the compiler ignores cmath makros. OFF
-- Info: Check whether the compiler does NOT include <*.h> headers ISO conformant.
-- Info: Check whether the compiler does NOT include <*.h> headers ISO conformant. OFF
-- Info: Check whether the compiler respects symbolic signal names in signal.h.
-- Info: Check whether the compiler respects symbolic signal names in signal.h. OFF
-- Info: Check static template member declaration.
-- Info: Check static template member declaration. OFF
-- Info: Check for STL vector capacity base unity.
-- Info: Check for STL vector capacity base unity. ON
-- Info: Check for STL vector capacity doubling strategy.
-- Info: Check for STL vector capacity doubling strategy. ON
-- Info: Check whether the compiler fails with ICE.
-- Info: Check whether the compiler fails with ICE. OFF
-- Info: Check if ::nan is available from cmath.
-- Info: Check if ::nan is available from cmath. ON
-- Info: Check if ::isnan is available from cmath.
-- Info: Check if ::isnan is available from cmath. ON
-- Info: Check if Random123 generators work.
-- Info: Check if Random123 generators work. ON
-- Info: Done configuring NEST version: 3.8.0
--------------------------------------------------------------------------------
NEST Configuration Summary
--------------------------------------------------------------------------------
Target System : Linux
Cross Compiling : FALSE
C compiler : GNU 13.3.0 (/usr/bin/cc)
C compiler flags : -Wall -fopenmp -O2 -fdiagnostics-color=auto
C++ compiler : GNU 13.3.0 (/opt/openmpi/bin/mpicxx)
C++ compiler flags : -std=c++17 -Wall -fopenmp -O2 -fdiagnostics-color=auto
Build dynamic : ON
Built-in modelset : full
Python bindings : Yes (Python 3.12.3: /home/marvin/dev/nest38_env/bin/python3)
Includes : /usr/include/python3.12
Libraries :
Cython : Yes (Cython 3.0.8: /home/marvin/.local/bin/cython)
MPI4Py : No
Documentation : No
Use threading : Yes (OpenMP: -fopenmp)
Libraries : /usr/lib/gcc/x86_64-linux-gnu/13/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.a
Use GSL : Yes (GSL 2.7.1)
Includes : /usr/include
Libraries : /usr/lib/x86_64-linux-gnu/libgsl.so;/usr/lib/x86_64-linux-gnu/libgslcblas.so
Use Readline : Yes (GNU Readline 8.2)
Includes : /usr/include
Libraries : /usr/lib/x86_64-linux-gnu/libreadline.so;/usr/lib/x86_64-linux-gnu/libncurses.so
Use libltdl : Yes (LTDL 2.4.7)
Includes : /usr/include
Libraries : /usr/lib/x86_64-linux-gnu/libltdl.so
Use MPI : Yes (MPI: /opt/openmpi/bin/mpicxx)
Includes :
Libraries :
Launcher : /opt/openmpi/bin/mpirun -n <np> <prog> <args>
Detailed timers : No
Use MUSIC : No
Use libneurosim : No
Use Boost : Yes (Boost 1.83.0)
Includes : /usr/include
Libraries :
Use SIONlib : No
Use HDF5 : No
For details on setting specific flags for your MPI launcher command, see the
CMake documentation at https://cmake.org/cmake/help/latest/module/FindMPI.html
--------------------------------------------------------------------------------
The NEST executable will be installed to:
/home/marvin/dev/nest38_env/bin/
NEST dynamic libraries and user modules will be installed to:
/home/marvin/dev/nest38_env/lib/nest/
PyNEST will be installed to:
/home/marvin/dev/nest38_env/lib/python3.12/site-packages
To set necessary environment variables, add the following line
to your ~/.bashrc :
source /home/marvin/dev/nest38_env/bin/nest_vars.sh
--------------------------------------------------------------------------------
You can now build and install NEST with
make
make install
make installcheck
If you experience problems with the installation or the use of NEST,
please see https://www.nest-simulator.org/frequently_asked_questions
or go to https://www.nest-simulator.org/community to find out how to
join the user mailing list.
-- Configuring done (15.4s)
-- Generating done (0.1s)
-- Build files have been written to: /home/marvin/dev/nest38-build2
Hi @marvinKaster!
Thanks for reporting this. I can reproduce this under macOS 15.4. A first analysis indicates that one MPI rank is waiting at an MPI_Barrier() in
https://github.com/nest/nest-simulator/blob/e57984bcb19bfbeefad3096f689b67f77c772bb1/nestkernel/simulation_manager.cpp#L729
while the other rank is trying to communicate spikes at
https://github.com/nest/nest-simulator/blob/e57984bcb19bfbeefad3096f689b67f77c772bb1/nestkernel/event_delivery_manager.cpp#L370
I cannot right away see precisely where it blocks in that function (would require rebuilding NEST without any function inlining).
I fear that because one rank has no connections at all, it never enters spike gathering, and thus the ranks become desynchronized.
We need to fix this, but I think the work-around is reasonably straightforward: use MPI only if you have at least one connection per rank.
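One way to satisfy that work-around in the reproducer above, sketched here purely as an illustration, is to give every neuron a zero-weight self-connection: connections are stored on the rank that owns the target, so every rank then holds at least one connection and takes part in spike exchange.

import nest

nest.ResetKernel()
nest.total_num_virtual_procs = 2

nodes_e = nest.Create("iaf_psc_alpha", 10)

# Zero-weight self-connections: the same Connect call on all ranks, one
# connection stored per local neuron, no effect on the dynamics.
nest.Connect(nodes_e, nodes_e, conn_spec="one_to_one",
             syn_spec={"weight": 0.0, "delay": 1.0})

# The connection of interest; both neurons happen to live on rank 0.
nest.Connect(nest.NodeCollection([4]), nest.NodeCollection([2]),
             conn_spec="one_to_one", syn_spec={"weight": 15.0, "delay": 1.0})

# GetConnections() without arguments returns only the connections stored on
# the calling rank, so every rank should now report a non-empty collection.
print("rank", nest.Rank(), "local connections:", len(nest.GetConnections()))

nest.Simulate(1.0)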
Hi @heplesser,
Thank you for the quick reply! I will make sure to have at least 1 connection per rank as long as it is not fixed.
I tried to have at least one connection per rank by adding Poisson background activity to all neurons, but I am still facing issues. After multiple calls to nest.Connect, nest.Disconnect, and nest.Simulate, I get a segmentation fault. Unfortunately, I couldn't track down the exact pattern that causes the issue, but I managed to find a sequence of function calls that reproduces the error.
@heplesser Do you have an idea what the problem could be?
import nest

if __name__ == '__main__':
    number_ranks = nest.NumProcesses()
    my_rank_nest = nest.Rank()
    number_neurons = 80
    print("I am rank", my_rank_nest, "out of", number_ranks)

    nest.ResetKernel()
    nest.set_verbosity("M_DEBUG")
    nest.total_num_virtual_procs = 8
    nest.local_num_threads = 1

    nodes_e = nest.Create("iaf_psc_alpha", number_neurons)
    local_neurons = [gid.tolist()[0] for gid, status in zip(nodes_e, nest.GetStatus(nodes_e, "local")) if status]
    print("Neurons on rank", my_rank_nest, ": ", local_neurons)

    poisson_gen = nest.Create("poisson_generator", params={"rate": 200.0})
    nest.Connect(poisson_gen, nodes_e, syn_spec={"weight": 400.0, "delay": 1.0})

    for src, tgt in [(48, 21), (43, 53), (61, 33), (73, 52), (33, 20), (42, 24), (23, 75), (57, 39), (49, 31), (35, 58), (69, 29), (57, 23), (46, 59), (56, 36), (56, 37)]:
        nest.Connect(nest.NodeCollection([src]), nest.NodeCollection([tgt]),
                     conn_spec="one_to_one", syn_spec={"weight": 15.0, "delay": 1.0})
    for src, tgt in [(69, 29), (43, 53), (49, 31), (35, 58), (56, 37), (42, 24), (61, 33), (48, 21), (73, 52), (23, 75), (33, 20), (56, 36)]:
        nest.Disconnect(nest.NodeCollection([src]), nest.NodeCollection([tgt]))
    nest.Simulate(1000.0)

    for src, tgt in [(28, 18), (55, 76), (41, 31), (76, 47), (29, 75), (59, 77), (53, 75), (25, 48), (36, 21), (40, 66), (30, 74), (41, 70), (70, 19), (47, 78), (73, 45), (38, 41), (56, 37), (28, 49), (60, 39), (48, 37)]:
        nest.Connect(nest.NodeCollection([src]), nest.NodeCollection([tgt]),
                     conn_spec="one_to_one", syn_spec={"weight": 15.0, "delay": 1.0})
    for src, tgt in [(40, 66), (36, 21)]:
        nest.Disconnect(nest.NodeCollection([src]), nest.NodeCollection([tgt]))
    nest.Simulate(1000.0)

    for src, tgt in [(44, 47), (72, 70)]:
        nest.Connect(nest.NodeCollection([src]), nest.NodeCollection([tgt]),
                     conn_spec="one_to_one", syn_spec={"weight": 15.0, "delay": 1.0})
    for src, tgt in [(46, 59), (76, 47), (57, 39), (41, 70), (28, 18), (53, 75), (60, 39), (38, 41), (47, 78), (55, 76), (28, 49), (30, 74), (25, 48), (48, 37), (57, 23), (41, 31)]:
        nest.Disconnect(nest.NodeCollection([src]), nest.NodeCollection([tgt]))
    nest.Simulate(1000.0)

    print("Simulation completed")
Running it with mpirun -n 8 python3 bug2.py produces the following output:
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: 3.8.0
Built: Apr 28 2025 13:11:24
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
I am rank 0 out of 8
I am rank 2 out of 8
I am rank 1 out of 8
I am rank 4 out of 8
I am rank 7 out of 8
I am rank 3 out of 8
Neurons on rank 0 : [8, 16, 24, 32, 40, 48, 56, 64, 72, 80]
Neurons on rank 2 : [2, 10, 18, 26, 34, 42, 50, 58, 66, 74]
Neurons on rank 1 : [1, 9, 17, 25, 33, 41, 49, 57, 65, 73]
Neurons on rank 4 : [4, 12, 20, 28, 36, 44, 52, 60, 68, 76]
I am rank 6 out of 8
I am rank 5 out of 8
Neurons on rank 3 : [3, 11, 19, 27, 35, 43, 51, 59, 67, 75]
Neurons on rank 7 : [7, 15, 23, 31, 39, 47, 55, 63, 71, 79]
Neurons on rank 6 : [6, 14, 22, 30, 38, 46, 54, 62, 70, 78]
Neurons on rank 5 : [5, 13, 21, 29, 37, 45, 53, 61, 69, 77]
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 SimulationManager::run [Info]:
Simulation finished.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 NodeManager::prepare_nodes [Info]:
Preparing 11 nodes for simulation.
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
Apr 30 13:48:14 SimulationManager::start_updating_ [Info]:
Number of local nodes: 11
Simulation time (ms): 1000
Number of OpenMP threads: 1
Number of MPI processes: 8
[marvin-ThinkPad-T14-Gen-1:567938] *** Process received signal ***
[marvin-ThinkPad-T14-Gen-1:567938] Signal: Segmentation fault (11)
[marvin-ThinkPad-T14-Gen-1:567938] Signal code: Address not mapped (1)
[marvin-ThinkPad-T14-Gen-1:567938] Failing at address: (nil)
[marvin-ThinkPad-T14-Gen-1:567938] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7ef28dc45330]
[marvin-ThinkPad-T14-Gen-1:567938] [ 1] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(+0x48f836)[0x7ef28ca8f836]
[marvin-ThinkPad-T14-Gen-1:567938] [ 2] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest20EventDeliveryManager14deliver_eventsEm+0x6f)[0x7ef28ca90b3f]
[marvin-ThinkPad-T14-Gen-1:567938] [ 3] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(+0x457649)[0x7ef28ca57649]
[marvin-ThinkPad-T14-Gen-1:567938] [ 4] /lib/x86_64-linux-gnu/libgomp.so.1(GOMP_parallel+0x47)[0x7ef28c3b4977]
[marvin-ThinkPad-T14-Gen-1:567938] [ 5] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager7update_Ev+0x182)[0x7ef28ca552f2]
[marvin-ThinkPad-T14-Gen-1:567938] [ 6] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager12call_update_Ev+0x633)[0x7ef28ca55de3]
[marvin-ThinkPad-T14-Gen-1:567938] [ 7] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager3runERKNS_4TimeE+0x3a6)[0x7ef28ca5bc36]
[marvin-ThinkPad-T14-Gen-1:567938] [ 8] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest3runERKd+0xdf)[0x7ef28ca4352f]
[marvin-ThinkPad-T14-Gen-1:567938] [ 9] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest8simulateERKd+0x15)[0x7ef28ca435c5]
[marvin-ThinkPad-T14-Gen-1:567938] [10] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZNK4nest10NestModule16SimulateFunction7executeEP14SLIInterpreter+0x47)[0x7ef28ca0f197]
[marvin-ThinkPad-T14-Gen-1:567938] [11] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/../../../nest/libsli.so.3(_ZN14SLIInterpreter8execute_Em+0x2b2)[0x7ef28c538622]
[marvin-ThinkPad-T14-Gen-1:567938] [12] /home/marvin/dev/nest38_env/lib/python3.12/site-packages/nest/pynestkernel.so(+0x30976)[0x7ef28d3be976]
[marvin-ThinkPad-T14-Gen-1:567938] [13] python3(PyObject_Vectorcall+0x35)[0x549b85]
[marvin-ThinkPad-T14-Gen-1:567938] [14] python3(_PyEval_EvalFrameDefault+0xa89)[0x5d73c9]
[marvin-ThinkPad-T14-Gen-1:567938] [15] python3(PyEval_EvalCode+0x15b)[0x5d58eb]
[marvin-ThinkPad-T14-Gen-1:567938] [16] python3[0x608b42]
[marvin-ThinkPad-T14-Gen-1:567938] [17] python3[0x6b4e93]
[marvin-ThinkPad-T14-Gen-1:567938] [18] python3(_PyRun_SimpleFileObject+0x1aa)[0x6b4bfa]
[marvin-ThinkPad-T14-Gen-1:567938] [19] python3(_PyRun_AnyFileObject+0x4f)[0x6b4a2f]
[marvin-ThinkPad-T14-Gen-1:567938] [20] python3(Py_RunMain+0x3b5)[0x6bca95]
[marvin-ThinkPad-T14-Gen-1:567938] [21] python3(Py_BytesMain+0x2d)[0x6bc57d]
[marvin-ThinkPad-T14-Gen-1:567938] [22] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7ef28dc2a1ca]
[marvin-ThinkPad-T14-Gen-1:567938] [23] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7ef28dc2a28b]
[marvin-ThinkPad-T14-Gen-1:567938] [24] python3(_start+0x25)[0x657ce5]
[marvin-ThinkPad-T14-Gen-1:567938] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 567938 on node marvin-ThinkPad-T14-Gen-1 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Hi @marvinKaster, thanks for reporting this with a compact reproducer. I have put it on the agenda for today's Open NEST Developer Video Conference (@terhorstd). If the time fits for you, you are welcome to join. The meeting is 11.30–12.30 CEST (UTC+2); for details see https://github.com/nest/nest-simulator/wiki/Open-NEST-Developer-Video-Conference.
Hi @marvinKaster! Apologies for the long silence. I have now created a really minimal reproducer that does not even need MPI or any parallelization. Simply sending a spike over a deleted connection causes the segfault:
import nest
nodes_e = nest.Create("parrot_neuron", 3)
sg = nest.Create('spike_generator', params={'spike_times': [9.7]})
nest.Connect(sg, nodes_e[1]) # only neuron with GID 2 gets input
nest.Connect(nodes_e[:2], nodes_e[2]) # neurons with GIDs 1 and 2 connect to 3
# Neuron 2 spikes at 10.7, spike is stored, but not yet delivered
# If we simulated here until 11.1, the spike would be delivered here and all is fine
nest.Simulate(11)
nest.Disconnect(nodes_e[0], nodes_e[2])
nest.Simulate(0.1) # This call will trigger delivery of the spike, but now the connection is gone -> segfault
@neuroady @JanVogelsang @suku248 This issue may be related to #3394. Since we have a very simple reproducer here, this may allow for easier debugging.
I can confirm that this doesn't occur when compressed spikes are switched off: nest.SetKernelStatus({"use_compressed_spikes": False})
[edit]: Works fine with compressed spikes enabled in 3.5, but not in 3.6 onwards.
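For reference, applying that work-around to the minimal reproducer would look like the sketch below; I set the flag directly after ResetKernel so it takes effect before any connections exist.

import nest

nest.ResetKernel()
# Work-around: disable spike compression before building the network.
nest.SetKernelStatus({"use_compressed_spikes": False})

nodes_e = nest.Create("parrot_neuron", 3)
sg = nest.Create('spike_generator', params={'spike_times': [9.7]})

nest.Connect(sg, nodes_e[1])           # only neuron with GID 2 gets input
nest.Connect(nodes_e[:2], nodes_e[2])  # neurons with GIDs 1 and 2 connect to 3

nest.Simulate(11)
nest.Disconnect(nodes_e[0], nodes_e[2])
nest.Simulate(0.1)  # no segfault with compression switched off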
I have understood the problem and am working on a solution. The issue is as follows: when we use compression (and the compression scheme changed from 3.5 to 3.6), we no longer send information about which neuron spiked, but essentially a direct index into a nested array of Connection objects on the target side. Now if, as in the reproducer above, a neuron emits a spike at 10.7 and we have a minimal delay of 1.0, then the spike is transmitted from the pre- to the postsynaptic side at the end of the Simulate(11.0) call above. For the single-threaded case above, this means taking the spike out of the emitted_spikes_register_ and writing it to the recv_buffer_spike_data_; in that process, we write the index I mentioned into this receive buffer. We then remove the connection through which the spike should travel, and the Prepare() hiding behind the Simulate(0.1) call above triggers a rebuild of the compressed connection infrastructure. Only afterwards will update_() call deliver_events(), and now the index based on the old compressed tables is applied to the new tables, which are smaller, hence the segmentation fault.
The solution to this is to make sure that the connection infrastructure is not rebuilt while spikes are in transit. In particular, this means that update_() needs to take care of this rebuilding after spike delivery. This causes some problems for tests (because irrelevant port numbers change during compression and are compared), so I need a little longer to prepare a complete PR.
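Until that PR is ready, the explanation above also suggests a user-level way to avoid the crash, sketched here for the minimal reproducer and based on the remark in its comments: simulate just past the delivery of the last in-transit spike before calling Disconnect.

import nest

nest.ResetKernel()

nodes_e = nest.Create("parrot_neuron", 3)
sg = nest.Create('spike_generator', params={'spike_times': [9.7]})

nest.Connect(sg, nodes_e[1])
nest.Connect(nodes_e[:2], nodes_e[2])

# Simulating to 11.1 instead of 11.0 lets the spike emitted at 10.7 be
# delivered before we change the connectivity, so no index into the old
# compressed tables is left sitting in the receive buffer.
nest.Simulate(11.1)
nest.Disconnect(nodes_e[0], nodes_e[2])
nest.Simulate(0.1)  # completes without a segfault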
This makes a lot of sense! Thanks for digging so deep so quickly.
See also #3532.
See also #3533 for an approach to solving this and further discussion of ensuing problems.
@med-ayssar I suspect that this issue might also be related to #3532. Would you be available for a chat about this whole complex of issues at some point?
Hey @heplesser, sorry, I was (and still am) on vacation until the 20th of October.
I was able to run the provided code snippet with PR #3536, and it worked without any segfaults.
I think only the synchronization problem remains to be solved. For an in-depth chat, anything after October 20th should work for me.