Segmentation faults with Structural Plasticity in NEST v3.6 onwards
This issue has been opened in reference to a mailing list post. Details about the original post can be found here.
Using structural plasticity (SP) in MPI-based simulations leads to spontaneous crashes in NEST v3.6 onward.
To Reproduce
Steps to reproduce the behavior:
- Create an MPI-based script that demonstrates structural plasticity.
  - Alternatively, use minimal.py.
- Run the script with 32 or more MPI processes.
  - Fewer MPI processes can also generate a segmentation fault.
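For readers without access to minimal.py, the first step could look roughly like the sketch below. It is not the script from the report. It is a hypothetical stand-in modelled on the usual shape of PyNEST structural plasticity scripts, and every parameter value, the function names, and the synapse/element names (`synapse_ex`, `Axon_ex`, `Den_ex`) are illustrative assumptions. It has not been checked against NEST v3.6 and may not trigger the crash.

```python
# Hypothetical stand-in for minimal.py; all names and values are illustrative.


def growth_curve(eps):
    """Parameters of a Gaussian growth curve for one synaptic element type."""
    return {
        "growth_curve": "gaussian",
        "growth_rate": 0.0001,
        "continuous": False,
        "eta": 0.0,
        "eps": eps,
    }


def sp_synapse_spec():
    """Tell the SP machinery which elements pair up to form 'synapse_ex'."""
    return {
        "synapse_ex": {
            "synapse_model": "synapse_ex",
            "pre_synaptic_element": "Axon_ex",
            "post_synaptic_element": "Den_ex",
        }
    }


def run(n_neurons=1000, sim_time_ms=10000.0):
    """Build a small SP network and simulate it (requires PyNEST)."""
    import nest  # imported here so the helpers above work without NEST

    nest.ResetKernel()

    # Synapse model that structural plasticity creates and deletes.
    nest.CopyModel("static_synapse", "synapse_ex")
    nest.SetDefaults("synapse_ex", {"weight": 1.0, "delay": 1.0})
    nest.structural_plasticity_synapses = sp_synapse_spec()

    # Neurons carrying axonal and dendritic synaptic elements.
    elements = {"Axon_ex": growth_curve(0.05), "Den_ex": growth_curve(0.05)}
    neurons = nest.Create(
        "iaf_psc_alpha", n_neurons, {"synaptic_elements": elements}
    )

    # Background drive so that the neurons fire and elements can grow.
    noise = nest.Create("poisson_generator", params={"rate": 10000.0})
    nest.Connect(noise, neurons, "all_to_all", {"weight": 1.0, "delay": 1.0})

    nest.EnableStructuralPlasticity()
    nest.Simulate(sim_time_ms)
```

A script like this would call `run()` at the end and be started through the MPI launcher of the system at hand, for example `srun -n 32 python <script>` on a Slurm cluster.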
Expected behavior
The simulation crashes with output indicating that a segmentation fault has occurred.
- The stderr dump from minimal.py on NEST v3.6 with 32 MPI processes is shown below:
[jsfc114:24182:0:24182] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7473)
==== backtrace (tid: 24182) ====
0 0x000000000003e6f0 __GI___sigaction() :0
1 0x0000000000655387 nest::Connector<nest::static_synapse<nest::TargetIdentifierPtrRport> >::send() ???:0
2 0x000000000043cfd6 nest::EventDeliveryManager::deliver_events_<nest::SpikeData>() event_delivery_manager.cpp:0
3 0x000000000043f29f nest::EventDeliveryManager::deliver_events() ???:0
4 0x000000000040abaa nest::SimulationManager::update_() simulation_manager.cpp:0
5 0x00000000000156e6 GOMP_parallel() /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:178
6 0x00000000000156e6 GOMP_parallel_end() /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:140
7 0x00000000000156e6 GOMP_parallel() /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:179
8 0x000000000040c067 nest::SimulationManager::update_() ???:0
9 0x000000000040c96c nest::SimulationManager::call_update_() ???:0
10 0x0000000000411129 nest::SimulationManager::run() ???:0
11 0x00000000003f5d7d nest::run() ???:0
12 0x00000000003f5e51 nest::simulate() ???:0
13 0x00000000003b1836 nest::NestModule::SimulateFunction::execute() ???:0
14 0x00000000000bac21 SLIInterpreter::execute_() interpret.cc:0
15 0x0000000000030d04 __pyx_pw_12pynestkernel_10NESTEngine_9run() pynestkernel.cxx:0
16 0x00000000001d5e9c _PyEval_EvalFrameDefault() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5225
17 0x00000000001d5e9c _PyEval_EvalFrameDefault() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5226
18 0x00000000001ce50a _PyEval_EvalFrame() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/./Include/internal/pycore_ceval.h:73
19 0x00000000001ce50a _PyEval_Vector() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:6443
20 0x00000000001d6c3a _PyEval_EvalFrameDefault() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5380
21 0x00000000001ce50a _PyEval_EvalFrame() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/./Include/internal/pycore_ceval.h:73
22 0x00000000001ce50a _PyEval_Vector() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:6443
23 0x00000000002562e1 PyEval_EvalCode() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:1154
24 0x0000000000273443 run_eval_code_obj() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1714
25 0x000000000026fbaa run_mod() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1735
26 0x00000000002851e1 pyrun_file() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1630
27 0x0000000000284054 _PyRun_SimpleFileObject() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:440
28 0x0000000000283c24 _PyRun_AnyFileObject() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:79
29 0x000000000027df4c pymain_run_file_obj() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:360
30 0x000000000027df4c pymain_run_file() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:379
31 0x000000000027df4c pymain_run_python() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:601
32 0x000000000027df4c Py_RunMain() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:680
33 0x0000000000246c67 Py_BytesMain() /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:734
34 0x0000000000029590 __libc_start_call_main() ???:0
35 0x0000000000029640 __libc_start_main_alias_2() :0
36 0x000000000040106e _start() ???:0
=================================
<PSP:r0000028:Backtrace after SIGSEGV (Invalid memory reference):>
<PSP:r0000028:# 0: /p/software/jusuf/stages/2024/software/pscom/5-default-GCCcore-12.3.0/lib/libpscom.so.2(+0xb4e4) [0x1529ccad14e4]>
<PSP:r0000028:# 1: /usr/lib64/libc.so.6(+0x3e6f0) [0x152a4963e6f0]>
<PSP:r0000028:# 2: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest9ConnectorINS_14static_synapseINS_24TargetIdentifierPtrRportEEEE4sendEmmRKSt6vectorIPNS_14ConnectorModelESaIS7_EERNS_5EventE+0x87) [0x152a3bc69387]>
<PSP:r0000028:# 3: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(+0x43cfd6) [0x152a3ba50fd6]>
<PSP:r0000028:# 4: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest20EventDeliveryManager14deliver_eventsEm+0x6f) [0x152a3ba5329f]>
<PSP:r0000028:# 5: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(+0x40abaa) [0x152a3ba1ebaa]>
<PSP:r0000028:# 6: /p/software/jusuf/stages/2024/software/GCCcore/12.3.0/lib64/libgomp.so.1(GOMP_parallel+0x46) [0x152a406b06e6]>
<PSP:r0000028:# 7: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager7update_Ev+0x197) [0x152a3ba20067]>
<PSP:r0000028:# 8: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager12call_update_Ev+0x5dc) [0x152a3ba2096c]>
<PSP:r0000028:# 9: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager3runERKNS_4TimeE+0x339) [0x152a3ba25129]>
<PSP:r0000028:#10: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest3runERKd+0x9d) [0x152a3ba09d7d]>
<PSP:r0000028:#11: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest8simulateERKd+0x11) [0x152a3ba09e51]>
<PSP:r0000028:#12: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZNK4nest10NestModule16SimulateFunction7executeEP14SLIInterpreter+0x36) [0x152a3b9c5836]>
<PSP:r0000028:#13: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libsli.so.3(_ZN14SLIInterpreter8execute_Em+0x201) [0x152a3b041c21]>
<PSP:r0000028:#14: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/pynestkernel.so(+0x30d04) [0x152a3c1cfd04]>
<PSP:r0000028:#15: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x41bc) [0x152a49c2ce9c]>
<PSP:r0000028:#16: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(+0x1ce50a) [0x152a49c2550a]>
<PSP:r0000028:#17: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x4f5a) [0x152a49c2dc3a]>
<PSP:r0000028:#18: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(+0x1ce50a) [0x152a49c2550a]>
<PSP:r0000028:#19: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(PyEval_EvalCode+0xa1) [0x152a49cad2e1]>
readFromPMIClient: lost connection to the PMI client
kvsprovider[23316]: releaseMySelf: wrong message type 3 (PSP_CD_CLIENTREFUSED)
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
srun: error: jsfc114: tasks 0-27,29-31: Terminated
srun: error: jsfc114: task 28: Exited with exit code 1
srun: Force Terminated StepId=659919.0
@neuroady For further debugging of this issue, if it is not solved by the solution for #3489, you may want to compile NEST with the following CMake flags:
- GCC:
  - -Dwith-debug="-O0 -g -D_GLIBCXX_ASSERTIONS"
- Clang:
  - -Dwith-debug="-O0 -g -fsanitize=bounds"
  - -Dwith-debug="-O0 -g -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE"
They add bounds checks to C++ vectors and the like, even where one only uses []. The second one for Clang seems useful only if one runs PyNEST with the lldb debugger, but then it stops very nicely where things go wrong.
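Because the flag values contain spaces, they are easy to misquote on the command line. The sketch below assembles the CMake invocation as an argument list so that each `-Dwith-debug=...` value stays a single argument. The variant labels, the helper name `cmake_command`, and the source path are hypothetical. `-Dwith-mpi=ON` is added here only on the assumption that an MPI build is wanted, since the issue concerns MPI runs.

```python
import shlex

# Debug flag sets quoted from the comment above; the dictionary keys are
# labels chosen for this sketch, not official names.
DEBUG_FLAGS = {
    "gcc": "-O0 -g -D_GLIBCXX_ASSERTIONS",
    "clang-bounds": "-O0 -g -fsanitize=bounds",
    "clang-hardening": (
        "-O0 -g -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE"
    ),
}


def cmake_command(variant, source_dir, with_mpi=True):
    """Return the CMake argument list for one of the debug variants."""
    args = ["cmake", f"-Dwith-debug={DEBUG_FLAGS[variant]}"]
    if with_mpi:
        # Assumption: an MPI-enabled build is needed to reproduce the crash.
        args.append("-Dwith-mpi=ON")
    args.append(source_dir)
    return args


# Print a copy-pasteable, correctly quoted command line.
print(shlex.join(cmake_command("gcc", "/path/to/nest-simulator")))
```

The list can also be passed directly to `subprocess.run` from within the build directory, which avoids shell quoting altogether.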