MUSIC segmentation fault when nodes are created in specific order

Open JanVogelsang opened this issue 1 month ago • 11 comments

The music-cont-proxy test (#3678) creates nodes in the following order in test_cont_proxy_sender.py:

mcoproxy = nest.Create("music_cont_out_proxy")

n1 = nest.Create("iaf_cond_exp", params={"I_e": 300.0})
n2 = nest.Create("iaf_cond_exp", params={"I_e": 600.0})

Changing this as follows produces a segmentation fault when executing the test script (mpiexec -np 3 music test_cont_proxy.music):

n1 = nest.Create("iaf_cond_exp", params={"I_e": 300.0})
n2 = nest.Create("iaf_cond_exp", params={"I_e": 600.0})

mcoproxy = nest.Create("music_cont_out_proxy")

JanVogelsang avatar Dec 01 '25 14:12 JanVogelsang

@JanVogelsang Could you try what happens if, in the working variant, you create some node, e.g., a parrot neuron, before creating the music output proxy? I wonder if it might need to be node 1?

heplesser avatar Dec 01 '25 18:12 heplesser

@JanVogelsang Could you please post the stack trace of the segfault?

med-ayssar avatar Dec 02 '25 14:12 med-ayssar

Could not reproduce with the current master.

med-ayssar avatar Dec 06 '25 10:12 med-ayssar

Could not reproduce with the current master.

That's strange, which compiler and OS are you using?

The stack trace and output are the following:

...
Dec 10 13:36:54 MUSICManager::enter_runtime [Info]: 
    Entering MUSIC runtime with tick = 0.1 ms
[ThinkPad-Jan-Vogelsang:50495] *** Process received signal ***
[ThinkPad-Jan-Vogelsang:50495] Signal: Segmentation fault (11)
[ThinkPad-Jan-Vogelsang:50495] Signal code: Address not mapped (1)
[ThinkPad-Jan-Vogelsang:50495] Failing at address: (nil)
[ThinkPad-Jan-Vogelsang:50496] *** Process received signal ***
[ThinkPad-Jan-Vogelsang:50496] Signal: Segmentation fault (11)
[ThinkPad-Jan-Vogelsang:50496] Signal code: Address not mapped (1)
[ThinkPad-Jan-Vogelsang:50496] Failing at address: (nil)

Dec 10 13:36:54 SimulationManager::start_updating_ [Info]: 
    Number of local nodes: 2
    Simulation time (ms): 20
    Number of OpenMP threads: 1
    Number of MPI processes: 2

Dec 10 13:36:54 SimulationManager::start_updating_ [Info]: 
    Number of local nodes: 2
    Simulation time (ms): 20
    Number of OpenMP threads: 1
    Number of MPI processes: 2
[ThinkPad-Jan-Vogelsang:50496] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7a8a66445330]
[ThinkPad-Jan-Vogelsang:50496] [ 1] [ThinkPad-Jan-Vogelsang:50495] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7fc155a45330]
[ThinkPad-Jan-Vogelsang:50495] [ 1] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest19UniversalDataLoggerINS_12iaf_cond_expEE11DataLogger_11record_dataERKS1_l+0x62)[0x7a8a3accf882]
[ThinkPad-Jan-Vogelsang:50496] [ 2] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest19UniversalDataLoggerINS_12iaf_cond_expEE11DataLogger_11record_dataERKS1_l+0x62)[0x7fc12a2cf882]
[ThinkPad-Jan-Vogelsang:50495] [ 2] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest12iaf_cond_exp6updateERKNS_4TimeEll+0x206)[0x7fc12a2cc8e6]
[ThinkPad-Jan-Vogelsang:50495] [ 3] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest12iaf_cond_exp6updateERKNS_4TimeEll+0x206)[0x7a8a3accc8e6]
[ThinkPad-Jan-Vogelsang:50496] [ 3] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(+0x4c6108)[0x7a8a3a8c6108]
[ThinkPad-Jan-Vogelsang:50496] [ 4] /lib/x86_64-linux-gnu/libgomp.so.1(GOMP_parallel+0x47)[0x7a8a62e5c977]
[ThinkPad-Jan-Vogelsang:50496] [ 5] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(+0x4c6108)[0x7fc129ec6108]
[ThinkPad-Jan-Vogelsang:50495] [ 4] /lib/x86_64-linux-gnu/libgomp.so.1(GOMP_parallel+0x47)[0x7fc15265c977]
[ThinkPad-Jan-Vogelsang:50495] [ 5] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager7update_Ev+0x139)[0x7a8a3a8c7ca9]
[ThinkPad-Jan-Vogelsang:50496] [ 6] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager7update_Ev+0x139)[0x7fc129ec7ca9]
[ThinkPad-Jan-Vogelsang:50495] [ 6] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager12call_update_Ev+0x5ca)[0x7a8a3a8c85ba]
[ThinkPad-Jan-Vogelsang:50496] [ 7] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager12call_update_Ev+0x5ca)[0x7fc129ec85ba]
[ThinkPad-Jan-Vogelsang:50495] [ 7] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager3runERKNS_4TimeE+0x3c7)[0x7a8a3a8cc957]
[ThinkPad-Jan-Vogelsang:50496] [ 8] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager3runERKNS_4TimeE+0x3c7)[0x7fc129ecc957]
[ThinkPad-Jan-Vogelsang:50495] [ 8] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest3runERKd+0xd6)[0x7a8a3a8af0b6]
[ThinkPad-Jan-Vogelsang:50496] [ 9] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest3runERKd+0xd6)[0x7fc129eaf0b6]
[ThinkPad-Jan-Vogelsang:50495] [ 9] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest8simulateERKd+0x15)[0x7a8a3a8af145]
[ThinkPad-Jan-Vogelsang:50496] [10] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest8simulateERKd+0x15)[0x7fc129eaf145]
[ThinkPad-Jan-Vogelsang:50495] [10] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZNK4nest10NestModule16SimulateFunction7executeEP14SLIInterpreter+0x47)[0x7a8a3a877347]
[ThinkPad-Jan-Vogelsang:50496] [11] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libnest.so.3(_ZNK4nest10NestModule16SimulateFunction7executeEP14SLIInterpreter+0x47)[0x7fc129e77347]
[ThinkPad-Jan-Vogelsang:50495] [11] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libsli.so.3(_ZN14SLIInterpreter8execute_Em+0x23a)[0x7a8a3a33fc1a]
[ThinkPad-Jan-Vogelsang:50496] [12] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/pynestkernel.so(+0x468b5)[0x7a8a3b3be8b5]
[ThinkPad-Jan-Vogelsang:50496] [13] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/../../../nest/libsli.so.3(_ZN14SLIInterpreter8execute_Em+0x23a)[0x7fc12993fc1a]
[ThinkPad-Jan-Vogelsang:50495] [12] /home/vogelsang1/vision/nest/master/build/install/lib/python3.12/site-packages/nest/pynestkernel.so(+0x468b5)[0x7fc12aabe8b5]
[ThinkPad-Jan-Vogelsang:50495] [13] python3(PyObject_Vectorcall+0x35)[0x549825]
[ThinkPad-Jan-Vogelsang:50496] [14] python3(PyObject_Vectorcall+0x35)[0x549825]
[ThinkPad-Jan-Vogelsang:50495] [14] python3(_PyEval_EvalFrameDefault+0xa89)[0x5d71d9]
[ThinkPad-Jan-Vogelsang:50496] [15] python3(_PyEval_EvalFrameDefault+0xa89)[0x5d71d9]
[ThinkPad-Jan-Vogelsang:50495] [15] python3(PyEval_EvalCode+0x15b)[0x5d571b]
[ThinkPad-Jan-Vogelsang:50496] [16] python3(PyEval_EvalCode+0x15b)[0x5d571b]
[ThinkPad-Jan-Vogelsang:50495] [16] python3[0x6084c2]
[ThinkPad-Jan-Vogelsang:50496] [17] python3[0x6084c2]
[ThinkPad-Jan-Vogelsang:50495] [17] python3[0x6b44f3]
[ThinkPad-Jan-Vogelsang:50496] [18] python3[0x6b44f3]
[ThinkPad-Jan-Vogelsang:50495] [18] python3(_PyRun_SimpleFileObject+0x1aa)[0x6b425a]
[ThinkPad-Jan-Vogelsang:50496] [19] python3(_PyRun_AnyFileObject+0x4f)[0x6b408f]
[ThinkPad-Jan-Vogelsang:50496] [20] python3(_PyRun_SimpleFileObject+0x1aa)[0x6b425a]
[ThinkPad-Jan-Vogelsang:50495] [19] python3(_PyRun_AnyFileObject+0x4f)[0x6b408f]
[ThinkPad-Jan-Vogelsang:50495] [20] python3(Py_RunMain+0x3b5)[0x6bc0f5]
[ThinkPad-Jan-Vogelsang:50496] [21] python3(Py_RunMain+0x3b5)[0x6bc0f5]
[ThinkPad-Jan-Vogelsang:50495] [21] python3(Py_BytesMain+0x2d)[0x6bbbdd]
[ThinkPad-Jan-Vogelsang:50496] [22] python3(Py_BytesMain+0x2d)[0x6bbbdd]
[ThinkPad-Jan-Vogelsang:50495] [22] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7fc155a2a1ca]
[ThinkPad-Jan-Vogelsang:50495] [23] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7a8a6642a1ca]
[ThinkPad-Jan-Vogelsang:50496] [23] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7a8a6642a28b]
[ThinkPad-Jan-Vogelsang:50496] [24] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7fc155a2a28b]
[ThinkPad-Jan-Vogelsang:50495] [24] python3(_start+0x25)[0x657005]
python3(_start+0x25)[0x657005]
[ThinkPad-Jan-Vogelsang:50495] *** End of error message ***
[ThinkPad-Jan-Vogelsang:50496] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node ThinkPad-Jan-Vogelsang exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

after changing test_cont_proxy_sender.py to:

#!/usr/bin/env python3
import nest

n1 = nest.Create("iaf_cond_exp", params={"I_e": 300.0})
n2 = nest.Create("iaf_cond_exp", params={"I_e": 600.0})

mcoproxy = nest.Create("music_cont_out_proxy")
mcoproxy.port_name = "voltage_out"
mcoproxy.record_from = ["V_m"]

n = n1 + n2

mcoproxy.targets = n

nest.Simulate(20)

JanVogelsang avatar Dec 10 '25 12:12 JanVogelsang

Thanks, I was missing something, and now I can reproduce the same error. I will take a look.

med-ayssar avatar Dec 10 '25 16:12 med-ayssar

It appears that when MUSIC nodes are connected to other node types, those target nodes must be created after the MUSIC nodes.

The underlying reason is related to the targets property of MUSIC nodes and the fact that we are not using the nest.Connect interface explicitly. Because the connections are only formed during simulation preparation, node-creation order matters.

During nest.Simulate, NEST prepares each created node by calling its pre_run_hook.

Here is what happens:

  1. MUSIC nodes: In their pre_run_hook, MUSIC nodes perform several actions:

     - They connect themselves to all nodes listed in their targets property.

     - They send a test event to those targets.

     - Receiving this event causes each target node to append a DataLogger to itself.

     Later, during the target's own pre_run_hook, all of its DataLogger instances are initialized.

  2. When node creation order is wrong: If the target nodes are created before the MUSIC node, their pre_run_hook runs before any DataLogger has been added. The sequence becomes:

     - Target node's pre_run_hook runs → no DataLoggers yet, so nothing is initialized.

     - MUSIC node's pre_run_hook runs → a DataLogger is created on the target without proper initialization.

This results in an uninitialized DataLogger. When NEST later tries to record data with it during update, an assertion fails or, as in the stack trace above, the process crashes with a segmentation fault.
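The ordering problem described above can be reproduced with a small toy model. This is only a sketch in plain Python, not NEST code: the classes ToyLogger, ToyNeuron and ToyProxy and the functions simulate and run are hypothetical stand-ins, and the only assumption taken from the analysis is that pre-run hooks are called in node-creation order.

```python
class ToyLogger:
    """Hypothetical stand-in for a data logger; usable only after init()."""

    def __init__(self):
        self.buffer = None  # no storage until init() has been called

    def init(self):
        self.buffer = []

    def record(self, value):
        if self.buffer is None:
            raise RuntimeError("logger used before initialization")
        self.buffer.append(value)


class ToyNeuron:
    def __init__(self):
        self.loggers = []

    def handle_test_event(self):
        # A recorder connecting to this neuron appends a new, uninitialized logger.
        self.loggers.append(ToyLogger())

    def pre_run_hook(self):
        # Initializes only the loggers that exist at this moment.
        for logger in self.loggers:
            logger.init()

    def update(self):
        for logger in self.loggers:
            logger.record(0.0)


class ToyProxy:
    def __init__(self):
        self.targets = []

    def pre_run_hook(self):
        # The connection is formed here, not at creation time.
        for target in self.targets:
            target.handle_test_event()

    def update(self):
        pass


def simulate(nodes):
    # Assumption: hooks and updates run in node-creation order.
    for node in nodes:
        node.pre_run_hook()
    for node in nodes:
        node.update()


def run(proxy_first):
    neuron, proxy = ToyNeuron(), ToyProxy()
    proxy.targets = [neuron]
    nodes = [proxy, neuron] if proxy_first else [neuron, proxy]
    try:
        simulate(nodes)
        return "ok"
    except RuntimeError as err:
        return f"failed: {err}"
```

Calling run(True) models the working order (proxy created first), and run(False) models the failing order, where the toy logger is used before it has been initialized.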

med-ayssar avatar Dec 11 '25 06:12 med-ayssar

I see, makes sense. Thanks for figuring this out! So the solution would be to add an additional MUSIC-specific pre-run-hook which forms the connections. And only after that the regular pre-run-hook should be called. @heplesser Do you agree or do you have a different solution in mind?
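A rough sketch of such a two-phase preparation, again as a plain Python toy model rather than NEST code. The name connect_hook and the prepare function are hypothetical, and the dict-based logger is just a placeholder for a real data logger.

```python
class Neuron:
    def __init__(self):
        self.loggers = []  # each logger is a dict with an "initialized" flag

    def pre_run_hook(self):
        for logger in self.loggers:
            logger["initialized"] = True


class Proxy:
    def __init__(self, targets):
        self.targets = targets

    def connect_hook(self):
        # Hypothetical extra phase: form connections before any pre_run_hook.
        for target in self.targets:
            target.loggers.append({"initialized": False})

    def pre_run_hook(self):
        pass


def prepare(nodes):
    # Phase 1: nodes that define a connect hook form their connections.
    for node in nodes:
        hook = getattr(node, "connect_hook", None)
        if hook is not None:
            hook()
    # Phase 2: the regular pre-run hooks, in creation order.
    for node in nodes:
        node.pre_run_hook()


neuron = Neuron()
proxy = Proxy([neuron])
prepare([neuron, proxy])  # neuron created first, i.e. the failing order
all_initialized = all(logger["initialized"] for logger in neuron.loggers)
```

In this model the logger is initialized regardless of creation order, because every connection exists before the first regular hook runs.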

JanVogelsang avatar Dec 11 '25 10:12 JanVogelsang

@med-ayssar Thanks for the analysis! I think for now, we should add clear documentation about the necessary order and then follow this up further. I had not really thought about the consequences of connecting from inside pre_run_hook(). I think we should take a very good look at why we don't connect MUSIC nodes using the normal Connect() method. Any special rules can also create problems with determining min/max delay, which many neurons need to be correct so that their init_buffers_() can dimension the ring buffers.

heplesser avatar Dec 11 '25 11:12 heplesser

But what about initializing the data logger directly upon creation?

med-ayssar avatar Dec 11 '25 12:12 med-ayssar

I looked at it a little more. It seems to me that music_cont_out_proxy logically is very similar to multimeter. So we can change it to be used with normal Connect(mcop, neuron) by letting music_cont_out_proxy::send_test_event() register the targets the mcop is connected to. Then the pre_run_hook() only needs to set up the music_index_map based on the registered targets. Since we have never heard about this problem before, I think there are not many users who would be irritated by this change. But I will change the "No breaking change" label.
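The idea can be illustrated with a toy model in plain Python. This is a sketch under assumptions, not the planned NEST implementation: the classes, the connect helper, and the attribute names registered_targets and index_map are hypothetical, with index_map loosely standing in for the music_index_map mentioned above.

```python
class Neuron:
    def __init__(self, node_id):
        self.node_id = node_id
        self.loggers = []

    def pre_run_hook(self):
        for logger in self.loggers:
            logger["initialized"] = True


class ContOutProxy:
    def __init__(self):
        self.registered_targets = []
        self.index_map = {}

    def send_test_event(self, target):
        # Called at connect time: the target gets its logger immediately,
        # and the proxy remembers which targets it is connected to.
        target.loggers.append({"initialized": False})
        self.registered_targets.append(target)

    def pre_run_hook(self):
        # Only bookkeeping is left for the pre-run phase.
        self.index_map = {
            t.node_id: i for i, t in enumerate(self.registered_targets)
        }


def connect(proxy, targets):
    # Toy counterpart of an explicit Connect() call.
    for target in targets:
        proxy.send_test_event(target)


n1, n2 = Neuron(1), Neuron(2)      # neurons created first
proxy = ContOutProxy()             # proxy created last
connect(proxy, [n1, n2])
for node in (n1, n2, proxy):       # hooks in creation order
    node.pre_run_hook()
```

Because the loggers already exist when the neurons' hooks run, creation order no longer matters in this model.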

heplesser avatar Dec 11 '25 13:12 heplesser

I will experiment a bit with a solution along the lines sketched above.

heplesser avatar Dec 11 '25 19:12 heplesser