Segfault when using MUSIC with more than 3 threads

Open JanVogelsang opened this issue 9 months ago • 5 comments

The following code snippet produces a segmentation fault because it tries to access a non-existent Connector in line 218 of target_table_devices.h (target_from_devices_[ tid ][ ldid ][ syn_id ]->get_synapse_status( tid, lcid, dict );):

import nest

nest.local_num_threads = 3

nrns = nest.Create("parrot_neuron", 2)

music_in = nest.Create("music_event_in_proxy", 1, {"port_name": "in_spikes"})
music_out = nest.Create("music_event_out_proxy", 1, {"port_name": "out_spikes"})

nest.Connect(music_in, nrns)
nest.Connect(nrns, music_out)

nest.GetConnections().get()

JanVogelsang avatar Mar 26 '25 11:03 JanVogelsang

@heplesser (CC @JanVogelsang), I tried to debug this and ran into some interesting issues. I think I have forgotten part of the logic behind devices and proxies in the context of MUSIC. I know that neurons on thread i, as non-device models, are represented as proxies on the other threads j != i, and that devices are replicated over all threads. However, MUSIC devices (according to the add_music func) are only added to thread t=0, with no proxies!

While debugging I found those issues:

  1. If we create the device-like MUSIC models (music_event_in_proxy) first, we get a different segfault.
  2. If we set local_num_threads = 2, the result of GetConnections is wrong compared to local_num_threads = 1 (one edge is missing and one is duplicated, independent of item 1).
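
One way to pin down the missing and duplicated edges from item 2 is to compare the two connection lists as multisets. The sketch below is plain Python and does not call NEST. `diff_edges` is a hypothetical helper, and the edge tuples are made-up placeholders, not output from the reproducer.

```python
from collections import Counter


def diff_edges(reference, candidate):
    """Compare two edge lists as multisets of (source, target) pairs.

    Returns (missing, extra): edges that occur fewer times, respectively
    more times, in `candidate` than in `reference`.
    """
    ref = Counter(reference)
    cand = Counter(candidate)
    # Counter subtraction keeps only positive counts.
    missing = sorted((ref - cand).elements())
    extra = sorted((cand - ref).elements())
    return missing, extra


# Made-up node ids for illustration only, not output from the reproducer.
single_threaded = [(3, 1), (3, 2), (1, 4), (2, 4)]
two_threads = [(3, 1), (3, 1), (1, 4), (2, 4)]

missing, extra = diff_edges(single_threaded, two_threads)
print("missing:", missing)
print("extra:", extra)
```

In practice the pairs would be built from the source and target entries of each connection returned for local_num_threads = 1 and local_num_threads = 2.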

Question: Is a MUSIC node considered a device, and should it be included in the logic of target_to_devices_ and target_from_devices_?

void
nest::ConnectionManager::connect( const size_t snode_id,
  Node* target,
  size_t target_thread,
  const synindex syn_id,
  const DictionaryDatum& params,
  const double delay,
  const double weight )
{
  kernel().model_manager.assert_valid_syn_id( syn_id, kernel().vp_manager.get_thread_id() );

  Node* source = kernel().node_manager.get_node_or_proxy( snode_id, target_thread );

  ConnectionType connection_type = connection_required( source, target, target_thread );

  switch ( connection_type )
  {
  case CONNECT:
    connect_( *source, *target, snode_id, target_thread, syn_id, params, delay, weight );
    break;
  case CONNECT_FROM_DEVICE:
    connect_from_device_( *source, *target, target_thread, syn_id, params, delay, weight );
    break;
  case CONNECT_TO_DEVICE:
    connect_to_device_( *source, *target, snode_id, target_thread, syn_id, params, delay, weight );
    break;
  case NO_CONNECTION:
    return;
  }
}

// Excerpt from connection_required():
  if ( target->has_proxies() )
  {
    if ( source->has_proxies() )
    {
      return CONNECT;
    }
    else
    {
      return CONNECT_FROM_DEVICE;
    }
  }

Assume we are on thread t_i != 0 and the source node is the music_event_in_proxy. Then NodeManager::get_node_or_proxy( size_t node_id, size_t t ) returns a proxy, so the source has proxies and we return CONNECT instead of CONNECT_FROM_DEVICE. But MUSIC is neither a device nor a neuron, and the kernel seems to treat it inconsistently.
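
To make that argument concrete, here is a toy model in plain Python. It is not NEST code: `ToyNode`, `NODES_BY_THREAD`, and the simplified `get_node_or_proxy` and `connection_required` are hypothetical stand-ins. They encode only the assumptions stated above, namely that the MUSIC node is stored on thread 0 and that a lookup on any other thread yields a proxy.

```python
from enum import Enum


class ConnectionType(Enum):
    CONNECT = "CONNECT"
    CONNECT_FROM_DEVICE = "CONNECT_FROM_DEVICE"


class ToyNode:
    """Stand-in for a node; only carries the has_proxies flag."""

    def __init__(self, name, has_proxies):
        self.name = name
        self._has_proxies = has_proxies

    def has_proxies(self):
        return self._has_proxies


# Assumption from the comment: the MUSIC node is stored on thread 0 only.
MUSIC_IN = ToyNode("music_event_in_proxy", has_proxies=False)
PROXY = ToyNode("proxy", has_proxies=True)
NODES_BY_THREAD = {0: {"music_in": MUSIC_IN}, 1: {}, 2: {}}


def get_node_or_proxy(node_id, thread):
    """Return the node if it is stored on `thread`, otherwise a proxy."""
    return NODES_BY_THREAD[thread].get(node_id, PROXY)


def connection_required(source, target):
    """Models only the quoted branch, where the target has proxies."""
    assert target.has_proxies()
    if source.has_proxies():
        return ConnectionType.CONNECT
    return ConnectionType.CONNECT_FROM_DEVICE


neuron = ToyNode("parrot_neuron", has_proxies=True)
for thread in range(3):
    source = get_node_or_proxy("music_in", thread)
    print(thread, source.name, connection_required(source, neuron).name)
```

Under these assumptions the same source/target pair is classified as CONNECT_FROM_DEVICE on thread 0 and as CONNECT on threads 1 and 2, which is the inconsistency in question.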

Therefore, I would appreciate a short explanation of how MUSIC nodes are expected to behave with respect to devices and proxies, so that I can continue debugging.

Have a nice Monday

med-ayssar avatar Mar 30 '25 20:03 med-ayssar

Ping @JanVogelsang && @heplesser

med-ayssar avatar Apr 19 '25 15:04 med-ayssar

Issue automatically marked stale!

github-actions[bot] avatar Jun 19 '25 08:06 github-actions[bot]

@med-ayssar Sorry for the very late reply. MUSIC nodes do not have proxies (has_proxies() returns false). But while usual nodes without proxies have one replica per thread, MUSIC nodes exist only once per MPI process. They indicate that by defining one_node_per_process() to return true. The single replica on each MPI process must supply all threads on the process with the MUSIC data (music_..._in_...) or collect from all threads (music_..._out_...) on the local MPI process. We may have overlooked a case distinction in ConnectionManager::connect().
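
The three kinds of nodes described here can be told apart by the two flags alone. The following plain-Python sketch only illustrates that case distinction. `NodeKind` and `node_kind` are hypothetical names, and this is not a proposed patch for ConnectionManager::connect().

```python
from enum import Enum


class NodeKind(Enum):
    PROXIED = "one instance, proxies on the other threads"
    PER_THREAD = "one replica per thread"
    PER_PROCESS = "one replica per MPI process"


def node_kind(has_proxies, one_node_per_process):
    """Classify a node from the two flags mentioned in the reply."""
    if one_node_per_process:
        # MUSIC nodes: has_proxies() is false, but there is only one replica.
        return NodeKind.PER_PROCESS
    if has_proxies:
        return NodeKind.PROXIED
    return NodeKind.PER_THREAD


print(node_kind(has_proxies=True, one_node_per_process=False).name)   # neuron-like
print(node_kind(has_proxies=False, one_node_per_process=False).name)  # usual device
print(node_kind(has_proxies=False, one_node_per_process=True).name)   # MUSIC node
```

A check on has_proxies() alone cannot separate the last two cases, which is where a missing case distinction would show up.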

heplesser avatar Jul 10 '25 20:07 heplesser

Issue automatically marked stale!

github-actions[bot] avatar Nov 10 '25 08:11 github-actions[bot]