CoreNeuron
FOR_NETCONS not being handled properly.
The mod2c translation follows nocmodl and needs to be updated. CoreNEURON is not setting up the information needed for even an updated mod2c translation to properly deal with the use of the FOR_NETCONS keyword in the NET_RECEIVE block.
There are several ways to implement FOR_NETCONS processing, e.g., pass the already set-up information from NEURON, or recompute it from scratch in CoreNEURON. I lean toward the latter: it avoids an increase in file transfer size, and I believe better performance is also possible by modifying the NetCon weight vector ordering so that the weights are memory adjacent during the loop over NetCon weights that target the same POINT_PROCESS or ARTIFICIAL_CELL instance.
Presently, NetCon ordering is memory adjacent on a per-list basis for the NetCon sources, i.e., memory adjacent when a PreSyn or InputPreSyn places its list of NetCon events on the queue. Presently, weight vector ordering is the same as the NetCon ordering:
// weights in netcons order in groups defined by `Point_process` target type.
The mention of "groups defined by ... type" refers to the fact that a weight may actually be a (short) weight array, whose size is the number of arguments in the definition of the NET_RECEIVE block of that mod file.
I think the netcon_in_presyn_order_ ordering is best and should not be tampered with. However, NetCon delivery to the target can be asynchronous (different NetCon delays and different target types), and even in the case of constant delay and the same target type from a given PreSyn, I doubt the weight ordering impacts performance. Of course, performance can be compared pre and post FOR_NETCONS weight ordering.
FOR_NETCONS weight ordering has the memory usage advantage that, instead of a list of NetCon weight indices for each POINT_PROCESS instance that uses the FOR_NETCONS statement, only a first index for each instance will be necessary. (The count can be derived from the first index of the next instance.) By the way, for a POINT_PROCESS it is probably most often the case that each instance has only a single NetCon targeting it. But for an ARTIFICIAL_CELL the number can be very large, e.g., 10k.
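To make the "count derived from the first index of the next instance" idea concrete, here is a minimal sketch. The names (`first_index`, `netcon_count`, `total_netcons`) are illustrative, not CoreNEURON's actual API; it only assumes the weights of NetCons targeting the same instance are stored contiguously, in instance order.

```cpp
#include <vector>

// Hypothetical sketch: first_index[i] is the index of the first NetCon
// (weight subarray) for target instance i. The count for instance i is the
// gap to the next instance's first index; the last instance runs to the end.
int netcon_count(const std::vector<int>& first_index, int i, int total_netcons) {
    int next = (i + 1 < static_cast<int>(first_index.size()))
                   ? first_index[i + 1]
                   : total_netcons;
    return next - first_index[i];
}
```

This replaces a per-instance list of weight indices with a single integer per instance, at the cost of fixing the contiguous ordering.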
Since FOR_NETCONS weight ordering is a significant change and can be done even in the absence of any further implementation of the FOR_NETCONS statement, it seems to make sense to first investigate any potential performance impacts on a variety of models.
> I lean toward the latter as it avoids an increase in file transfer size and I believe it is also possible to have better performance by modifying the NetCon weight vector ordering so that the weights are memory adjacent during the loop over NetCon weights that target the same POINT_PROCESS or ARTIFICIAL_CELL instance.
looks reasonable to me as well.
> Since FOR_NETCONS weight ordering is a significant change and can be done even in the absence of any further implementation of the FOR_NETCONS statement, it seems to make sense to first investigate any potential performance impacts on a variety of models.
As I don't have the whole picture, maybe we should discuss this over a call and see how the generated code will look.
Vectors of NetCon information are sent from nrnbbcore_write.cpp on a per-thread basis, and the order is the order seen for that thread from the HOC interpreter NetCon Object list, which is the order of NetCon creation. The size of each information vector is n_netcon, and the information vectors are target type, target index, srcgid, and delay. The exception is the weight vector, which conceptually has the same order; but as different target types have different numbers of NET_RECEIVE arguments, each weight is a subarray whose size corresponds to the number of arguments, and the entire vector of weights has a total size of nweight.
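The weight layout described above can be sketched as a prefix sum. Assuming (hypothetically) we are given the NET_RECEIVE argument count for each NetCon's target type, each NetCon's offset into the flat weight vector is the running total of the counts before it:

```cpp
#include <vector>

// Illustrative sketch, not CoreNEURON's actual code: n_args[i] is the number
// of NET_RECEIVE arguments of NetCon i's target type. The returned vector
// gives the start of each NetCon's weight subarray in the flat weight vector.
std::vector<int> weight_offsets(const std::vector<int>& n_args) {
    std::vector<int> offsets(n_args.size());
    int total = 0;
    for (std::size_t i = 0; i < n_args.size(); ++i) {
        offsets[i] = total;   // start of NetCon i's weight subarray
        total += n_args[i];   // advance by that target's NET_RECEIVE arity
    }
    return offsets;           // 'total' ends up equal to nweight
}
```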
At the nrn_setup.cpp receiver side, nt.netcons = new NetCon[nt.n_netcon], but as these are empty, at this moment the ordering is undefined. It turns out that my previous comment is in error with regard to netcon_in_presyn_order_. The latter is a vector of NetCon*, and the actual NetCon information (delay, weight_index, etc.) is in the original nt.netcons order as received from NEURON. So a PreSyn or InputPreSyn iterates
```cpp
for (int i = nc_cnt_ - 1; i >= 0; --i) {
    NetCon* d = netcon_in_presyn_order_[nc_index_ + i];
    // ... deliver the event via d ...
}
```
with the NetCon* d pointing into the nt.netcons vector of NetCon. Note that the weights do not live in a NetCon; rather, the NetCon has an integer u.weight_index which points into nt.weights. Note that it may be that netcon_in_presyn_order_ space could be saved (or at least not be made permanent, or at least be converted to integer) by permuting the meaning of nt.netcons.
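The indirection from NetCon to weights can be pictured with a minimal sketch. The struct and function names here are illustrative stand-ins, not CoreNEURON's real types; the point is only that a NetCon holds an integer offset, not the weights themselves:

```cpp
#include <vector>

// Illustrative stand-in for CoreNEURON's NetCon: it stores only an integer
// index into the thread's flat weight vector (nt.weights in the real code),
// where its (possibly multi-element) weight subarray begins.
struct NetConSketch {
    int weight_index;  // stand-in for the real NetCon's u.weight_index
};

// Return a pointer to the start of this NetCon's weight subarray.
double* netcon_weights(std::vector<double>& nt_weights, const NetConSketch& nc) {
    return nt_weights.data() + nc.weight_index;
}
```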
Third time is the charm for understanding. I was mistaken that an int* array into nt.netcons could replace NetCon* netcon_in_presyn_order_. The reason is that a PreSyn needs a list of NetCon* that are in various threads. (Each NetCon is in the same thread as its target.) The original reason for netcon_in_presyn_order_ was to have a single allocation, for saving space, rather than an allocation of a NetCon* array for every PreSyn and InputPreSyn. One could still perhaps do better in space saving by having a small list of (index, cnt, thread_id) in the PreSyn etc. But nowadays on a CPU it is not uncommon to have in the neighborhood of 50 threads.
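One way to picture the (index, cnt, thread_id) idea is a small per-thread record held by each PreSyn or InputPreSyn, one record per thread that owns targets of that source. The struct and field names below are purely illustrative, not CoreNEURON's actual data structures:

```cpp
#include <cstdint>

// Hypothetical compact record: instead of a NetCon* list per source, each
// PreSyn/InputPreSyn could hold a short list of these, one per thread that
// owns NetCons targeted by this source. With ~50 threads on a modern CPU,
// a source fanning out widely could need up to that many records.
struct NetConRange {
    int32_t index;      // first index into that thread's contiguous NetCon run
    int32_t cnt;        // number of consecutive NetCons for this source
    int32_t thread_id;  // which NrnThread owns those NetCons
};
```

The trade-off mentioned above is that the per-source overhead grows with the number of threads a source touches, whereas netcon_in_presyn_order_ is a single flat allocation.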