CoreNeuron
FOR_NETCONS not being handled properly.
The mod2c translation follows nocmodl and needs to be updated. CoreNEURON is not setting up the information needed for even an updated mod2c translation to properly deal with the use of the FOR_NETCONS keyword in the NET_RECEIVE block.
There are several ways to implement FOR_NETCONS processing, e.g., pass the already set-up information from NEURON, or recompute it from scratch in CoreNEURON. I lean toward the latter: it avoids an increase in file transfer size, and I believe better performance is also possible by modifying the NetCon weight vector ordering so that the weights are memory adjacent during the loop over NetCon weights that target the same POINT_PROCESS or ARTIFICIAL_CELL instance.
Presently, NetCon ordering is memory adjacent on a per-list basis for the NetCon sources, i.e., memory adjacent when a PreSyn or InputPreSyn places its list of NetCon events on the queue. Presently, weight vector ordering is the same as the NetCon ordering:
// weights in netcons order in groups defined by `Point_process` target type.
The mention of "groups defined by ... type" refers to the fact that a weight may actually be a (short) weight array, whose size is the number of arguments in the definition of the NET_RECEIVE block of that mod file.
I think the netcon_in_presyn_order_ ordering is best and should not be tampered with. However, NetCon delivery to the target can be asynchronous (different NetCon delays and different target types), and even in the case of constant delay and the same target type from a given PreSyn, I doubt the weight ordering impacts performance. Of course, performance can be compared pre and post FOR_NETCONS weight ordering.
FOR_NETCONS weight ordering has the memory usage advantage that, instead of a list of NetCon weight indices for each POINT_PROCESS instance that uses the FOR_NETCONS statement, only a first index for each instance will be necessary. (The count can be derived from the first index of the next instance.) By the way, for a POINT_PROCESS it is probably most often the case that each instance has only a single NetCon targeting it. But for an ARTIFICIAL_CELL the number can be very large, e.g., 10k.
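To make the "count derived from the first index of the next instance" idea concrete, here is a minimal sketch. The names (`first_index`, `netcon_count`, `total_netcons`) are illustrative, not CoreNEURON's actual API; it only assumes the weights of NetCons targeting the same instance are stored contiguously, in instance order.

```cpp
#include <vector>

// Hypothetical sketch: first_index[i] is the index of the first NetCon
// (weight subarray) for target instance i. The count for instance i is the
// gap to the next instance's first index; the last instance runs to the end.
int netcon_count(const std::vector<int>& first_index, int i, int total_netcons) {
    int next = (i + 1 < static_cast<int>(first_index.size()))
                   ? first_index[i + 1]
                   : total_netcons;
    return next - first_index[i];
}
```

This replaces a per-instance list of weight indices with a single integer per instance, at the cost of fixing the contiguous ordering.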
Since FOR_NETCONS weight ordering is a significant change and can be done even in the absence of any further implementation of the FOR_NETCONS statement, it seems to make sense to first investigate any potential performance impacts on a variety of models.
> I lean toward the latter as it avoids an increase in file transfer size and I believe it is also possible to have better performance by modifying the NetCon weight vector ordering so that the weights are memory adjacent during the loop over NetCon weights that target the same POINT_PROCESS or ARTIFICIAL_CELL instance.
looks reasonable to me as well.
> Since FOR_NETCONS weight ordering is a significant change and can be done even in the absence of any further implementation of the FOR_NETCONS statement, it seems to make sense to first investigate any potential performance impacts on a variety of models.
As I don't have the whole picture, maybe we should discuss this over a call and see how the generated code will look.
Vectors of NetCon information are sent from nrnbbcore_write.cpp on a per-thread basis, and the order is the order seen for that thread from the HOC interpreter NetCon Object list, which is the order of NetCon creation. The size of each information vector is n_netcon, and the information vectors are target type, target index, srcgid, and delay. The exception is the weight vector, which conceptually has the same order; but as different target types have different numbers of NET_RECEIVE arguments, each weight is a subarray whose size corresponds to the number of arguments, and the entire vector of weights has a total size of nweight.
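The weight layout described above can be sketched as a prefix sum. Assuming (hypothetically) we are given the NET_RECEIVE argument count for each NetCon's target type, each NetCon's offset into the flat weight vector is the running total of the counts before it:

```cpp
#include <vector>

// Illustrative sketch, not CoreNEURON's actual code: n_args[i] is the number
// of NET_RECEIVE arguments of NetCon i's target type. The returned vector
// gives the start of each NetCon's weight subarray in the flat weight vector.
std::vector<int> weight_offsets(const std::vector<int>& n_args) {
    std::vector<int> offsets(n_args.size());
    int total = 0;
    for (std::size_t i = 0; i < n_args.size(); ++i) {
        offsets[i] = total;   // start of NetCon i's weight subarray
        total += n_args[i];   // advance by that target's NET_RECEIVE arity
    }
    return offsets;           // 'total' ends up equal to nweight
}
```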
At the nrn_setup.cpp receiver side, nt.netcons = new NetCon[nt.n_netcon], but as these are empty, at this moment the ordering is undefined. It turns out that my previous comment is in error with regard to netcon_in_presyn_order_. The latter is a vector of NetCon*, and the actual NetCon information (delay, weight_index, etc.) is in the original nt.netcons order as received from NEURON. So a PreSyn or InputPreSyn iterates
```cpp
for (int i = nc_cnt_ - 1; i >= 0; --i) {
    NetCon* d = netcon_in_presyn_order_[nc_index_ + i];
    // ... deliver the event via d ...
}
```
with the NetCon* d pointing into the nt.netcons vector of NetCon. Note that the weights do not live in a NetCon; rather, the NetCon has an integer u.weight_index which points into nt.weights. Note that it may be that netcon_in_presyn_order_ space could be saved (or at least not be made permanent, or at least be converted to integer) by permuting the meaning of nt.netcons.
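The indirection from NetCon to weights can be pictured with a minimal sketch. The struct and function names here are illustrative stand-ins, not CoreNEURON's real types; the point is only that a NetCon holds an integer offset, not the weights themselves:

```cpp
#include <vector>

// Illustrative stand-in for CoreNEURON's NetCon: it stores only an integer
// index into the thread's flat weight vector (nt.weights in the real code),
// where its (possibly multi-element) weight subarray begins.
struct NetConSketch {
    int weight_index;  // stand-in for the real NetCon's u.weight_index
};

// Return a pointer to the start of this NetCon's weight subarray.
double* netcon_weights(std::vector<double>& nt_weights, const NetConSketch& nc) {
    return nt_weights.data() + nc.weight_index;
}
```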
Third time is the charm for understanding. I was mistaken that an int* array into nt.netcons could replace NetCon* netcon_in_presyn_order_. The reason is that a PreSyn needs a list of NetCon* that are in various threads. (Each NetCon is in the same thread as its target.) The original reason for netcon_in_presyn_order_ was to have a single allocation, for saving space, rather than an allocation of a NetCon* array for every PreSyn and InputPreSyn. One could still perhaps do better in space saving by having a small list of (index, cnt, thread_id) in the PreSyn etc. But nowadays on a CPU it is not uncommon to have in the neighborhood of 50 threads.
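One way to picture the (index, cnt, thread_id) idea is a small per-thread record held by each PreSyn or InputPreSyn, one record per thread that owns targets of that source. The struct and field names below are purely illustrative, not CoreNEURON's actual data structures:

```cpp
#include <cstdint>

// Hypothetical compact record: instead of a NetCon* list per source, each
// PreSyn/InputPreSyn could hold a short list of these, one per thread that
// owns NetCons targeted by this source. With ~50 threads on a modern CPU,
// a source fanning out widely could need up to that many records.
struct NetConRange {
    int32_t index;      // first index into that thread's contiguous NetCon run
    int32_t cnt;        // number of consecutive NetCons for this source
    int32_t thread_id;  // which NrnThread owns those NetCons
};
```

The trade-off mentioned above is that the per-source overhead grows with the number of threads a source touches, whereas netcon_in_presyn_order_ is a single flat allocation.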