
Inefficiency of creating large numbers of SpikeTrain objects

Open antolikjan opened this issue 6 months ago • 2 comments

Dear Neo developers,

We came across a use case that revealed a major inefficiency in Neo (or rather in the quantities package Neo depends on, as I will explain).

In our simulations we are recording from hundreds of thousands of neurons, in some cases in response to thousands of stimuli, creating a very large number of SpikeTrain objects in the process. We found that loading this data can take tens of hours, despite it being only tens of gigabytes, which should still be manageable. We tracked this extremely slow performance down to a significant constant overhead associated with the creation of each SpikeTrain object. Unfortunately this overhead does not originate in Neo itself but in the quantities package.
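The per-object overhead can be seen without Neo at all, by timing plain NumPy array creation against `quantities.Quantity` construction. This is a rough sketch to illustrate the effect; the exact numbers will vary by machine and are not taken from the issue:

```python
import timeit

import numpy as np
import quantities as pq

data = np.arange(200.0)  # 200 spike times, like each train in the example below

# Creating a plain NumPy array from existing data is cheap...
t_plain = timeit.timeit(lambda: np.array(data), number=2000)

# ...while each Quantity pays an additional fixed setup cost
# (unit handling, ndarray-subclass construction) on top of the copy.
t_quantity = timeit.timeit(lambda: pq.Quantity(data, units='ms'), number=2000)

print(f"plain ndarray: {t_plain:.4f} s, Quantity: {t_quantity:.4f} s per 2000 objects")
```

Because the cost is per object rather than per spike, it dominates when you have hundreds of thousands of short trains.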

Initially I hoped this could be rectified by using the SpikeTrainList object, but ultimately that object uses individual SpikeTrains anyway.

My questions are: (a) do you have any ideas on how we could improve the efficiency in our use case, and (b) is there any plan to rework SpikeTrainList to use a more efficient representation, given that it holds a list of spike trains sharing a common temporal reference?

Below I attach quick example code demonstrating that unpickling a SpikeTrainList can be as much as 100× faster if one saves it with only the multiplexed representation, as opposed to the equivalent SpikeTrainList saved with the SpikeTrain representation. Note that the multiplexed representation does not solve anything for us, because the moment we use such a loaded SpikeTrainList for any operation, it triggers the conversion into the list-of-SpikeTrains representation, nullifying the time saved during loading.

example code

```python
import cProfile
import pstats
from pstats import SortKey
import pickle

import numpy
from quantities import ms
from neo.core.spiketrain import SpikeTrain
from neo.core.spiketrainlist import SpikeTrainList

# 100,000 identical spike trains, each with 200 spikes
s = [SpikeTrain(list(range(0, 200)) * ms, t_stop=10000) for i in range(100000)]
sl = SpikeTrainList(s)

# Save the standard (list-of-SpikeTrain) representation
with open("dump_standard.pickle", 'wb') as f:
    pickle.dump(sl, f)

# Save the equivalent SpikeTrainList built from the multiplexed representation
a, b = sl.multiplexed
with open("dump_modified.pickle", 'wb') as f:
    pickle.dump(
        SpikeTrainList.from_spike_time_array(
            b, a,
            all_channel_ids=list(range(0, len(sl))),
            units='ms',
            t_start=0 * ms,
            t_stop=10000.0 * ms,
        ),
        f,
    )

def aaa():
    with open("dump_standard.pickle", 'rb') as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

def bbb():
    with open("dump_modified.pickle", 'rb') as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

cProfile.run('bbb()', 'restats_after_modified')
p = pstats.Stats('restats_after_modified')
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)

cProfile.run('aaa()', 'restats_after_standard')
p = pstats.Stats('restats_after_standard')
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)
```
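One possible interim workaround, sketched below, is to keep the two multiplexed arrays as plain NumPy data and slice out per-neuron spike times on demand, constructing SpikeTrain objects only for the channels actually analysed. The helper `lazy_trains` is hypothetical, not part of the Neo API:

```python
import numpy as np

def lazy_trains(channel_ids, spike_times):
    """Yield (channel_id, times) pairs from multiplexed arrays
    without creating any SpikeTrain objects."""
    # Stable sort groups each channel's spikes together while
    # preserving their temporal order within a channel.
    order = np.argsort(channel_ids, kind='stable')
    ids_sorted = channel_ids[order]
    times_sorted = spike_times[order]
    # Locate the start of each channel's block of spikes.
    unique_ids, starts = np.unique(ids_sorted, return_index=True)
    bounds = np.append(starts, len(ids_sorted))
    for i, cid in enumerate(unique_ids):
        yield cid, times_sorted[bounds[i]:bounds[i + 1]]

# Example: three neurons' spikes stored in two flat (multiplexed) arrays.
ids = np.array([0, 1, 0, 2, 1])
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
trains = {cid: t for cid, t in lazy_trains(ids, times)}
print(trains[0])  # → [1. 3.]
```

Each yielded `times` array could then be wrapped in a SpikeTrain lazily, paying the quantities overhead only for the trains that are actually used.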

antolikjan · Jun 13 '25 08:06