NestIO for large datasets
Hi!
I was recently working together with @jasperalbers, who is using the NestIO to load simulated data from large-scale multi-area models.
His problem was that he has hundreds of thousands of neurons (around 1 GB of spike data in total) which, when saved as neo.SpikeTrain
objects, took hours to load from disk (with HDF5, pickle and NIX alike). The bulk of that time was spent building the neo objects themselves. We found a rather unorthodox workaround to this problem: saving the spikes directly as lists of lists, which brought the load time down to a few seconds.
We wrote a couple of extra functions for the NestIO
to load the spike times as lists of lists, alongside the neuron IDs. This is obviously not ideal from a metadata perspective, but we thought it might still be a useful function to have, especially for agile analysis of large simulated data.
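Roughly, the idea is something like the following (a minimal sketch, not the actual functions we wrote; it assumes a plain two-column .gdf layout of sender ID and spike time in ms):

```python
import numpy as np

def load_spikes_as_lists(filename):
    # Read the .gdf file: whitespace-separated columns (sender id, time in ms)
    data = np.loadtxt(filename, ndmin=2)
    ids = data[:, 0].astype(int)
    times = data[:, 1]
    # Sort by sender id so each neuron's spikes are contiguous
    order = np.argsort(ids, kind="stable")
    ids, times = ids[order], times[order]
    # Split the flat time array at the boundaries between neurons
    unique_ids, first_idx = np.unique(ids, return_index=True)
    spike_lists = [t.tolist() for t in np.split(times, first_idx[1:])]
    return unique_ids.tolist(), spike_lists
```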
Let us know if these functions are any good; if you think they are worth including, I can also write some tests.
Best, Aitor
Hi @morales-gregorio, thanks for sharing your code. I think this problem might improve quite a lot once https://github.com/NeuralEnsemble/python-neo/pull/1000 is merged, as it allows generating lists of spiketrains from a gdf-style data organization (one array of timestamps and one array of unit IDs) and only performing the conversion to spiketrains when required. We should revisit your code once https://github.com/NeuralEnsemble/python-neo/pull/1000 is merged.
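To illustrate, here is a rough sketch of the gdf-style usage that the PR enables, based on the SpikeTrainList.from_spike_time_array constructor proposed there (argument names follow my reading of the PR and may differ in the merged version):

```python
import numpy as np
import quantities as pq
from neo.core.spiketrainlist import SpikeTrainList

# gdf-style organization: one flat array of spike times, one of unit ids
spike_time_array = np.array([0.5, 0.6, 0.7, 1.1, 1.2])
channel_id_array = np.array([2, 0, 2, 0, 2])

# No SpikeTrain objects are built here; the conversion to individual
# spiketrains is deferred until they are actually accessed.
stl = SpikeTrainList.from_spike_time_array(
    spike_time_array, channel_id_array,
    all_channel_ids=[0, 1, 2],
    units="ms",
    t_start=0 * pq.ms,
    t_stop=10 * pq.ms,
)
st = list(stl)[0]  # only now are individual neo.SpikeTrain objects materialised
```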
Indeed! #1000 looks like the solution to this problem! Looking forward to it; happy to contribute to integrating this with the neo.SpikeTrainList once it is ready.
Hi! I see that #1000 has already been merged. Any updates on implementing it within the NestIO?