export_report is very slow
Hi there — thank you all for maintaining this suite of great tools! I am trying to write a script that handles pre-processing / sorting / post-processing all at once.
Describe the issue
I've noticed that the export_report function is very slow: almost 30 seconds per unit. With a few hundred units, that's an extra hour or more, which in some cases is longer than the entire pre-processing + sorting!
My recording is 1 hour long, and the units seem to have relatively low firing rates (1-20 Hz), so I don't think that's the issue. This is on version 0.100.4, so these issues may already have been solved, but from poking around the code a bit, it doesn't look like it.
Reproducing
I can reproduce this simply by reloading the waveform extractor from a folder with we = si.load_waveforms(waveform_dir) and then running export_report(we, output_folder=qc_dir, **job_kwargs). Everything is fast except generating the per-unit plots at the end.
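To check that the cost really is per unit, and to estimate what a few hundred units would take, a small timing harness can wrap whatever per-unit plotting call is being tested. This is a sketch: `time_per_unit` and `extrapolate_total` are hypothetical helpers, not part of spikeinterface, and the extrapolation assumes every unit costs about the same.

```python
import time
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable


def time_per_unit(plot_unit: Callable[[Hashable], object],
                  unit_ids: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Call plot_unit(unit_id) once per unit and record each call's wall-clock time in seconds."""
    timings = {}
    for unit_id in unit_ids:
        start = time.perf_counter()
        plot_unit(unit_id)
        timings[unit_id] = time.perf_counter() - start
    return timings


def extrapolate_total(timings: Dict[Hashable, float], n_units: int) -> float:
    """Estimate total seconds for n_units, assuming each unit costs the mean measured time."""
    return mean(timings.values()) * n_units


# Hypothetical usage with the 0.100.x objects from this issue:
# import spikeinterface.widgets as sw
# timings = time_per_unit(lambda u: sw.plot_unit_summary(we, unit_id=u), we.unit_ids[:5])
# print(extrapolate_total(timings, len(we.unit_ids)) / 60, "minutes")
```

Timing a handful of units this way is usually enough to tell whether the total grows linearly or worse.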
Ideas
I dug around a bit, and I have two clues:
- The slowest part seems to be sw.plot_unit_waveforms, which takes about 20 seconds on its own. I can't figure out why it's slow, since loading the templates / waveforms (i.e. we.get_waveforms(60)) is very fast. I guess matplotlib is slow to plot hundreds of lines? Speaking for myself, I find the smear of all the raw waveforms totally uninformative, and I would rather have a mean +/- std, which I assume would also be much faster.
- For the amplitudes part of the plot: running something like sw.plot_amplitudes(we, unit_ids=[60]) directly takes roughly the same amount of time (~13 seconds) as just loading the data with this line (https://github.com/SpikeInterface/spikeinterface/blob/7d0e1da655beddb414c21a3a0b2d65b9ab115f1d/src/spikeinterface/widgets/amplitudes.py#L56). I see that the point of loading all the data is to allow the widget to plot arbitrarily many units at once, but when we plot only one unit per widget instance, many times over, this is a huge time sink. I admit I don't really see why loading the amplitude data takes so long: if I just run np.load("/path/to/amplitude_segment_0.npy"), it's very fast (< 1 sec).
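When each widget instance plots only one unit, one way to avoid repeating the expensive step is to load the per-segment amplitudes once, split them by unit once, and reuse the per-unit arrays for every figure. The sketch below assumes the amplitudes are a flat per-spike array aligned with a same-length array of unit labels; `split_amplitudes_by_unit` is a hypothetical helper, not how spikeinterface actually stores or loads amplitudes.

```python
import numpy as np


def split_amplitudes_by_unit(amplitudes: np.ndarray, spike_labels: np.ndarray) -> dict:
    """Group a flat per-spike amplitude array by unit label in a single pass.

    Assumes amplitudes[i] belongs to the spike whose unit is spike_labels[i].
    """
    if amplitudes.shape != spike_labels.shape:
        raise ValueError("amplitudes and spike_labels must have the same shape")
    # A stable sort keeps each unit's spikes in their original (time) order.
    order = np.argsort(spike_labels, kind="stable")
    sorted_labels = spike_labels[order]
    sorted_amps = amplitudes[order]
    # First index of each distinct label in the sorted labels.
    units, starts = np.unique(sorted_labels, return_index=True)
    chunks = np.split(sorted_amps, starts[1:])
    return {unit: chunk for unit, chunk in zip(units.tolist(), chunks)}
```

With the dictionary built once (for example from np.load("/path/to/amplitude_segment_0.npy") plus the matching labels), each per-unit figure becomes a dictionary lookup instead of a full reload.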
So in summary, I would suggest:
- adding an option in the export_report function to plot the mean +/- std instead of raw waveforms, or to not show waveforms at all;
- finding a way to make loading unit amplitudes as fast as loading the waveform data.
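A mean +/- std view needs only two arrays per unit and draws one line plus one shaded band per channel, instead of one line per spike. A minimal numpy sketch, assuming waveforms shaped (num_spikes, num_samples, num_channels) as returned by we.get_waveforms(unit_id); `waveform_mean_std` and `plot_mean_std` are hypothetical helpers, not an existing export_report option.

```python
import numpy as np


def waveform_mean_std(waveforms: np.ndarray) -> tuple:
    """Return the per-sample, per-channel mean and std across spikes.

    Assumes waveforms has shape (num_spikes, num_samples, num_channels).
    """
    if waveforms.ndim != 3:
        raise ValueError("expected shape (num_spikes, num_samples, num_channels)")
    return waveforms.mean(axis=0), waveforms.std(axis=0)


def plot_mean_std(ax, mean: np.ndarray, std: np.ndarray, channel: int, color: str = "k") -> None:
    """Draw the mean as one line and mean +/- std as one shaded band on a matplotlib Axes."""
    x = np.arange(mean.shape[0])
    m, s = mean[:, channel], std[:, channel]
    ax.plot(x, m, color=color)
    ax.fill_between(x, m - s, m + s, color=color, alpha=0.3)


# Hypothetical usage:
# import matplotlib.pyplot as plt
# mean, std = waveform_mean_std(we.get_waveforms(60))
# fig, ax = plt.subplots()
# plot_mean_std(ax, mean, std, channel=0)
```

Whether this is actually faster than drawing every raw waveform depends on how many spikes are plotted, but it replaces hundreds of line artists with two per channel.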
Hey @jonahpearl,
I just profiled sw.plot_unit_summary, the last step of export_report, on main, and it took:
%timeit sw.plot_unit_summary(analyzer, unit_id=0)
3.77 s ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Then I did the same with export_report
%timeit sexp.export_report(test_analyzer2, output_folder='./test', remove_if_exists=True)
2.47 s ± 9.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This had five units. When I tested it with only one unit:
%timeit sexp.export_report(test_analzyer, output_folder='./test', remove_if_exists=True)
683 ms ± 9.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So the time seems to scale roughly linearly with the number of units: at about 0.5 s/unit, 100 units would take about 50 s, i.e. roughly 1 minute. For reference, this was based on a 75-minute recording.
Would you be willing to update to main and see whether you still get the slowdown?
It is also important to note that for these tests I used the in-memory format for my analyzer rather than saving it to disk. This is a feature of the SortingAnalyzer that may provide some speed boost.
Hi, thanks for pointing this out. Yes, adding more options to drop the waveforms (or similar) to speed up plot_unit_summary is a good idea. Feel free to go in this direction with a PR if you have time.
Another good idea would be to use PoolWorker to make the unit figures in parallel, but this is more work...
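Treating the one-unit and five-unit export_report timings as two points on a line $t(n) = t_0 + n\,c$ (a rough model assuming a fixed overhead $t_0$ plus a constant cost $c$ per unit, fitted to only two measurements) gives an estimate consistent with the ~1 minute figure:

```math
\begin{aligned}
c &= \frac{t(5) - t(1)}{5 - 1} = \frac{2.47\,\text{s} - 0.683\,\text{s}}{4} \approx 0.447\ \text{s/unit} \\
t_0 &= t(1) - c \approx 0.683\,\text{s} - 0.447\,\text{s} \approx 0.236\,\text{s} \\
t(100) &= t_0 + 100\,c \approx 0.236\,\text{s} + 44.7\,\text{s} \approx 45\,\text{s}
\end{aligned}
```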
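A minimal sketch of the parallel idea using the standard library's concurrent.futures process pool; whether this matches what was meant by PoolWorker is an assumption. `render_units_in_parallel` and `render_unit` are hypothetical: `render_unit` stands for a per-unit function that draws one unit's figure, saves it, closes it, and returns the output path.

```python
import concurrent.futures as cf
from typing import Callable, Hashable, Iterable, List, Optional


def render_units_in_parallel(
    render_unit: Callable[[Hashable], str],
    unit_ids: Iterable[Hashable],
    max_workers: Optional[int] = None,
    executor_factory: Callable[..., cf.Executor] = cf.ProcessPoolExecutor,
) -> List[str]:
    """Run render_unit(unit_id) for every unit across a pool of workers.

    With the default ProcessPoolExecutor, render_unit must be a picklable,
    module-level function. Results are returned in the order of unit_ids.
    """
    with executor_factory(max_workers=max_workers) as pool:
        # pool.map yields results in input order even if units finish out of order.
        return list(pool.map(render_unit, unit_ids))
```

Processes rather than threads are the default here because matplotlib's pyplot interface is not thread-safe. With processes, lambdas cannot be pickled, and on platforms that start workers with spawn (Windows, and macOS by default) the call should sit under `if __name__ == "__main__":`.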
Yes, I also see this issue on version 0.100.8. Exporting analysis data takes progressively longer as the number of units increases: for example, exporting 19 units takes around 2 minutes, 63 units about 30 minutes, and 107 units more than a day.
To avoid spending a lot of time on this when dealing with a large number of units, I skip exporting the 'units' folder when the number of units is too large, by editing "user\AppData\Local\miniconda3\envs\si_env\Lib\site-packages\spikeinterface\exporters\report.py".
Could you please provide a way to select which parts of the unit figure are exported as pictures? I only need some of the panels.
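Dividing the reported totals by the unit counts shows the per-unit cost itself growing, which would not happen if every unit took a fixed time. This is back-of-the-envelope arithmetic on these three reports only (the 107-unit total is a lower bound) and does not identify the cause.

```python
# Totals reported above, in seconds; the 107-unit figure is a lower bound ("more than a day").
REPORTED = [(19, 2 * 60), (63, 30 * 60), (107, 24 * 3600)]


def seconds_per_unit(n_units: int, total_seconds: float) -> float:
    """Average export cost per unit for one run."""
    return total_seconds / n_units


rates = [seconds_per_unit(n, total) for n, total in REPORTED]
for (n, _), rate in zip(REPORTED, rates):
    print(f"{n:4d} units: {rate:7.1f} s/unit")
# Prints 6.3, 28.6 and 807.5 s/unit: the per-unit cost grows with the number of units.

# If every unit cost the same as in the 19-unit run, 107 units would take about 11.3 min.
print(f"{107 * rates[0] / 60:.1f} min at the 19-unit rate")
```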
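Until something like this exists in export_report, one way to get a similar effect without editing report.py in site-packages is to drive the per-unit plots from your own script, choosing which panels to draw and capping the number of units. The sketch below is generic: `render_selected`, the panel names, and the panel functions are hypothetical, and you would plug in whichever spikeinterface widgets you actually want for each panel.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence


def render_selected(
    unit_ids: Sequence,
    panel_funcs: Dict[str, Callable],
    panels: Optional[Iterable[str]] = None,
    max_units: Optional[int] = None,
) -> List:
    """Call panel_funcs[name](unit_id) for each selected panel and unit.

    panels=None renders every panel in panel_funcs; max_units caps how many
    units are rendered (max_units=0 skips per-unit figures entirely).
    """
    chosen = list(panel_funcs) if panels is None else list(panels)
    unknown = [p for p in chosen if p not in panel_funcs]
    if unknown:
        raise ValueError(f"unknown panels: {unknown}")
    units = list(unit_ids)
    if max_units is not None:
        units = units[:max_units]
    return [panel_funcs[p](u) for u in units for p in chosen]


# Hypothetical usage: only amplitude plots, for at most 50 units.
# panel_funcs = {"amplitudes": lambda u: sw.plot_amplitudes(we, unit_ids=[u])}
# render_selected(we.unit_ids, panel_funcs, panels=["amplitudes"], max_units=50)
```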