
mountainsort runs far slower on spikeinterface

Open rtraghavan opened this issue 3 years ago • 5 comments

While sorting a relatively small recording, I noticed an enormous difference in processing time between mountainsort installed independently (this version) and mountainsort4 installed via pip and run through spikeinterface. Some background about the recording:

32-channel laminar array recording. Signals sampled at 30 kHz. Analysis of 1 hour of recording.

Running mountainsort by itself on the data (bandpass filtering, whitening, and the main sort algorithm) takes 418 seconds. When I use mountainsort through spikeinterface, the same steps take 3078 seconds (~7.4x the time).

Both sorters read from the same .mda file, and the settings are all the same (threshold, detect interval, adjacency radius, number of workers).

@magland do you have any idea why this would be?

rtraghavan avatar Feb 24 '22 15:02 rtraghavan

Strange. The wrapper in spikeinterface is really a thin layer on top of mountainsort. See https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/sorters/mountainsort4/mountainsort4.py#L90

Is it due to the filtering and whitening done in spikeinterface here? https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/sorters/mountainsort4/mountainsort4.py#L100

@rtraghavan: could you try putting some timing code such as

t0 = time.perf_counter()
...
t1 = time.perf_counter()
print(t1 - t0)

in the _run_from_folder method of this wrapper to check where the time is lost?
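A minimal sketch of that kind of instrumentation, in case it helps. The `timed` helper, the `timings` dict and the `fake_stage` function are hypothetical names used for illustration only; they are not part of spikeinterface or mountainsort4. In practice the `with` blocks would wrap the real filtering, whitening and sorting calls inside the wrapper.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: collects elapsed wall-clock time per labelled stage.
timings = {}


@contextmanager
def timed(label, store=timings):
    """Record how long the enclosed block takes under `label`."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        store[label] = time.perf_counter() - t0


def fake_stage(duration_s):
    """Stand-in for a real processing step (assumption: replace with actual calls)."""
    time.sleep(duration_s)


with timed("filter"):
    fake_stage(0.01)
with timed("whiten"):
    fake_stage(0.01)
with timed("sort"):
    fake_stage(0.02)

# Per-stage report plus the total, so the sum can be compared across tools.
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
print(f"total: {sum(timings.values()):.3f} s")
```

Recording each stage separately makes it possible to see whether the extra time goes into preprocessing or into the sort itself.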

samuelgarcia avatar Feb 24 '22 15:02 samuelgarcia

@samuelgarcia I think @rtraghavan is referring to differences between mountainsort3 and mountainsort4. Maybe @magland can comment on it :)

alejoe91 avatar Feb 24 '22 15:02 alejoe91

It's true that these versions have different implementations. I am surprised that mountainsort4.py is significantly slower. During the iterations, does it say how many channels are in each neighborhood? @rtraghavan

magland avatar Feb 24 '22 17:02 magland

Apologies for the delay. On later inspection it seems neither sorter was using the maximum available number of workers. Moreover, I realize from the earlier comments that the total run time in spikeinterface should be compared to the sum of filtering, whitening, and sorting in mountainsort-js. Setting the number of workers equal for both and rerunning the code, I get the following.

mountainsort-js: 363 sec
spikeinterface mountainsort4: 1051 sec

Still a ~3-fold difference between the two. Is this closer to what you'd expect, @magland?
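For reference, the slowdown factors can be computed directly from the run times reported in this thread. This is only a sketch of the arithmetic; the `slowdown` function and the `runs` labels are illustrative names, and the numbers are the ones quoted above.

```python
# Run times in seconds as reported in this thread: (standalone, via spikeinterface).
runs = {
    "initial comparison": (418, 3078),
    "rerun with matched workers": (363, 1051),
}


def slowdown(standalone_s, wrapped_s):
    """Ratio of wrapped run time to standalone run time."""
    return wrapped_s / standalone_s


for label, (standalone_s, wrapped_s) in runs.items():
    print(f"{label}: {slowdown(standalone_s, wrapped_s):.1f}x")
```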

@magland During iterations, both mountainsort-js and mountainsort4 report 3 channels within each neighborhood.

I should be clear @alejoe91, this is not mountainsort3 vs mountainsort4. The algorithm for mountainsort-js is ml_ms4alg.

rtraghavan avatar Mar 01 '22 23:03 rtraghavan

Strange... if they are both using ml_ms4alg (mountainsort4 repo alg is no different from ml_ms4alg alg), then I would not expect a difference in timing.

magland avatar Mar 02 '22 14:03 magland

Maybe this discussion should be moved to the mountainsort4 repo.

alejoe91 avatar Oct 13 '22 11:10 alejoe91