Instability with SC2 in dense recordings
Sorry to bring this up again; I'm pretty sure this is only an issue on recordings with a few thousand units.
I did not keep up with new commits for a while, and when I try to run sorting again on the latest version it systematically crashes.
The weird thing is that the point at which it crashes is not always reproducible. Starting the script from a local session, over SSH or inside a tmux session also seems to have an influence.
The most common crash points are:
- Creating the mask array in compute_similarity_with_templates_array (here)
- During the main loop of _compute_similarity_matrix_numba (here)
Looking at the resource usage, the pattern is always the same: there are several Python processes (as many as n_jobs), each using a similar amount of memory, plus another Python process using a lot more (20-30 GB). That amount then rapidly climbs to ~50-60 GB and the session crashes. I'm not entirely sure this is an OOM error, because that amount should still fit in memory. So either it rises so fast that the session crashes before the display has time to update, or something else is happening.
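For what it's worth, this is roughly how I watch the processes while the sorter runs (a small psutil loop, nothing SpikeInterface-specific):

```python
# Quick-and-dirty monitor: print the resident memory of every Python process
# every few seconds while the sorter is running in another terminal.
import time
import psutil

while True:
    for proc in psutil.process_iter(["pid", "name", "memory_info"]):
        name = proc.info["name"] or ""
        mem = proc.info["memory_info"]
        if mem is not None and "python" in name.lower():
            print(f"pid={proc.info['pid']:>7}  rss={mem.rss / 1e9:5.1f} GB")
    print("---")
    time.sleep(5)
```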
As a side note, _get_optimal_n_jobs works properly and n_jobs is scaled to the available memory appropriately. I've noticed that it is only used when templates_from_svd is set to False, but I'm not quite sure what the difference is.
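To illustrate what I mean by scaling n_jobs to memory, my rough mental model of such a cap is something like the following; this is purely illustrative and not the actual _get_optimal_n_jobs code:

```python
import psutil

def capped_n_jobs(requested_n_jobs: int, estimated_bytes_per_job: int) -> int:
    # Illustrative only: keep the number of workers low enough that the sum of
    # the per-worker memory estimates fits in the currently available RAM.
    available = psutil.virtual_memory().available
    affordable = max(1, int(available // estimated_bytes_per_job))
    return max(1, min(requested_n_jobs, affordable))
```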
Update: I also get a similar behavior with TdC2 at this step:
Creating the mask array in compute_similarity_with_templates_array (here)
So this doesn't seem to be sorter-specific. Oddly, it only seems to happen if I use more than one core and/or start the script from a tmux session. If I start it directly from the shell with n_jobs set to 1, the sorting runs to completion, albeit slowly.
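For reference, the serial invocation that does go through for me looks roughly like this (I'm assuming job_kwargs is the right SC2 parameter name here; adjust if it differs in your version):

```python
import spikeinterface.sorters as ss

# recording: the preprocessed recording object defined earlier in my script.
# Slow but stable on my setup: force a single worker for the whole run.
sorting = ss.run_sorter(
    "spykingcircus2",
    recording,
    folder="sc2_output",          # placeholder path
    job_kwargs={"n_jobs": 1},
)
```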
Thanks for your feedback, I'll take a quick look, since we would indeed like everything to scale up properly with a large number of units. Just so I can generate dummy data, remind me: you have 4096 channels and something like 4000 units, am I right?
That's about right: 3000 to 5000 units on average for in vitro recordings. I can share recordings if needed, but at around 8 GB/minute they're quite heavy. I have 1-, 5- and 10-minute recordings. I also have the probe geometry exported as YAML and JSON if needed.
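In case it helps, a dummy dataset of roughly the same scale could probably be generated with something like this (a sketch using generate_ground_truth_recording; sampling rate and duration are placeholders, not my actual acquisition parameters):

```python
import spikeinterface.full as si

# Roughly the scale of my in vitro recordings: ~4096 channels, a few thousand
# units, a few minutes long. The values below are placeholders.
recording, sorting_gt = si.generate_ground_truth_recording(
    durations=[60.0],             # 1-minute chunk; my real files are 1-10 min
    sampling_frequency=20_000.0,  # placeholder, adjust to the acquisition rate
    num_channels=4096,
    num_units=4000,
    seed=42,
)
print(recording)
print(sorting_gt)
```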
I don't think you need to share anything. Based on your comments, there seems to be an issue in compute_similarity_with_templates_array because of an internal data structure that is likely too large. I'll dig into it in more depth.
If we suppose that the problem is in the creation of the mask in compute_similarity_with_templates_array, you can check whether #4152 solves the issue. It should decrease the memory footprint of the whole function.
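Just as a back-of-envelope illustration (the actual shape in the code may differ), a dense boolean structure over template pairs and channels at your scale is already enormous:

```python
# Hypothetical estimate: a dense boolean array over
# (templates x templates x channels). The real mask may be shaped differently;
# this only shows the order of magnitude at ~5000 templates and 4096 channels.
num_templates = 5000
num_channels = 4096
dense_bool_bytes = num_templates * num_templates * num_channels  # 1 byte each
print(f"~{dense_bool_bytes / 1e9:.0f} GB")  # ~102 GB for the dense case
```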
I'm getting the following error:
Error running spykingcircus2
Traceback (most recent call last):
File "/home/user/Documents/ephy/sc_si.py", line 135, in <module>
sorting = ss.run_sorter(
^^^^^^^^^^^^^^
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sorters/runsorter.py", line 210, in run_sorter
return run_sorter_local(**common_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sorters/runsorter.py", line 302, in run_sorter_local
SorterClass.run_from_folder(folder, raise_error, verbose)
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sorters/basesorter.py", line 310, in run_from_folder
raise SpikeSortingError(
spikeinterface.sorters.utils.misc.SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sorters/basesorter.py", line 270, in run_from_folder
SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sorters/internal/spyking_circus2.py", line 325, in _run_from_folder
outputs = find_cluster_from_peaks(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sortingcomponents/clustering/main.py", line 48, in find_cluster_from_peaks
outputs = method_class.main_function(recording, peaks, params, job_kwargs=job_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sortingcomponents/clustering/circus.py", line 220, in main_function
peak_labels, merge_template_array, merge_sparsity_mask, new_unit_ids = merge_peak_labels_from_templates(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sortingcomponents/clustering/merge.py", line 558, in merge_peak_labels_from_templates
_apply_pair_mask_on_labels_and_recompute_templates(
File "/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/sortingcomponents/clustering/merge.py", line 614, in _apply_pair_mask_on_labels_and_recompute_templates
merge_sparsity_mask = merge_sparsity_mask[keep_template, :]
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
IndexError: boolean index did not match indexed array along dimension 0; dimension is 4095 but corresponding boolean dimension is 4783
I guess 4095 matches the number of channels (the first one is usually removed as it is the reference channel), but I'm not sure where 4783 comes from.
Thanks, I'll keep digging then!
I guess the error comes from the fact that you should have templates_from_svd=True as a default argument in run_sorter(). The bug is thus slightly different, and I'll fix it as well. But meanwhile, note that templates_from_svd will become the default in a major refactoring of SC2 that will arrive soon, alongside the paper.
The templates_from_svd=False behaviour should be fixed now in this branch as well. But as I said, I think you should set this option to True for both speed and memory considerations.
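Concretely, something like this, assuming you call run_sorter directly (the folder is a placeholder):

```python
import spikeinterface.sorters as ss

# recording: your preprocessed recording object.
sorting = ss.run_sorter(
    "spykingcircus2",
    recording,
    folder="sc2_output",          # placeholder path
    templates_from_svd=True,      # will become the default for SC2
)
```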
Tested it and it does fix the problem, thanks!
> I think you should set this option to True for both speed and memory considerations
Good to know, I'll stick with it going forward.
There's another, similar problem happening down the line.
I've narrowed it down to the initialization of CircusOMPSVDPeeler in _prepare_templates, specifically when calculating unit_overlaps (here).
I can open a separate issue if needed.
I'll make a separate PR, thanks for spotting all these issues triggered when scaling up the number of templates
The memory footprint for overlaps is also fixed in #4157
Working well, thanks a lot!
There's still an issue around final_cleaning_circus, but it can be avoided by enforcing n_jobs=1 just before the call. Sorting is slow towards the end but finishes properly.
I also noticed the same issue with TdC2 at the same step; I'll look into it more. The raw sorting can be saved to disk, but the curation step crashes. As I understand it, there are major reworks going on for both sorters, so I'll keep that in mind.
Damn, OK, I'll also fix this last one.
I haven't had time to pinpoint the issue exactly, but I think it might be the sparsity estimation when the analyzer is created.
If I create an analyzer from scratch on this kind of high-density sorting with n_jobs at the maximum, I get a crash.
If I do the same with n_jobs=1, the analyzer is created just fine, and I can switch back to parallel processing to compute the extensions.
If I reload the analyzer created by final_cleaning_circus it also works, but maybe the sparsity is also reloaded in that case?
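In practice my workaround looks roughly like this (a sketch; I use the global job kwargs to keep the analyzer creation serial, then go parallel again for the extensions):

```python
import spikeinterface.full as si

# The step that crashes for me when fully parallel: create the analyzer
# (and its sparsity estimate) with a single worker.
si.set_global_job_kwargs(n_jobs=1)
analyzer = si.create_sorting_analyzer(sorting, recording, sparse=True)

# Extensions are fine in parallel, so switch back afterwards.
si.set_global_job_kwargs(n_jobs=8)  # or however many cores are available
analyzer.compute(["random_spikes", "waveforms", "templates"])
```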
Everything should be fixed now in master; we worked hard to optimize the code for large recordings.
Apologies for the lack of feedback, I'm in the final few weeks of writing my thesis and didn't have much time to spare. I actually tried again with a fresh install of SI earlier this week, as I noticed quite a few commits had been made towards this, and both SC2 and TdC2 now run smoothly, at least on a few cores. I haven't tried with all cores yet, but everything works just fine otherwise. Thanks again for all your work!