
Cannot set chunk_size during spykingcircus2 peak finding


Hello,

I am new to using spikeinterface and am trying to sort spikes from multiple-stereotrode data using spykingcircus2. I'm currently on version 0.100.8.

I have imported, band-pass filtered, and common-referenced my Intan RHS file, and saved the preprocessed recording prior to running spykingcircus2.

To correct for using stereotrodes instead of a typical array- or shank-based probe, I've run the following block (region/coords list shortened for brevity):

import numpy as np

# One region label per channel (list shortened for brevity)
regions = np.array(('ParScrew', 'PP', 'PP', 'vDG', 'vDG', 'Ent', 'Ent'))

# Stereotaxic (x, y, z) coordinates per region
# (shortened here; the full dict has an entry for every region, including 'Ent')
coords_dict = {'ParScrew': (2800, 1500, 0), 'PP': (-2000, -2530, -1400), 'vDG': (-2500, -3520, -2500)}

# Build one location per channel from its region label
locs = []
for region in regions:
    locs.append(coords_dict[region])

recording_preprocessed.set_property("brain region", regions)
recording_preprocessed.set_channel_locations(locs)

This should, I believe, let me use 'brain region' as the grouping property so that the sorter runs each stereotrode pair only against its own channels.
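As a sanity check before sorting, splitting the recording by that property should give one sub-recording per stereotrode pair. A minimal sketch (assuming split_by is available on the recording object, as in recent spikeinterface versions):

# Sketch: split the preprocessed recording by the new property and confirm
# that each group contains exactly one stereotrode pair (two channels).
grouped = recording_preprocessed.split_by(property="brain region")
for region, sub_recording in grouped.items():
    print(region, sub_recording.get_num_channels())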

The series of commands I'm currently using to then execute spykingcircus is:

import spikeinterface.full as si
import spikeinterface.sorters as ss

sorter = 'spykingcircus2'

job_kwargs = dict(n_jobs=15, total_memory='60G', progress_bar=True)
si.set_global_job_kwargs(**job_kwargs)

sorting = ss.run_sorter_by_property(sorter_name=sorter, recording=recording_preprocessed,
                                    grouping_property='brain region', working_folder='sort_by_group',
                                    remove_existing_folder=True, verbose=True,
                                    job_kwargs=job_kwargs,
                                    selection={"n_peaks_per_channel": 10000, "min_n_peaks": 20000},
                                    detection={'detect_threshold': 5}, apply_preprocessing=False)

print(sorting)
# engine="joblib", engine_kwargs={"n_jobs": 12},
sorting_SPC = sorting.save(folder=f'./{sorter}_sorting_output', overwrite=True)

So, all that said, my current issue is that when spykingcircus2 extracts waveforms, the chunk_size I set for joblib is respected. However, when the "find spikes" step runs, the chunk_size is limited to 3000.


extract waveforms shared_memory multi buffer with n_jobs = 15 and chunk_size = 500000000
extract waveforms shared_memory multi buffer: 100% 1/1 [00:00<00:00, 5.78it/s]
find spikes (circus-omp-svd) with n_jobs = 15 and chunk_size = 3000
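
For reference, my understanding is that chunking in spikeinterface is normally controlled through the job kwargs, either globally or per call. A minimal sketch of what I would have expected to work, assuming the sorter's internal steps honor the standard keys such as chunk_duration or chunk_size (which, per the log above, the "find spikes" step apparently does not):

import spikeinterface.full as si

# Sketch: the standard job kwargs control parallelization and chunking for
# steps that respect them; chunk_duration sets the length of each chunk.
si.set_global_job_kwargs(n_jobs=15, chunk_duration="10s", progress_bar=True)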

Can anyone advise on how to change this chunk size to speed up the process? I'm trying to analyze chronic recordings, so ease and speed of processing are important to me!

Thanks!

bharvey-neuro avatar Jul 03 '24 22:07 bharvey-neuro

Hi,

@yger can confirm, but I think the template-matching step has hardcoded chunk sizes.

alejoe91 avatar Jul 04 '24 07:07 alejoe91

Yes, indeed: in spyking circus 2 there used to be a hardcoded chunk size for the find_peaks procedure (when not using wobble). Are you using the latest version from main? I'm not sure this limit is still in place. The reason was that, counterintuitive as it sounds, smaller chunks are faster for the template-matching algorithm, and this should not harm the results.
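
To check which version you are running before trying main, a quick sketch (the exact upgrade command depends on your environment; installing from the GitHub repository's main branch via pip is one option):

import spikeinterface

# Sketch: print the installed version to see whether you are on a release
# or on a development build from the main branch.
print(spikeinterface.__version__)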

yger avatar Jul 04 '24 08:07 yger

It seems like this one has been answered and there was no follow-up. Feel free to ping us with additional questions if they come up.

zm711 avatar Feb 21 '25 16:02 zm711