spikeinterface
Avoid intermediate recording.dat file for SpikeGLXRecordingExtractor
Hi all,
We are sorting enormous datasets for which the extra recording.dat binary copy becomes a serious issue, both in terms of speed and disk usage.
I see here that this step can be bypassed under certain conditions.
However, the recording.dat file is still generated when sorting a single, non-concatenated SpikeGLXRecordingExtractor. Is this really needed? My understanding was that it is common practice to run Kilosort directly on SpikeGLX data preprocessed with CatGT (e.g. in Jennifer Colonell's pipeline).
Also, while we're at it: one of the conditions for bypassing recording.dat is that there is no file concatenation. Does anyone know whether there is a fundamental reason why Kilosort doesn't accept multiple bin files and concatenate them on the fly during preprocessing?
To give a bit more context: currently we can get up to 4 copies of the same data at the same time:
1. raw files
2. CatGT-preprocessed output for each contiguous segment of data
3. the intermediate copy recording.dat, which does pretty much nothing besides concatenation
4. temp_wh.dat with the preprocessed/drift-corrected data

This wasn't so much of an issue in terms of disk space because #2 and #3 are deleted after sorting. But now that we'd like to sort longer recordings, having 4 copies of the data at the same time is becoming too much.
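To put rough numbers on the list above, here is a back-of-the-envelope sketch. The 385 channels, 30 kHz sampling rate and int16 samples are assumptions (typical for a Neuropixels AP stream) and are not taken from this thread; it also treats all four copies as the same size, which is only approximately true.

```python
def recording_size_gb(num_channels, sampling_rate_hz, duration_s, bytes_per_sample=2):
    """Size of one flat binary copy in GB (1 GB = 1e9 bytes)."""
    return num_channels * sampling_rate_hz * duration_s * bytes_per_sample / 1e9


# Assumed acquisition parameters (hypothetical, adjust to your setup).
NUM_CHANNELS = 385
SAMPLING_RATE_HZ = 30_000
ONE_HOUR_S = 3600

one_copy_gb = recording_size_gb(NUM_CHANNELS, SAMPLING_RATE_HZ, ONE_HOUR_S)

# The four copies listed above, all assumed to be roughly the same size.
copies = ["raw", "catgt", "recording.dat", "temp_wh.dat"]
peak_gb = one_copy_gb * len(copies)

# Dropping the intermediate recording.dat copy saves one full copy at peak.
peak_without_recording_dat_gb = one_copy_gb * (len(copies) - 1)

print(f"one copy: {one_copy_gb:.2f} GB per hour")
print(f"peak with 4 copies: {peak_gb:.2f} GB per hour")
print(f"peak without recording.dat: {peak_without_recording_dat_gb:.2f} GB per hour")
```

Under these assumptions each hour of recording costs about 83 GB per copy, so avoiding recording.dat removes a quarter of the peak disk usage.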
Thanks ! Tom
Hi Tom, many sorters need a binary file as input. So for many input formats (and also when a recording is preprocessed by spikeinterface), a binary copy is made before running the sorter. Recently we added the concept of "binary_compatible" for recording extractors, to avoid making a copy when the underlying file is in fact already a binary.
See: https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/core/baserecording.py#L367 https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/core/baserecording.py#L387
This has been done for BinaryRecordingExtractor https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/core/binaryrecordingextractor.py#L119 but, because we are a bit lazy, we haven't propagated it to spikeglx or openephys, which are also true binary formats. I think this is easy to do. I can make a PR soon. You could also give it a try.
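As a rough illustration of the "binary_compatible" idea, here is a minimal, self-contained sketch. The classes and the `resolve_sorter_input` helper are hypothetical toys, not the SpikeInterface implementation; the method names `is_binary_compatible` and `get_binary_description` are modeled on the code linked above and should be treated as assumptions.

```python
class ToyRecording:
    """Stand-in for a generic recording: not backed by a flat binary file."""

    def is_binary_compatible(self):
        return False

    def get_binary_description(self):
        raise NotImplementedError("this recording is not a flat binary file")


class ToyBinaryRecording(ToyRecording):
    """Stand-in for a recording that is already a flat binary file on disk."""

    def __init__(self, file_path, dtype, num_channels):
        self.file_path = file_path
        self.dtype = dtype
        self.num_channels = num_channels

    def is_binary_compatible(self):
        return True

    def get_binary_description(self):
        # Everything a sorter needs to read the file directly.
        return {
            "file_path": self.file_path,
            "dtype": self.dtype,
            "num_channels": self.num_channels,
        }


def resolve_sorter_input(recording, output_folder):
    """Return (path_for_sorter, copy_needed) for a given recording."""
    if recording.is_binary_compatible():
        # Point the sorter at the existing file, no recording.dat copy.
        return recording.get_binary_description()["file_path"], False
    # Otherwise the data would have to be written to a new binary file first.
    return f"{output_folder}/recording.dat", True


direct = resolve_sorter_input(
    ToyBinaryRecording("/data/run.ap.bin", "int16", 385), "/tmp/sorting"
)
copied = resolve_sorter_input(ToyRecording(), "/tmp/sorting")
print(direct)
print(copied)
```

Propagating the concept to the spikeglx extractor would amount to making it answer like `ToyBinaryRecording` here, assuming its underlying .bin file matches the layout the sorter expects.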
Note that CatGT also implies a binary copy of some kind (plus the computation, of course). Also note that this step can be done in spikeinterface: https://spikeinterface.github.io/blog/spikeinterface-destripe.
So you could choose to use the spikeinterface implementation for the preprocessing.
Then save the result with:
rec_saved = recording_preprocessed.save(folder='/path/to/copy', format='binary')
This also makes a copy (plus the computation).
Please have a look at the advanced tutorial "lazy processing explain".
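To make the "lazy processing" idea concrete, here is a toy sketch in plain Python. It is not the SpikeInterface API: `InMemorySource`, `LazyScaled` and `save_binary` are hypothetical names, and the "preprocessing" is just a gain. The point it illustrates is that chaining a preprocessing step reads nothing, and that data is only read and computed, chunk by chunk, when it is saved to a binary file.

```python
import array
import os
import tempfile


class InMemorySource:
    """Toy single-channel source that counts how often it is read."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.reads = 0

    def get_num_samples(self):
        return len(self.samples)

    def get_traces(self, start, end):
        self.reads += 1
        return self.samples[start:end]


class LazyScaled:
    """Toy lazy step: stores its parent and a gain, computes on demand."""

    def __init__(self, parent, gain):
        self.parent = parent
        self.gain = gain

    def get_num_samples(self):
        return self.parent.get_num_samples()

    def get_traces(self, start, end):
        return [int(x * self.gain) for x in self.parent.get_traces(start, end)]


def save_binary(recording, path, chunk_size=4):
    """Write the recording to a flat int16 file, one chunk at a time."""
    n = recording.get_num_samples()
    with open(path, "wb") as f:
        for start in range(0, n, chunk_size):
            chunk = recording.get_traces(start, min(start + chunk_size, n))
            array.array("h", chunk).tofile(f)


source = InMemorySource(range(10))
lazy = LazyScaled(source, gain=2)
reads_before_save = source.reads  # building the chain reads nothing

out_path = os.path.join(tempfile.mkdtemp(), "copy.bin")
save_binary(lazy, out_path)
reads_after_save = source.reads  # one read per chunk written
size_bytes = os.path.getsize(out_path)
print(reads_before_save, reads_after_save, size_bytes)
```

In this picture the only full copy on disk is the one written by the save step, which is why it counts as a copy plus computation.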