spikeinterface
`export_to_phy` with large dataset
Hi,
I'm trying to use the export_to_phy function on a rather large dataset (a 30-minute Neuropixels recording with 300+ units).
When I try to use the function "normally", I get an error because I don't have enough RAM:
wvf_extractor = si.extract_waveforms(recording, sorting, "/mnt/raid0/tmp/spk_int", ms_before=3.0, ms_after=4.0, max_spikes_per_unit=1000, load_if_exists=False, return_scaled=False, chunk_duration='2s', n_jobs=12)
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average")
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 183, in export_to_phy
amplitudes = compute_spike_amplitudes(waveform_extractor, peak_sign=peak_sign, outputs='concatenated',
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 179, in compute_spike_amplitudes
sac.compute_amplitudes(**job_kwargs)
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 124, in compute_amplitudes
out = processor.run()
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/job_tools.py", line 253, in run
res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 267, in _spike_amplitudes_chunk
traces = recording.get_traces(start_frame=first, end_frame=last+1, segment_index=segment_index,
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/baserecording.py", line 115, in get_traces
traces = traces.astype('float32') * gains + offsets
numpy.core._exceptions.MemoryError: Unable to allocate 82.4 GiB for an array with shape (57599828, 384) and data type float32
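For reference, the failed allocation is exactly the size of all requested frames held in float32 at once; a quick check of the arithmetic, with the numbers taken from the error message above:

n_samples = 57_599_828  # frames requested in a single get_traces call
n_channels = 384
print(n_samples * n_channels * 4 / 2**30)  # float32 bytes -> ~82.4 GiB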
I thus decided to use chunking to avoid this problem:
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)
And I got the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 180, in export_to_phy
sac = waveform_extractor.load_extension('spike_amplitudes')
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/waveform_extractor.py", line 204, in load_extension
return ext_class.load_from_folder(self.folder)
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/waveform_extractor.py", line 877, in load_from_folder
ext._specific_load_from_folder()
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 69, in _specific_load_from_folder
amps_seg = np.load(file_amps)
File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 407, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/raid0/tmp/spk_int/spike_amplitudes/amplitude_segment_0.npy'
How should I go about exporting to phy with a large dataset?
Thank you, Aurélien W.
The error seems to be in compute_spike_amplitudes.
Could you try to compute the spike amplitudes before export_to_phy? Let's see why this fails.
My guess is that the chunk_size is not propagated properly to this sub-function.
amplitudes = st.compute_spike_amplitudes(wvf_extractor, load_if_exists=False, chunk_duration='2s', n_jobs=12)
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)
Works fine ...
except that I cannot open the folder with phy template-gui params.py:
16:16:09.568 [W] model:1153 File None does not exist.
16:16:09.844 [W] model:545 Unreferenced clusters found in templates (generally not a problem)
16:16:10.026 [W] model:567 Unreferenced clusters found in spike_clusters (generally not a problem)
16:16:10.033 [E] __init__:62 An error has occurred (AssertionError):
Traceback (most recent call last):
File "/users/nsr/wyngaard/.conda/envs/phy/bin/phy", line 8, in <module>
sys.exit(phycli())
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phy/apps/__init__.py", line 159, in cli_template_gui
template_gui(params_path, **kwargs)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phy/apps/template/gui.py", line 198, in template_gui
controller = TemplateController(model=load_model(params_path), dir_path=dir_path, **kwargs)
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 1178, in load_model
return TemplateModel(**get_template_params(params_path))
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 297, in __init__
self._load_data()
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 356, in _load_data
self.sparse_templates = self._load_templates()
File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 636, in _load_templates
assert data.dtype in (np.float32, np.float64)
AssertionError
spike_templates.npy has a dtype of int64
This is strange, because compute_spike_amplitudes() has return_scaled=True by default.
Could you check the dtype in the sub-folder "spike_amplitudes"?
Also, could you check the dtype with rec.get_traces(..., return_scaled=True)?
There must be a bug.
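A minimal way to run both checks, assuming recording is the object from the first message (the amplitude path is the one from the earlier traceback):

import numpy as np

amps = np.load("/mnt/raid0/tmp/spk_int/spike_amplitudes/amplitude_segment_0.npy", mmap_mode="r")
print(amps.dtype)  # dtype of the saved amplitudes

traces = recording.get_traces(start_frame=0, end_frame=30_000, return_scaled=True)
print(traces.dtype)  # dtype of the scaled traces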
amplitude_segment_0.npy has a dtype of float32.
I ran compute_spike_amplitudes(return_scaled=True) but extract_waveforms(return_scaled=False).
Does the problem come from there?
The spike amplitude computation loads all the traces again, so it is independent of that.
It is strange that amplitude_segment_0.npy has a dtype of float32 while the one copied into phy is int64. Could you debug line 186 in to_phy.py and print the dtype?
> It is strange that amplitude_segment_0.npy has a dtype of float32 while the one copied into phy is int64.
No, the one in phy is float32.
It is spike_templates.npy that has a dtype of int64.
Ah! Sorry, I thought the spike amplitudes were the problem!
We need a fix in export_to_phy to convert the templates to float even when return_scaled=False is used.
While waiting for the fix, you could use extract_waveforms(return_scaled=True); that should fix your bug, no?
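Concretely, the workaround is the extraction call from the first message with only the return_scaled flag changed (a sketch reusing the original parameters):

wvf_extractor = si.extract_waveforms(recording, sorting, "/mnt/raid0/tmp/spk_int",
                                     ms_before=3.0, ms_after=4.0, max_spikes_per_unit=1000,
                                     load_if_exists=False, return_scaled=True,  # the one change
                                     chunk_duration='2s', n_jobs=12)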
Well, I get another problem now...
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)
Warning: empty units have been removed when being exported to Phy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 161, in export_to_phy
template_similarity = compute_template_similarity(waveform_extractor, method='cosine_similarity')
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/template_similarity.py", line 37, in compute_template_similarity
similarity = sklearn.metrics.pairwise.cosine_similarity(templates_flat, templates_other_flat)
File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/metrics/pairwise.py", line 1351, in cosine_similarity
X, Y = check_pairwise_arrays(X, Y)
File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/metrics/pairwise.py", line 147, in check_pairwise_arrays
X = Y = check_array(
File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 899, in check_array
_assert_all_finite(
File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input contains NaN.
There is 1 unit with 0 spikes (because Kilosort is very smart...). Could it be the problem when computing the similarity?
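If so, the NaN would come from the averaging step: with template_mode="average", the template of a unit with zero spikes is the mean over an empty set of waveforms, which is all NaN. A minimal illustration (the shapes are made up):

import numpy as np

waveforms = np.empty((0, 210, 384))  # 0 spikes x n_samples x n_channels
template = waveforms.mean(axis=0)    # mean of an empty slice -> all NaN (with a RuntimeWarning)
print(np.isnan(template).all())      # True, hence sklearn's "Input contains NaN"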
We have a sorting = sorting.remove_empty_units() utility. You can use this; see the sketch below.
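A minimal sketch of that suggestion (the waveform extractor is built from the sorting, so it would need to be re-extracted on the filtered sorting; the new folder name is hypothetical):

sorting = sorting.remove_empty_units()
wvf_extractor = si.extract_waveforms(recording, sorting, "/mnt/raid0/tmp/spk_int_clean",
                                     ms_before=3.0, ms_after=4.0, max_spikes_per_unit=1000,
                                     chunk_duration='2s', n_jobs=12)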
I think that NaN is the correct behavior for the similarity. Maybe we should replace all NaNs with 0 in the similarity matrix before exporting to phy.
Could you try a patch?
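Something along these lines in export_to_phy, right after the similarity is computed (a hypothetical patch, not the actual fix):

template_similarity = compute_template_similarity(waveform_extractor, method='cosine_similarity')
template_similarity = np.nan_to_num(template_similarity, nan=0.0)  # replace NaNs from empty units with 0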
I still get the same message and error after sorting = sorting.remove_empty_units()
I don't have time to debug right now, but I will look at it when I have the time.
What is really weird is that I didn't have this problem with a waveform extraction using return_scaled=False.
@DradeAW any update on this? Can I close?
Haven't encountered this issue again, I think you can close it :)
Thanks, and Merry Christmas + Happy New Year :)
Thank you, you too :)