
`export_to_phy` with large dataset

DradeAW opened this issue 3 years ago · 10 comments

Hi,

I'm trying to use the export_to_phy function on a rather large dataset (30 minutes neuropixels recording with 300+ units).

When I try to use the function "normally", I get an error because I don't have enough RAM:

wvf_extractor = si.extract_waveforms(recording, sorting, "/mnt/raid0/tmp/spk_int", ms_before=3.0, ms_after=4.0, max_spikes_per_unit=1000, load_if_exists=False, return_scaled=False, chunk_duration='2s', n_jobs=12)
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average")

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 183, in export_to_phy
    amplitudes = compute_spike_amplitudes(waveform_extractor, peak_sign=peak_sign, outputs='concatenated', 
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 179, in compute_spike_amplitudes
    sac.compute_amplitudes(**job_kwargs)
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 124, in compute_amplitudes
    out = processor.run()
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/job_tools.py", line 253, in run
    res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 267, in _spike_amplitudes_chunk
    traces = recording.get_traces(start_frame=first, end_frame=last+1, segment_index=segment_index,
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/baserecording.py", line 115, in get_traces
    traces = traces.astype('float32') * gains + offsets
numpy.core._exceptions.MemoryError: Unable to allocate 82.4 GiB for an array with shape (57599828, 384) and data type float32

I thus decided to use chunking to avoid this problem:

spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)

And I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 180, in export_to_phy
    sac = waveform_extractor.load_extension('spike_amplitudes')
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/waveform_extractor.py", line 204, in load_extension
    return ext_class.load_from_folder(self.folder)
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/core/waveform_extractor.py", line 877, in load_from_folder
    ext._specific_load_from_folder()
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/spike_amplitudes.py", line 69, in _specific_load_from_folder
    amps_seg = np.load(file_amps)
  File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 407, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/raid0/tmp/spk_int/spike_amplitudes/amplitude_segment_0.npy'

How should I go about exporting to phy with a large dataset?

Thank you, Aurélien W.

DradeAW avatar May 30 '22 12:05 DradeAW

The error seems to be in compute_spike_amplitudes.

Could you try to compute spike amplitudes before export_to_phy? Let's see why this fails.

My guess is that the chunk_size is not propagated properly to this sub-function.

samuelgarcia avatar May 30 '22 13:05 samuelgarcia

amplitudes = st.compute_spike_amplitudes(wvf_extractor, load_if_exists=False, chunk_duration='2s', n_jobs=12)
spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)

Works fine... except that I cannot open the folder with `phy template-gui params.py`:

16:16:09.568 [W] model:1153           File None does not exist.
16:16:09.844 [W] model:545            Unreferenced clusters found in templates (generally not a problem)
16:16:10.026 [W] model:567            Unreferenced clusters found in spike_clusters (generally not a problem)
16:16:10.033 [E] __init__:62          An error has occurred (AssertionError): 
Traceback (most recent call last):
  File "/users/nsr/wyngaard/.conda/envs/phy/bin/phy", line 8, in <module>
    sys.exit(phycli())
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phy/apps/__init__.py", line 159, in cli_template_gui
    template_gui(params_path, **kwargs)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phy/apps/template/gui.py", line 198, in template_gui
    controller = TemplateController(model=load_model(params_path), dir_path=dir_path, **kwargs)
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 1178, in load_model
    return TemplateModel(**get_template_params(params_path))
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 297, in __init__
    self._load_data()
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 356, in _load_data
    self.sparse_templates = self._load_templates()
  File "/users/nsr/wyngaard/.conda/envs/phy/lib/python3.7/site-packages/phylib/io/model.py", line 636, in _load_templates
    assert data.dtype in (np.float32, np.float64)
AssertionError

spike_templates.npy has a dtype of int64

DradeAW avatar May 30 '22 14:05 DradeAW

This is strange, because compute_spike_amplitudes() has return_scaled=True by default. Could you check the dtype in the sub-folder "spike_amplitudes"?

Also, could you check the dtype with rec.get_traces(..., return_scaled=True)?

There must be a bug.
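
For example, both checks could look something like this (a minimal sketch reusing the recording object and waveform folder from above; the frame range is just an example):

import numpy as np

amps = np.load("/mnt/raid0/tmp/spk_int/spike_amplitudes/amplitude_segment_0.npy")
print(amps.dtype)  # expected float32 when return_scaled=True

traces = recording.get_traces(start_frame=0, end_frame=30_000, segment_index=0, return_scaled=True)
print(traces.dtype)  # scaling applies gains/offsets, so this should also be float32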

samuelgarcia avatar May 30 '22 14:05 samuelgarcia

amplitude_segment_0.npy has a dtype of float32.

I ran compute_spike_amplitudes(return_scaled=True) but extract_waveforms(return_scaled=False).

Does the problem come from there?

DradeAW avatar May 30 '22 14:05 DradeAW

The spike amplitudes computation loads all the traces again, so it is independent.

It is strange that amplitude_segment_0.npy has a dtype of float32 while the one copied into phy is int64. Could you debug line 186 in to_phy.py and print the dtype?

samuelgarcia avatar May 30 '22 14:05 samuelgarcia

> It is strange that amplitude_segment_0.npy has a dtype of float32 while the one copied into phy is int64.

No, the one in phy is float32.

It is spike_templates.npy that is of type int64.

DradeAW avatar May 30 '22 14:05 DradeAW

Ah! Sorry, I was thinking the problem was the spike amplitudes!

We need a fix in export_to_phy to convert to float even when return_scaled=False is used.

While waiting for the fix, you could use extract_waveforms(return_scaled=True); it should fix your bug, no?
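
As a sketch of that workaround (same parameters as your original call with only return_scaled flipped; the output folder name here is hypothetical, pointing at a fresh folder to avoid clashing with the existing one):

wvf_extractor = si.extract_waveforms(recording, sorting, "/mnt/raid0/tmp/spk_int_scaled", ms_before=3.0, ms_after=4.0, max_spikes_per_unit=1000, load_if_exists=False, return_scaled=True, chunk_duration='2s', n_jobs=12)  # "spk_int_scaled" is a hypothetical fresh folder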

samuelgarcia avatar May 30 '22 14:05 samuelgarcia

Well I get another problem now...

spikeinterface.exporters.export_to_phy(wvf_extractor, "/mnt/raid0/data/MEArec/kilosort2_default_phy", compute_pc_features=False, compute_amplitudes=True, copy_binary=False, max_channels_per_template=24, template_mode="average", chunk_duration='2s', n_jobs=8)
Warning: empty units have been removed when being exported to Phy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/exporters/to_phy.py", line 161, in export_to_phy
    template_similarity = compute_template_similarity(waveform_extractor, method='cosine_similarity')
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/spikeinterface/toolkit/postprocessing/template_similarity.py", line 37, in compute_template_similarity
    similarity = sklearn.metrics.pairwise.cosine_similarity(templates_flat, templates_other_flat)
  File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/metrics/pairwise.py", line 1351, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/metrics/pairwise.py", line 147, in check_pairwise_arrays
    X = Y = check_array(
  File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 899, in check_array
    _assert_all_finite(
  File "/users/nsr/wyngaard/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input contains NaN.

There is 1 unit with 0 spikes (because Kilosort is very smart...). Could that be the problem when computing the similarity?

DradeAW avatar May 30 '22 15:05 DradeAW

We have a sorting = sorting.remove_empty_units() utility. You can use this.

I think that NaN is the correct behavior for similarity. Maybe we should replace all NaNs by 0 in the similarity matrix before exporting to phy.

Could you try a patch?
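
A sketch of both suggestions (the nan_to_num line only illustrates where a patch could go, it is not the actual code in to_phy.py):

import numpy as np

sorting = sorting.remove_empty_units()  # drop zero-spike units before extracting waveforms

# inside export_to_phy, after the similarity matrix is computed, NaNs could be zeroed like this:
template_similarity = np.nan_to_num(template_similarity, nan=0.0)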

samuelgarcia avatar May 30 '22 15:05 samuelgarcia

I still get the same message and error after sorting = sorting.remove_empty_units().

I don't have time to debug right now, but I will look at it when I have the time.

What is really weird is that I didn't have this problem with a waveform extraction using return_scaled=False.

DradeAW avatar May 30 '22 15:05 DradeAW

@DradeAW any update on this? Can I close?

alejoe91 avatar Dec 28 '22 10:12 alejoe91

Haven't encountered this issue again, I think you can close it :)

DradeAW avatar Dec 28 '22 11:12 DradeAW

Thanks and Merry Christmas + Happy New Year :)

alejoe91 avatar Dec 28 '22 11:12 alejoe91

Thank you, you too :)

DradeAW avatar Dec 28 '22 11:12 DradeAW