spikeinterface
WaveformExtractor computing templates can reach limit open files
Hi,
On Ubuntu, there is a limit on how many files you can have open at once (by default, I believe it's around 1,000-2,000).
This means that if the WaveformExtractor tries to compute templates from more than 2,000 spikes, it can hit this limit and crash.
I believe the files should be opened sequentially (i.e. one by one) and loaded into RAM?
We can try to fix it by adding this at the end of the unit loop:
del wfs
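As a rough sketch of the suggested fix (the loop structure and names like `unit_ids` and `templates` here are illustrative, not the actual `waveform_tools` internals), dropping the memmap reference at the end of each iteration lets CPython close the underlying file descriptor right away instead of keeping every unit's file open:

```python
import os
import tempfile
import numpy as np

# Hypothetical per-unit extraction loop, for illustration only
tmpdir = tempfile.mkdtemp()
unit_ids = [0, 1, 2]
templates = {}
for unit_id in unit_ids:
    filename = os.path.join(tmpdir, f"waveforms_{unit_id}.npy")
    wfs = np.lib.format.open_memmap(filename, mode="w+",
                                    dtype="float32", shape=(10, 4))
    # keep only the small computed result in RAM ...
    templates[unit_id] = np.asarray(wfs.mean(axis=0))
    # ... and drop the memmap reference so the file descriptor backing
    # the mapping can be released now rather than at loop exit
    del wfs
```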
@Dradeliomecus can you try locally if it fixes your issue?
I'll try, but shouldn't it be deleted automatically when the variable is overwritten?
I never understand how garbage collection works in Python ...
> I'll try, but shouldn't it be deleted automatically when the variable is overwritten?
> I never understand how garbage collection works in Python ...

It should... then we can't explain the issue!
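For what it's worth, rebinding a name in CPython does free the previous object immediately once its reference count hits zero, as this minimal `weakref` illustration (unrelated to spikeinterface itself) shows. So if the memmaps stay open, presumably something else still holds references to them, e.g. a dict of per-unit buffers keeping every memmap alive at once, which would match the crash happening during allocation:

```python
import weakref

class Buffer:
    """Stand-in for a memmap-backed array (illustrative only)."""
    pass

buf = Buffer()
alive = weakref.ref(buf)     # lets us observe when the first object dies

buf = Buffer()               # rebinding drops the last reference to the old object
collected = alive() is None  # True in CPython: refcount hit zero immediately
```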
This is the line that crashes (during the allocation of all the memory maps): https://github.com/SpikeInterface/spikeinterface/blob/9f7b1ea48d2803c101cd2a1d18abd9ebf775e015/spikeinterface/core/waveform_tools.py#L160
@DradeAW was this ever solved?
Trying with a MEArec recording.h5 containing 1,500 units:
$ ulimit -n 1024 # Limit number of open files to 1024 (default for Ubuntu 20.04)
$ python
>>> import spikeinterface.core as si
>>> import spikeinterface.extractors as se
>>>
>>> recording = se.MEArecRecordingExtractor("recording.h5")
>>> sorting = se.MEArecSortingExtractor("recording.h5")
>>>
>>> wvf_extractor = si.extract_waveforms(recording, sorting, "test/", mode="folder", max_spikes_per_unit=500, ms_before=1.0, ms_after=2.0, allow_unfiltered=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_extractor.py", line 1579, in extract_waveforms
we.run_extract_waveforms(seed=seed, **job_kwargs)
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_extractor.py", line 1353, in run_extract_waveforms
wfs_arrays = extract_waveforms_to_buffers(
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_tools.py", line 91, in extract_waveforms_to_buffers
wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(
File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_tools.py", line 183, in allocate_waveforms_buffers
arr = np.lib.format.open_memmap(filename, mode="w+", dtype=dtype, shape=shape)
File "/users/nsr/wyngaard/dev/miniconda3/envs/MEArec/lib/python3.8/site-packages/numpy/lib/format.py", line 926, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/users/nsr/wyngaard/dev/miniconda3/envs/MEArec/lib/python3.8/site-packages/numpy/core/memmap.py", line 267, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files
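The failure can be reproduced without spikeinterface at all: each `numpy` memmap keeps a file descriptor alive for the lifetime of the mapping, so merely holding many `open_memmap` handles at once is enough. This hypothetical sketch lowers the soft limit via the `resource` module so it trips quickly, then restores it:

```python
import os
import resource
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()  # create the directory before lowering the limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))  # tiny soft limit for the demo

handles = []    # keep every memmap alive, as one-file-per-unit allocation does
captured = None
try:
    for i in range(200):
        path = os.path.join(tmpdir, f"unit_{i}.npy")
        handles.append(np.lib.format.open_memmap(path, mode="w+",
                                                 dtype="float32", shape=(4,)))
except OSError as err:      # OSError: [Errno 24] Too many open files
    captured = err
finally:
    handles.clear()         # release the memmaps ...
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))  # ... restore the limit
```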
I don't have this problem anymore because I found a way to increase the number of open files beyond the default, but I still think the issue should be kept open.
I have the same problem. spykingcircus2 works on recordings with a small number of channels but crashes on MEA recordings.
$ cat spikeinterface_log.json
{
"sorter_name": "spykingcircus2",
"sorter_version": "2.0",
"datetime": "2023-06-20T17:41:34.198963",
"runtime_trace": [],
"error": true,
"error_trace": "Traceback (most recent call last):\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/basesorter.py\", line 226, in run_from_folder\n SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/internal/spyking_circus2.py\", line 111, in _run_from_folder\n labels, peak_labels = find_cluster_from_peaks(recording_f, selected_peaks, method='random_projections',\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sortingcomponents/clustering/main.py\", line 39, in find_cluster_from_peaks\n labels, peak_labels = method_class.main_function(recording, peaks, params)\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sortingcomponents/clustering/random_projections.py\", line 193, in main_function\n we = extract_waveforms(recording, sorting, waveform_folder, ms_before=params['ms_before'],\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py\", line 1450, in extract_waveforms\n we.run_extract_waveforms(seed=seed, **job_kwargs)\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py\", line 1255, in run_extract_waveforms\n wfs_arrays = extract_waveforms_to_buffers(self.recording, spikes, unit_ids, nbefore, nafter,\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py\", line 82, in extract_waveforms_to_buffers\n wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(recording, spikes, unit_ids, nbefore, nafter, mode=mode,\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py\", line 162, in allocate_waveforms_buffers\n arr = np.lib.format.open_memmap(\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/lib/format.py\", line 926, 
in open_memmap\n marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,\n File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/core/memmap.py\", line 267, in __new__\n mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)\nOSError: [Errno 24] Too many open files\n",
"run_time": null
}
Traceback (most recent call last):
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/basesorter.py", line 226, in run_from_folder
SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/internal/tridesclous2.py", line 114, in _run_from_folder
we = extract_waveforms(recording, sorting_temp, sorter_output_folder / "waveforms_temp",
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1450, in extract_waveforms
we.run_extract_waveforms(seed=seed, **job_kwargs)
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1255, in run_extract_waveforms
wfs_arrays = extract_waveforms_to_buffers(self.recording, spikes, unit_ids, nbefore, nafter,
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py", line 82, in extract_waveforms_to_buffers
wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(recording, spikes, unit_ids, nbefore, nafter, mode=mode,
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py", line 162, in allocate_waveforms_buffers
arr = np.lib.format.open_memmap(
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/lib/format.py", line 926, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files
Any idea how to fix it? @DradeAW how did you increase the number of open files? Are you on Linux? On my system:
$ cat /proc/sys/fs/file-max
9223372036854775807
@rat-h You can raise the limit with `ulimit -n 8192` (this raises the limit to 8,192 files).
You can type `ulimit -n` to see the current limit.
This will however only apply to the current shell; I don't remember how I made the change permanent :/
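For reference, the same per-process change can also be made from Python itself via the standard-library `resource` module; raising the soft limit up to the hard limit needs no root. A minimal sketch (the 8192 target mirrors the `ulimit` example above):

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# raise the soft limit toward 8192, but never above the hard limit
target = 8192 if hard == resource.RLIM_INFINITY else min(8192, hard)
new_soft = max(soft, target)          # never lower the current soft limit
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
check = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```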
Ah, that makes sense. I forgot about "user limits"! Thank you so much.
BTW, to make it permanent: `sudo nano /etc/security/limits.conf` and add `user_name hard nofile 16384`. Also add `session required pam_limits.so` to the end of `/etc/pam.d/common-session`.
I think the system should be rebooted after that, or at least all sessions should be closed.
Hi all. I have a plan to reimplement waveform_tools so that extraction writes into a single big file for all waveforms, with an optional split back to one file per unit afterwards. Having a single big file has many advantages, and some drawbacks, like unaligned sparse snippets when using sparsity and slower mean computation.
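A rough sketch of that single-big-file layout (the names and shapes below are illustrative, not the planned spikeinterface API): one shared memmap holds every unit's waveforms, and each unit is addressed by a slice, so a single file descriptor suffices regardless of unit count:

```python
import os
import tempfile
import numpy as np

# Hypothetical spike counts per unit, for illustration only
num_spikes_per_unit = {0: 5, 1: 3, 2: 7}
nsamples, nchannels = 30, 4

# One memmap for all units' waveforms: one open file in total
total = sum(num_spikes_per_unit.values())
path = os.path.join(tempfile.mkdtemp(), "all_waveforms.npy")
all_wfs = np.lib.format.open_memmap(path, mode="w+", dtype="float32",
                                    shape=(total, nsamples, nchannels))

# Per-unit data is recovered by slicing into the shared buffer
slices, start = {}, 0
for unit_id, n in num_spikes_per_unit.items():
    slices[unit_id] = slice(start, start + n)
    start += n

wfs_unit_1 = all_wfs[slices[1]]   # a view, no extra file descriptor
```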
I'll close this one and leave #2396 open, since I think users will see that actual error (i.e. better discoverability). Since SortingAnalyzer doesn't have this problem, hopefully this won't come up once it is released to PyPI.