WaveformExtractor: computing templates can reach the open-files limit

DradeAW opened this issue 3 years ago

Hi,

On Ubuntu, there is a limit on how many files can be open at once (by default, I believe it's around 1,000-2,000).

But this means that if the WaveformExtractor tries to compute the templates from >2,000 spikes, it can hit this limit and crash.

I believe the files should be opened sequentially (i.e. one by one) and loaded into RAM?
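
For illustration, here is a minimal standalone sketch (not the actual spikeinterface code; the shapes, counts and file names are made up) of why holding one memmap per unit exhausts file descriptors:

import os
import tempfile

import numpy as np

# Hypothetical sketch: each np.lib.format.open_memmap keeps one file descriptor
# open, so with the default `ulimit -n 1024` this loop eventually fails with
# OSError: [Errno 24] Too many open files.
tmpdir = tempfile.mkdtemp()
buffers = {}
for unit_index in range(2000):  # more buffers than the default 1024-descriptor limit
    filename = os.path.join(tmpdir, f"waveforms_{unit_index}.npy")
    arr = np.lib.format.open_memmap(filename, mode="w+", dtype="float32",
                                    shape=(100, 60, 4))
    buffers[unit_index] = arr   # holding the reference keeps the file open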

DradeAW avatar Sep 01 '22 10:09 DradeAW

We can try to fix it by adding this at the end of the unit loop: del wfs
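
Roughly something like this (an illustrative sketch, not the exact spikeinterface internals; it assumes we is an already-built WaveformExtractor):

# Sketch of the suggested fix: drop each unit's waveform array at the end of
# the loop iteration so its underlying memmap (and file descriptor) can be
# released by the garbage collector.
templates = {}
for unit_id in we.unit_ids:
    wfs = we.get_waveforms(unit_id)        # opens the unit's memmap
    templates[unit_id] = wfs.mean(axis=0)  # average over spikes -> template
    del wfs                                # release the reference so the file can close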

@Dradeliomecus can you try locally to see if it fixes your issue?

alejoe91 avatar Oct 13 '22 11:10 alejoe91

I'll try, but shouldn't it be deleted automatically when the variable is overwritten?

I never understood how garbage collection works in Python ...

DradeAW avatar Oct 13 '22 11:10 DradeAW

It should... then we can't explain the issue!

alejoe91 avatar Oct 13 '22 11:10 alejoe91

This is the line that crashes (during the allocation of all the memory maps): https://github.com/SpikeInterface/spikeinterface/blob/9f7b1ea48d2803c101cd2a1d18abd9ebf775e015/spikeinterface/core/waveform_tools.py#L160

DradeAW avatar Oct 13 '22 12:10 DradeAW

@DradeAW was this ever solved?

alejoe91 avatar Jun 12 '23 14:06 alejoe91

Trying with a MEArec recording.h5 containing 1,500 units:

$ ulimit -n 1024  # Limit number of open files to 1024 (default for Ubuntu 20.04)
$ python
>>> import spikeinterface.core as si
>>> import spikeinterface.extractors as se
>>> 
>>> recording = se.MEArecRecordingExtractor("recording.h5")
>>> sorting = se.MEArecSortingExtractor("recording.h5")
>>> 
>>> wvf_extractor = si.extract_waveforms(recording, sorting, "test/", mode="folder", max_spikes_per_unit=500, ms_before=1.0, ms_after=2.0, allow_unfiltered=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_extractor.py", line 1579, in extract_waveforms
    we.run_extract_waveforms(seed=seed, **job_kwargs)
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_extractor.py", line 1353, in run_extract_waveforms
    wfs_arrays = extract_waveforms_to_buffers(
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_tools.py", line 91, in extract_waveforms_to_buffers
    wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(
  File "/export/home1/users/nsr/wyngaard/dev/spikeinterface/spikeinterface/src/spikeinterface/core/waveform_tools.py", line 183, in allocate_waveforms_buffers
    arr = np.lib.format.open_memmap(filename, mode="w+", dtype=dtype, shape=shape)
  File "/users/nsr/wyngaard/dev/miniconda3/envs/MEArec/lib/python3.8/site-packages/numpy/lib/format.py", line 926, in open_memmap
    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/users/nsr/wyngaard/dev/miniconda3/envs/MEArec/lib/python3.8/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files

DradeAW avatar Jun 12 '23 15:06 DradeAW

I don't have this problem anymore because I found a way to increase the open-files limit and override the default, but I still think the issue should be kept open.

DradeAW avatar Jun 12 '23 15:06 DradeAW

I have the same problem. spykingcircus2 works on recordings with a small number of channels but crashes on MEA recordings.

$ cat spikeinterface_log.json
{
    "sorter_name": "spykingcircus2",
    "sorter_version": "2.0",
    "datetime": "2023-06-20T17:41:34.198963",
    "runtime_trace": [],
    "error": true,
    "error_trace": "Traceback (most recent call last):\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/basesorter.py\", line 226, in run_from_folder\n    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/internal/spyking_circus2.py\", line 111, in _run_from_folder\n    labels, peak_labels = find_cluster_from_peaks(recording_f, selected_peaks, method='random_projections',\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sortingcomponents/clustering/main.py\", line 39, in find_cluster_from_peaks\n    labels, peak_labels = method_class.main_function(recording, peaks, params)\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sortingcomponents/clustering/random_projections.py\", line 193, in main_function\n    we = extract_waveforms(recording, sorting, waveform_folder, ms_before=params['ms_before'],\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py\", line 1450, in extract_waveforms\n    we.run_extract_waveforms(seed=seed, **job_kwargs)\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py\", line 1255, in run_extract_waveforms\n    wfs_arrays = extract_waveforms_to_buffers(self.recording, spikes, unit_ids, nbefore, nafter,\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py\", line 82, in extract_waveforms_to_buffers\n    wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(recording, spikes, unit_ids, nbefore, nafter, mode=mode,\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py\", line 162, in allocate_waveforms_buffers\n    arr = np.lib.format.open_memmap(\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/lib/format.py\", line 926, in open_memmap\n    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,\n  File \"/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/core/memmap.py\", line 267, in __new__\n    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)\nOSError: [Errno 24] Too many open files\n",
    "run_time": null
}
Traceback (most recent call last):
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/basesorter.py", line 226, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/sorters/internal/tridesclous2.py", line 114, in _run_from_folder
    we = extract_waveforms(recording, sorting_temp, sorter_output_folder / "waveforms_temp",
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1450, in extract_waveforms
    we.run_extract_waveforms(seed=seed, **job_kwargs)
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1255, in run_extract_waveforms
    wfs_arrays = extract_waveforms_to_buffers(self.recording, spikes, unit_ids, nbefore, nafter,
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py", line 82, in extract_waveforms_to_buffers
    wfs_arrays, wfs_arrays_info = allocate_waveforms_buffers(recording, spikes, unit_ids, nbefore, nafter, mode=mode,
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/waveform_tools.py", line 162, in allocate_waveforms_buffers
    arr = np.lib.format.open_memmap(
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/lib/format.py", line 926, in open_memmap
    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/rth/.local/apps/spikes/lib/python3.10/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)

Any idea how to fix it? @DradeAW how did you increase the number of open files? Are you on Linux? On my system:

$ cat /proc/sys/fs/file-max
9223372036854775807

rat-h avatar Jun 22 '23 19:06 rat-h

@rat-h You can raise the limit with ulimit -n 8192 (this increases the limit to 8,192 files). You can type ulimit -n to see the current limit.

This will however only work for the current shell session; I don't remember how I made the change permanent :/
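
For reference, the same check and raise can also be done from within Python using the standard-library resource module (per-process only, Linux/macOS; 8192 is just an example value):

import resource

# Query the current soft and hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit for this process only (it cannot exceed the hard limit).
resource.setrlimit(resource.RLIMIT_NOFILE, (min(8192, hard), hard))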

DradeAW avatar Jun 22 '23 19:06 DradeAW

Ah, that makes sense. I forgot about "user limits"! Thank you so much.

BTW, to make it permanent: sudo nano /etc/security/limits.conf and add user_name hard nofile 16384. Also add session required pam_limits.so to the end of /etc/pam.d/common-session.

I think the system should be rebooted after that, or at least all sessions should be closed.

rat-h avatar Jun 22 '23 20:06 rat-h

Hi all. I have a plan to reimplement waveform_tools so that extraction writes a single big file for all waveforms, with an optional split back to one file per unit afterwards. Having a single big file has many advantages and some drawbacks, like unaligned sparse snippets when using sparsity, and slower mean computation.
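
A rough sketch of that idea (illustrative only, not the planned implementation; shapes and names are invented): all waveforms live in one memmap and each unit owns a row range, so the number of open files no longer scales with the unit count.

import numpy as np

# One shared memmap for every selected spike: a single file descriptor,
# regardless of how many units there are.
n_spikes_total, n_samples, n_channels = 5000, 90, 32
all_wfs = np.lib.format.open_memmap(
    "all_waveforms.npy", mode="w+", dtype="float32",
    shape=(n_spikes_total, n_samples, n_channels),
)

# Hypothetical per-unit bookkeeping: each unit owns a contiguous block of rows.
unit_slices = {"unit0": slice(0, 500), "unit1": slice(500, 900)}

# Slicing returns a view into the big memmap; no extra file is opened, and the
# optional "split back" step would just copy each view to its own file.
unit0_wfs = all_wfs[unit_slices["unit0"]]
template0 = unit0_wfs.mean(axis=0)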

samuelgarcia avatar Jun 23 '23 06:06 samuelgarcia

I'll close this one and leave #2396 open since I think users will see that actual error (i.e. better discoverability). Since SortingAnalyzer doesn't have this problem, hopefully this won't come up once it is released to PyPI.

zm711 avatar Mar 25 '24 13:03 zm711