Kilosort error： ValueError: n_samples=5 should be >= n

Describe the issue:

I am using Kilosort4 to sort a 64-channel recording. It fails with an error saying that n_samples is too small. Does that mean my data is insufficient for sorting? How can I fix it?

Reproduce the bug:

No response

Error message:

Traceback (most recent call last):
  File "/home/wangxy/wxy_sorting_curation.py", line 68, in <module>
    aggregate_sorting = si.run_sorter_by_property(sorter_name='kilosort4', recording=rec,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/launcher.py", line 297, in run_sorter_by_property
    sorting_list = run_sorter_jobs(job_list, engine=engine, engine_kwargs=engine_kwargs, return_output=True)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/launcher.py", line 106, in run_sorter_jobs
    sorting = run_sorter(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/runsorter.py", line 175, in run_sorter
    return run_sorter_local(**common_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/runsorter.py", line 225, in run_sorter_local
    SorterClass.run_from_folder(output_folder, raise_error, verbose)
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/basesorter.py", line 293, in run_from_folder
    raise SpikeSortingError(
spikeinterface.sorters.utils.misc.SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/basesorter.py", line 258, in run_from_folder
    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/spikeinterface/sorters/external/kilosort4.py", line 240, in _run_from_folder
    st, tF, _, _ = detect_spikes(ops, device, bfile, tic0=tic0, progress_bar=progress_bar)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/kilosort/run_kilosort.py", line 398, in detect_spikes
    st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/kilosort/spikedetect.py", line 188, in run
    ops['wPCA'], ops['wTEMP'] = extract_wPCA_wTEMP(
                                ^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/kilosort/spikedetect.py", line 74, in extract_wPCA_wTEMP
    model = KMeans(n_clusters=ops['settings']['n_templates'], n_init = 10).fit(clips)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/sklearn/cluster/_kmeans.py", line 1490, in fit
    self._check_params_vs_input(X)
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/sklearn/cluster/_kmeans.py", line 1431, in _check_params_vs_input
    super()._check_params_vs_input(X, default_n_init=10)
  File "/home/wangxy/conda/envs/si_env/lib/python3.11/site-packages/sklearn/cluster/_kmeans.py", line 879, in _check_params_vs_input
    raise ValueError(
ValueError: n_samples=5 should be >= n_clusters=6.
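
Reading the last frames of the traceback: the exception is raised by scikit-learn's KMeans.fit, which Kilosort's extract_wPCA_wTEMP calls with n_clusters set to n_templates. Only 5 waveform clips were passed while 6 clusters were requested. A minimal plain-Python sketch of that precondition follows; the function name is hypothetical and the code only mirrors the message format, not scikit-learn's actual implementation.

```python
def check_clips_vs_templates(n_clips: int, n_templates: int) -> None:
    """Hypothetical guard: clustering needs at least one clip per requested template."""
    if n_clips < n_templates:
        # Same wording as the error at the bottom of the traceback.
        raise ValueError(
            f"n_samples={n_clips} should be >= n_clusters={n_templates}."
        )


try:
    # Values taken from the traceback: 5 clips found, n_templates = 6.
    check_clips_vs_templates(n_clips=5, n_templates=6)
except ValueError as err:
    message = str(err)
    print(message)
```

So the error says that too few waveform clips reached the clustering step; by itself it does not show why so few clips were extracted.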

Version information:

kilosort4

Context for the issue:

No response

Experiment information:

No response

Jun 15 '24 01:06 Batter-Wang

I have been experiencing the same issue. Does this mean that no units were detected?

Jun 15 '24 15:06 hiroyukioya

Please make sure you are using the latest version of Kilosort4, and run it without SpikeInterface. If you still encounter the error, upload kilosort4.log here from the results directory.

Jun 15 '24 16:06 jacobpennington

Hi, I used the most recent version of Kilosort4 without SpikeInterface and still get this:

  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 47, in
    ops, *rest = kilosort4RUN(shankdat,s0,shank,probe,fs,numchan,channeighb,9,8, bsec)
  File "C:\PycharmProjects\SpikeInterface\kilosort_subF.py", line 217, in kilosort4RUN
    data_dtype=data_type, do_CAR=do_CAR)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 161, in run_kilosort
    ops, bfile, st0 = compute_drift_correction(
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 425, in compute_drift_correction
    ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\kilosort\datashift.py", line 197, in run
    st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 204, in run
    ops['wPCA'], ops['wTEMP'] = extract_wPCA_wTEMP(
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 75, in extract_wPCA_wTEMP
    model = KMeans(n_clusters=ops['settings']['n_templates'], n_init = 10).fit(clips)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\sklearn\base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\sklearn\cluster\_kmeans.py", line 1470, in fit
    self._check_params_vs_input(X)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\sklearn\cluster\_kmeans.py", line 1411, in _check_params_vs_input
    super()._check_params_vs_input(X, default_n_init=10)
  File "C:\Users\hiroy\anaconda3\envs\kilosort\lib\site-packages\sklearn\cluster\_kmeans.py", line 875, in _check_params_vs_input
    raise ValueError(
ValueError: n_samples=3 should be >= n_clusters=6.

The log is here:

06-15 12:36 kilosort.run_kilosort INFO Kilosort version 4.0.12
06-15 12:36 kilosort.run_kilosort INFO Sorting C:\UNITS\672\672-142\shdat3.bin
06-15 12:36 kilosort.run_kilosort INFO ----------------------------------------
06-15 12:36 kilosort.run_kilosort INFO Skipping common average reference.
06-15 12:36 kilosort.run_kilosort INFO Using GPU for PyTorch computations. Specify device to change this.
06-15 12:36 kilosort.run_kilosort DEBUG Initial ops: { 'n_chan_bin': 8, 'fs': 32000.0, 'batch_size': 160000, 'nblocks': 1, 'Th_universal': 9, 'Th_learned': 8, 'tmin': 0, 'tmax': inf, 'nt': 65, 'shift': None, 'scale': None, 'artifact_threshold': inf, 'nskip': 25, 'whitening_range': 8, 'binning_depth': 5, 'sig_interp': 20, 'drift_smoothing': [0.5, 0.5, 0.5], 'nt0min': 21, 'dmin': None, 'dminx': 1, 'min_template_size': 200, 'template_sizes': 5, 'nearest_chans': 1, 'nearest_templates': 1, 'max_channel_distance': 1, 'templates_from_data': True, 'n_templates': 6, 'n_pcs': 6, 'Th_single_ch': 6, 'acg_threshold': 0.2, 'ccg_threshold': 0.25, 'cluster_downsampling': 20, 'x_centers': None, 'duplicate_spike_bins': 7, 'filename': WindowsPath('C:/UNITS/672/672-142/shdat3.bin'), 'data_dir': WindowsPath('C:/UNITS/672/672-142'), 'data_dtype': 'float32', 'do_CAR': False, 'invert_sign': False, 'NTbuff': 160130, 'Nchan': 8, 'torch_device': 'cuda', 'save_preprocessed_copy': False, 'chanMap': array([0, 1, 2, 3, 4, 5, 6, 7]), 'xc': array([10., 10., 10., 10., 10., 10., 10., 10.], dtype=float32), 'yc': array([ 0., 20., 40., 60., 80., 100., 120., 140.], dtype=float32), 'kcoords': array([0., 0., 0., 0., 0., 0., 0., 0.]), 'n_chan': 8}
06-15 12:36 kilosort.run_kilosort INFO
06-15 12:36 kilosort.run_kilosort INFO Computing preprocessing variables.
06-15 12:36 kilosort.run_kilosort INFO ----------------------------------------
06-15 12:36 kilosort.run_kilosort INFO Preprocessing filters computed in 0.06s; total 0.06s
06-15 12:36 kilosort.run_kilosort DEBUG hp_filter shape: torch.Size([30122])
06-15 12:36 kilosort.run_kilosort DEBUG whiten_mat shape: torch.Size([8, 8])
06-15 12:36 kilosort.run_kilosort INFO
06-15 12:36 kilosort.run_kilosort INFO Computing drift correction.
06-15 12:36 kilosort.run_kilosort INFO ----------------------------------------
06-15 12:36 kilosort.spikedetect INFO Re-computing universal templates from data.

Jun 15 '24 17:06 hiroyukioya

@hiroyukioya My guess is that because of your large batch size, that step is not selecting enough data to cluster the waveforms for making templates, unless your recording is quite long. Two options for that: you can try reducing the batch size to around the default of 60000 (or maybe 64000, for your sampling rate). You could also use the pre-generated universal templates by setting templates_from_data = False - you would also need to change nt = 61 to make that work.
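
A rough sketch of those two options as settings dictionaries, using the fs and n_chan_bin values from the log above. The helper names are hypothetical, and sizing the batch as 2 seconds of data is an assumption read off the defaults (60000 samples at 30 kHz); only the setting keys themselves come from the log.

```python
def batch_seconds(batch_size: int, fs: float) -> float:
    """Duration of one batch in seconds."""
    return batch_size / fs


def suggested_settings(fs: float, n_chan_bin: int, use_universal_templates: bool = False) -> dict:
    """Build a settings dict with a smaller batch size (assumed: about 2 s per batch)."""
    settings = {
        "n_chan_bin": n_chan_bin,
        "fs": fs,
        "batch_size": int(2 * fs),  # 64000 samples at 32 kHz
    }
    if use_universal_templates:
        # Option 2: skip template extraction from the data and use the
        # pre-generated universal templates, which requires nt = 61.
        settings["templates_from_data"] = False
        settings["nt"] = 61
    return settings


# Values from the posted log: fs = 32000.0, n_chan_bin = 8, batch_size = 160000.
current_batch_s = batch_seconds(160000, 32000.0)  # 5.0 seconds per batch
option_a = suggested_settings(32000.0, 8)
option_b = suggested_settings(32000.0, 8, use_universal_templates=True)

# Passing the result on to Kilosort (not executed here; `probe` and the file
# path are your own):
# from kilosort import run_kilosort
# run_kilosort(settings=option_a, probe=probe, filename=..., data_dtype="float32", do_CAR=False)
```

At 32 kHz the posted batch_size of 160000 is 5 seconds per batch, compared with 2 seconds for the suggested value.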

Another possibility: I noticed your data is float32. Some other users have reported problems that ultimately were happening because their data was on a very different scale from what KS4 expects for standard int16 data. I would recommend you try loading the data in the KS4 GUI to make sure it looks sensible. If it looks blank or washed out, you may need to use the scale parameter to adjust it - try to get the data roughly on the order of -100 to +100 or larger.
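
If the GUI check does suggest a scale problem, one rough way to pick a starting value for scale is sketched below. This assumes scale acts as a multiplicative factor on the raw samples; the function name, the quantile, and the example values are all hypothetical, and the result is only a starting point to confirm visually in the GUI.

```python
import array


def suggest_scale(samples, target_peak=100.0, quantile=0.999):
    """Return a factor that brings the typical peak magnitude near target_peak."""
    magnitudes = sorted(abs(x) for x in samples)
    if not magnitudes:
        raise ValueError("no samples given")
    # A high quantile instead of the maximum, so one artifact does not dominate.
    idx = min(len(magnitudes) - 1, int(quantile * len(magnitudes)))
    peak = magnitudes[idx]
    if peak == 0:
        raise ValueError("data is all zeros")
    return target_peak / peak


# Made-up float32 snippet with values on the order of 1e-4.
snippet = array.array("f", [0.0001, -0.0002, 0.00005, -0.0001, 0.0002])
factor = suggest_scale(snippet)
print(f"suggested scale: {factor:.0f}")
```

For a real recording you would read a representative chunk of the binary file instead of the made-up snippet, then set settings['scale'] to something near the suggested factor.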

Jun 15 '24 18:06 jacobpennington

If you still encounter problems after trying the above changes please let us know and we can re-open this.

Jul 03 '24 16:07 jacobpennington
