Kilosort
BUG: CUDA out of memory
Describe the issue:
Hello, I'm getting a CUDA out of memory error during drift correction when using the Kilosort GUI.
Trying to run a 64-channel, ~8 GB recording on an RTX 3060 (12 GB VRAM).
Thanks for any help!
Reproduce the bug:
No response
Error message:
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 70, in run
ops, bfile, st0 = compute_drift_correction(
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\datashift.py", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 140, in template_match
A = torch.einsum('ijk, jklm-> iklm', weigh, B[iC,:, nb*t:nb*(t+1)])
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\torch\functional.py", line 385, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.37 GiB. GPU
Version information:
CUDA 11.8, Kilosort 4.0.5, Python 3.9.19
Context for the issue:
No response
Experiment information:
No response
Hi, I have similar issues with CUDA. Until it's fixed by the KS team, and if you are familiar with Python, try adding torch.cuda.empty_cache() before and after GPU-heavy functions. This will release unused GPU memory. I hope it helps. Anan
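Here is a rough sketch of what I mean (gpu_heavy_function is just a placeholder for whichever Kilosort step runs out of memory, not a real Kilosort function):

import torch

def run_with_cache_clearing(gpu_heavy_function, *args, **kwargs):
    # Release cached but unused GPU blocks before the heavy call.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    result = gpu_heavy_function(*args, **kwargs)
    # And release again once the call returns.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return result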
@ananmoran It is very unlikely that this is related to your issue for this size of recording.
@adam-hockley What does your probe layout look like, and what settings did you change if any?
The probe is a 1x64 linear array.
The probe was made in the Kilosort GUI with:
y-coords: np.linspace(0, 1260, num=64)
x-coords: np.linspace(50, 50, num=64) (also had the error with x=0 instead of 50)
chan map: np.linspace(0, 63, num=64)
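For reference, the same layout expressed as a probe dictionary would look roughly like this (field names follow my understanding of the Kilosort 4 probe format, and save_probe is assumed to be the helper in kilosort.io, so double-check against the docs):

import numpy as np
from kilosort.io import save_probe  # assumed helper for writing probe .json files

n_chan = 64
probe = {
    'chanMap': np.arange(n_chan),            # channels 0..63 in order
    'xc': np.full(n_chan, 50.0),             # single column at x = 50 um
    'yc': np.linspace(0, 1260, num=n_chan),  # ~20 um vertical spacing
    'kcoords': np.zeros(n_chan),             # one shank
    'n_chan': n_chan,
}
save_probe(probe, 'linear_64_probe.json')    # hypothetical output file name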
Okay, two follow-ups then:
- Can you please paste in the rest of the output you got while sorting, so I can see if anything else looks off?
- Can you try sorting without drift correction, by setting n_blocks = 0? (A sketch of how to set this from Python is included below.) That probe is just barely at the minimum recommendation for sampling density to get good drift estimates, so it's possible that's introducing an artifact leading to this issue.
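For reference, a minimal sketch of running from Python with drift correction disabled (paths, n_chan_bin, and sampling rate are placeholders, and the argument names follow my reading of the Kilosort 4 API, so double-check against the docs):

from kilosort import run_kilosort

settings = {
    'n_chan_bin': 64,   # total channels in the binary file
    'fs': 30000,        # replace with your actual sampling rate
    'n_blocks': 0,      # 0 = skip drift correction entirely
}
run_kilosort(settings=settings,
             probe_name='linear_64_probe.json',  # hypothetical probe file
             filename='recording.bin')           # hypothetical data file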
Here's the whole output for the error in drift correction.
I also get a CUDA out of memory error later on if I skip drift correction; that output is pasted further down.
Preprocessing filters computed in 0.43s; total 0.43s
computing drift
Re-computing universal templates from data.
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\threadpoolctl.py:1223: RuntimeWarning: Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at the same time. Both libraries are known to be incompatible and this can cause random crashes or deadlocks on Linux when loaded in the same Python program. Using threadpoolctl may cause crashes or deadlocks. For more information and possible workarounds, please see https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
warnings.warn(msg, RuntimeWarning)
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py:242: UserWarning:
NaNs and/or zeroes present in weights for spikedetect.run,
may need to adjust min_template_size and/or dminx for best results.
If you're using a probe with multiple shanks, see
https://kilosort.readthedocs.io/en/latest/multi_shank.html
warnings.warn(msg, UserWarning)
0%| | 0/1032 [00:00<?, ?it/s]
0%| | 0/1032 [00:01<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 70, in run
ops, bfile, st0 = compute_drift_correction(
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\datashift.py", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 140, in template_match
A = torch.einsum('ijk, jklm-> iklm', weigh, B[iC,:, nb*t:nb*(t+1)])
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\torch\functional.py", line 385, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.37 GiB. GPU
Preprocessing filters computed in 0.41s; total 0.41s
computing drift
nblocks = 0, skipping drift correction
drift computed in 0.00s; total 0.41s
Extracting spikes using templates
Re-computing universal templates from data.
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py:242: UserWarning:
NaNs and/or zeroes present in weights for spikedetect.run,
may need to adjust min_template_size and/or dminx for best results.
If you're using a probe with multiple shanks, see
https://kilosort.readthedocs.io/en/latest/multi_shank.html
warnings.warn(msg, UserWarning)
0%| | 0/1032 [00:00<?, ?it/s]
0%| | 0/1032 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 398, in detect_spikes
st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 150, in template_match
Amax = torch.max(Aa[iC2], 0)[0]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.83 GiB. GPU
Hmm... thanks. The warning about nans/zeros is definitely not expected for a single-shank linear probe. Would you be willing to share the data so I can debug this?
Sure, what's the best way to share it?
Putting it on a google drive has been working well for others, if that's an option for you. Then you can post the link in a reply here, or you can send it to my email at [email protected] if you don't want it visible publicly.
Thanks, here's the link. It's a 64-channel int16 file:
https://drive.google.com/file/d/1_ngJKdbuHlN1a4KJpAspv899O7bTYxmt/view?usp=sharing
Got it, thanks. Is it 30kHz sampling rate?
It's 24414.0625 Hz (TDT).
Well, the good news is I was able to sort the data without any errors using the default settings. The bad news is that I am also using a card with 12 GB of VRAM (and have sorted larger datasets with less VRAM in the past), so I'm not sure why you're seeing that error and I'm not.
A couple things to check:
- Are there multiple video cards in that machine? If so, make sure the right one is selected in the "PyTorch device" dropdown menu.
- Are there other applications running that might be using up the VRAM? An easy way to check is entering nvidia-smi in a terminal (without Kilosort running), which should give an output like this (a quick Python-side check is also sketched after the example output):
(kilosort4) PS C:\code\Kilosort> nvidia-smi
Fri Apr 26 12:32:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti WDDM | 00000000:01:00.0 On | N/A |
| 0% 42C P0 31W / 285W | 1303MiB / 12282MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
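If it's easier, roughly the same check can be done from Python with PyTorch (a small sketch, assuming the same environment Kilosort runs in):

import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current device
    print(f"Free: {free / 1e9:.2f} GB of {total / 1e9:.2f} GB total")
else:
    print("No CUDA device visible to PyTorch.")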
Thanks for the help Jacob, both of those tests looked normal: there's only one discrete GPU and it was being used by CUDA, with nothing running in the background. I've tested a new environment, and with Kilosort 4.0.6 I don't have any issues!
Hi @jacobpennington , I am having the same issue as Adam. I am also using a 64-channel probe. I have tried all of the above-mentioned fixes to no avail. Would I also be able to send you some test data to see if you are able to run it?
Hi @igfebbo , yes that would be fine.
Thank you very much! I am using a 64 channel H9 probe. Let me know if you need any other information. Here is the link:
https://drive.google.com/file/d/1DpwB0iItG3VtcXFZykVgUpWOSLE4a9k9/view?usp=drive_link
I'm unable to access it @igfebbo, I sent an access request through google from [email protected].
I have just shared it.
Sorry about that!
@igfebbo What type of probe are you using? Can you share the probe file, or paste in the output of "Print Probe" in the GUI? I'm not seeing any immediate issues, so it might be related to the probe geometry.
I am using a Cambridge neurotech, H9 probe. Here is the probe file: https://drive.google.com/file/d/1SakRB01PjjKae8OkNIZCXQXyM2XXlc8a/view?usp=sharing
30kHz sampling rate
Thanks. I still didn't see any memory problems; it never used more than about 2 GB of video memory at any one time. I know it's annoying, but did you try simply restarting the machine and sorting again? You would get the same error if other processes were using up the video memory.
Thank you for running it. We rebooted the server and got the same error.
I am also now getting this error: Non-native QFileDialog supports only local files
Okay. I'll look into that, but you should be able to get around it by just typing or copy-pasting the file location into the text area instead of clicking on "choose file."
Thanks, Jacob. We've noticed that we're getting a warning early on that might be informative... For reference, we're running a 2080 Super, driver 525.147.05, and CUDA v12.0. This is a multi-user Debian 12 machine, but no one else is accessing the GPU when we run our tests.
When we load the file and probe map we sent you, memory usage on the GPU goes from 0 to 1.3GB. We then see this warning:
/home/randy/.local/lib/python3.9/site-packages/kilosort/io.py:497: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1712608885084/work/torch/csrc/utils/tensor_numpy.cpp:206.) X[:, self.nt : self.nt+nsamp] = torch.from_numpy(data).to(self.device).float()
When we click Run, memory usage jumps to about 7 GB and then we see the error message that the GPU couldn't allocate an additional 2 GB (which makes sense, as this is an 8 GB GPU).
The warning isn't meaningful, it's something we're aware of. However, we did suppress that warning a few versions ago around the same time that we implemented some bug fixes that could cause memory problems. Can you please install the latest version of Kilosort and try again? Or let me know if you're already using the latest version and still seeing that warning.
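For reference, upgrading in the existing environment would look something like this (the [gui] extra is my assumption for GUI users; some shells require quoting it):

python -m pip install --upgrade kilosort[gui]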
Ok, we got it running now. We did not realize that pip was defaulting to 4.0 rather than the latest version (4.0.6 as of yesterday).