Kilosort
BUG: CUDA out of memory
Describe the issue:
Hello, I'm getting a CUDA out of memory error during drift correction when using the Kilosort GUI.
Trying to run a 64-channel, ~8 GB recording on an RTX 3060 (12 GB VRAM).
Thanks for any help!
Reproduce the bug:
No response
Error message:
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 70, in run
ops, bfile, st0 = compute_drift_correction(
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\datashift.py", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 140, in template_match
A = torch.einsum('ijk, jklm-> iklm', weigh, B[iC,:, nb*t:nb*(t+1)])
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\torch\functional.py", line 385, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.37 GiB. GPU
Version information:
CUDA 11.8, Kilosort 4.0.5, Python 3.9.19
Context for the issue:
No response
Experiment information:
No response
Hi, I have similar issues with CUDA. Until it's fixed by the KS team, and if you are familiar with Python, try adding torch.cuda.empty_cache() before and after GPU-heavy functions. This will release unused GPU memory. I hope it helps. Anan
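Here is a rough sketch of what I mean (gpu_heavy_function is just a placeholder for whichever Kilosort step runs out of memory, not a real Kilosort function):

import torch

def run_with_cache_clearing(gpu_heavy_function, *args, **kwargs):
    # Release cached but unused GPU blocks before the heavy call.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    result = gpu_heavy_function(*args, **kwargs)
    # And release again once the call returns.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return result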
@ananmoran It is very unlikely that this is related to your issue for this size of recording.
@adam-hockley What does your probe layout look like, and what settings did you change if any?
The probe is a 1x64 linear array.
The probe was made in the Kilosort GUI with:
y-coords: np.linspace(0, 1260, num=64)
x-coords: np.linspace(50, 50, num=64) (also had the error with x=0 instead of 50)
chan map: np.linspace(0, 63, num=64)
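For reference, the same layout expressed as a probe dictionary would look roughly like this (field names follow my understanding of the Kilosort 4 probe format, and save_probe is assumed to be the helper in kilosort.io, so double-check against the docs):

import numpy as np
from kilosort.io import save_probe  # assumed helper for writing probe .json files

n_chan = 64
probe = {
    'chanMap': np.arange(n_chan),            # channels 0..63 in order
    'xc': np.full(n_chan, 50.0),             # single column at x = 50 um
    'yc': np.linspace(0, 1260, num=n_chan),  # ~20 um vertical spacing
    'kcoords': np.zeros(n_chan),             # one shank
    'n_chan': n_chan,
}
save_probe(probe, 'linear_64_probe.json')    # hypothetical output file name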
Okay, two follow-ups then:
- Can you please paste in the rest of the output you got while sorting, so I can see if anything else looks off?
- Can you try sorting without drift correction, by setting n_blocks = 0? (A sketch of how to set this from Python is included below.) That probe is just barely at the minimum recommendation for sampling density to get good drift estimates, so it's possible that's introducing an artifact leading to this issue.
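For reference, a minimal sketch of running from Python with drift correction disabled (paths, n_chan_bin, and sampling rate are placeholders, and the argument names follow my reading of the Kilosort 4 API, so double-check against the docs):

from kilosort import run_kilosort

settings = {
    'n_chan_bin': 64,   # total channels in the binary file
    'fs': 30000,        # replace with your actual sampling rate
    'n_blocks': 0,      # 0 = skip drift correction entirely
}
run_kilosort(settings=settings,
             probe_name='linear_64_probe.json',  # hypothetical probe file
             filename='recording.bin')           # hypothetical data file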
Here's the whole output for the error in drift correction.
I also get a CUDA out of memory error later on if I skip drift correction; that output is pasted further down.
Preprocessing filters computed in 0.43s; total 0.43s
computing drift
Re-computing universal templates from data.
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\threadpoolctl.py:1223: RuntimeWarning: Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at the same time. Both libraries are known to be incompatible and this can cause random crashes or deadlocks on Linux when loaded in the same Python program. Using threadpoolctl may cause crashes or deadlocks. For more information and possible workarounds, please see https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
warnings.warn(msg, RuntimeWarning)
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py:242: UserWarning:
NaNs and/or zeroes present in weights for spikedetect.run,
may need to adjust min_template_size and/or dminx for best results.
If you're using a probe with multiple shanks, see
https://kilosort.readthedocs.io/en/latest/multi_shank.html
warnings.warn(msg, UserWarning)
0%| | 0/1032 [00:00<?, ?it/s]
0%| | 0/1032 [00:01<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 70, in run
ops, bfile, st0 = compute_drift_correction(
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\datashift.py", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 140, in template_match
A = torch.einsum('ijk, jklm-> iklm', weigh, B[iC,:, nb*t:nb*(t+1)])
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\torch\functional.py", line 385, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.37 GiB. GPU
Preprocessing filters computed in 0.41s; total 0.41s
computing drift
nblocks = 0, skipping drift correction
drift computed in 0.00s; total 0.41s
Extracting spikes using templates
Re-computing universal templates from data.
C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py:242: UserWarning:
NaNs and/or zeroes present in weights for spikedetect.run,
may need to adjust min_template_size and/or dminx for best results.
If you're using a probe with multiple shanks, see
https://kilosort.readthedocs.io/en/latest/multi_shank.html
warnings.warn(msg, UserWarning)
0%| | 0/1032 [00:00<?, ?it/s]
0%| | 0/1032 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\run_kilosort.py", line 398, in detect_spikes
st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 255, in run
xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
File "C:\Users\ANL\anaconda3\envs\SI_env_fresh\lib\site-packages\kilosort\spikedetect.py", line 150, in template_match
Amax = torch.max(Aa[iC2], 0)[0]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.83 GiB. GPU
Hmm... thanks. The warning about nans/zeros is definitely not expected for a single-shank linear probe. Would you be willing to share the data so I can debug this?
Sure, what's the best way to share it?
Putting it on a google drive has been working well for others, if that's an option for you. Then you can post the link in a reply here, or you can send it to my email at [email protected] if you don't want it visible publicly.
Thanks, here's the link. It's a 64-channel int16 file:
https://drive.google.com/file/d/1_ngJKdbuHlN1a4KJpAspv899O7bTYxmt/view?usp=sharing
Got it, thanks. Is it 30kHz sampling rate?
It's 24414.0625 Hz (TDT).
Well, the good news is I was able to sort the data without any errors using the default settings. The bad news is that I am also using a card with 12 GB of VRAM (and have sorted larger datasets with less VRAM in the past), so I'm not sure why you're seeing that error and I'm not.
A couple things to check:
- Are there multiple video cards in that machine? If so, make sure the right one is selected in the "PyTorch device" dropdown menu.
- Are there other applications running that might be using up the VRAM? An easy way to check is entering nvidia-smi in a terminal (without Kilosort running), which should give an output like this (a quick Python-side check is also sketched after the example output):
(kilosort4) PS C:\code\Kilosort> nvidia-smi
Fri Apr 26 12:32:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti WDDM | 00000000:01:00.0 On | N/A |
| 0% 42C P0 31W / 285W | 1303MiB / 12282MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
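If it's easier, roughly the same check can be done from Python with PyTorch (a small sketch, assuming the same environment Kilosort runs in):

import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current device
    print(f"Free: {free / 1e9:.2f} GB of {total / 1e9:.2f} GB total")
else:
    print("No CUDA device visible to PyTorch.")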
Thanks for the help Jacob, both of those tests looked normal: there's only one discrete GPU and it was being used by CUDA, with nothing running in the background. I've tested a new environment, and with Kilosort 4.0.6 I don't have any issues!
Hi @jacobpennington , I am having the same issue as Adam. I am also using a 64-channel probe. I have tried all of the above-mentioned fixes to no avail. Would I also be able to send you some test data to see if you are able to run it?
Hi @igfebbo , yes that would be fine.
Thank you very much! I am using a 64 channel H9 probe. Let me know if you need any other information. Here is the link:
https://drive.google.com/file/d/1DpwB0iItG3VtcXFZykVgUpWOSLE4a9k9/view?usp=drive_link
I'm unable to access it @igfebbo, I sent an access request through google from [email protected].
I have just shared it.
Sorry about that!
@igfebbo What type of probe are you using? Can you share the probe file, or paste in the output of "Print Probe" in the GUI? I'm not seeing any immediate issues, so it might be related to the probe geometry.
I am using a Cambridge neurotech, H9 probe. Here is the probe file: https://drive.google.com/file/d/1SakRB01PjjKae8OkNIZCXQXyM2XXlc8a/view?usp=sharing
30kHz sampling rate
Thanks. I still didn't see any memory problems; it never used more than about 2 GB of video memory at any one time. I know it's annoying, but did you try simply restarting the machine and sorting again? You would get the same error if other processes were using up the video memory.
Thank you for running it. We rebooted the server and got the same error.
I am also now getting this error: Non-native QFileDialog supports only local files
Okay. I'll look into that, but you should be able to get around it by just typing or copy-pasting the file location into the text area instead of clicking on "choose file."
Thanks, Jacob. We've noticed that we're getting a warning early on that might be informative... For reference, we're running a 2080 Super, driver 525.147.05, and CUDA v12.0. This is a multi-user Debian 12 machine, but no one else is accessing the GPU when we run our tests.
When we load the file and probe map we sent you, memory usage on the GPU goes from 0 to 1.3GB. We then see this warning:
/home/randy/.local/lib/python3.9/site-packages/kilosort/io.py:497: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1712608885084/work/torch/csrc/utils/tensor_numpy.cpp:206.) X[:, self.nt : self.nt+nsamp] = torch.from_numpy(data).to(self.device).float()
When we click Run, memory usage jumps to about 7 GB and then we see the error message that the GPU couldn't allocate an additional 2 GB (which makes sense, as this is an 8 GB GPU).
The warning isn't meaningful, it's something we're aware of. However, we did suppress that warning a few versions ago around the same time that we implemented some bug fixes that could cause memory problems. Can you please install the latest version of Kilosort and try again? Or let me know if you're already using the latest version and still seeing that warning.
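For reference, upgrading in the existing environment would look something like this (the [gui] extra is my assumption for GUI users; some shells require quoting it):

python -m pip install --upgrade kilosort[gui]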
Ok, we got it running now. We did not realize that pip was defaulting to 4.0 rather than the latest version (4.0.6 as of yesterday).