
out of memory error

rtraghavan opened this issue 5 years ago • 17 comments

I realize this has been posted before, but I was unable to find a solution in the prior posts that solved my problem. Kilosort 2 crashes for me with an out-of-memory error right at the end of the main optimization step.

Hardware/software preliminaries: Ubuntu 18.04; RAM: 64 GB; GPU: NVIDIA 980 Ti; CUDA 9.1; MATLAB R2018b; 1 TB Samsung 860 EVO SSD. mexGPUAll runs flawlessly.

File size: 262 GB (~3 hours) from a Neuropixels 3A probe. Reducing the file size certainly helps: if I sort a file that is only 30 minutes long, there are no problems.

I tried reducing the batch size by 2x by setting ks.ops.NT = ks.ops.NT/2; that blew things up in a different way and gave me a CUDA_ERROR_ILLEGAL_ADDRESS, though I may not be setting the batch size properly.

I'm not sure what to do. Is there a way to reduce the number of channels Kilosort sorts across? I know that the signals of interest appeared on only 128 channels of this recording.

rtraghavan avatar Sep 02 '19 23:09 rtraghavan

The GUI allows you to manually set channels off. You can also drop unwanted channels from the channel map file. Where do you run out of memory? To reduce ops.NT, note the specifications in the config file:

ops.NT = 64*1024+ ops.ntbuff; % must be multiple of 32 + ntbuff. This is the batch size (try decreasing if out of memory).
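For concreteness, a minimal sketch of both suggestions (disabling channels and shrinking the batch size). The channel map field names follow the standard Kilosort format (chanMap, connected, xcoords, ycoords, fs); the file names and the kept-channel range are placeholders for your own recording:

% Load an existing channel map, keep only the channels that carry signal,
% and save a new map; Kilosort skips channels whose 'connected' flag is false.
cm = load('chanMap.mat');                    % placeholder path to your channel map
keep = false(numel(cm.connected), 1);
keep(1:128) = true;                          % e.g. only the first 128 channels are of interest
cm.connected = cm.connected & keep;
save('chanMap_128ch.mat', '-struct', 'cm');
ops.chanMap = 'chanMap_128ch.mat';

% Halve the batch size while keeping the "multiple of 32 + ntbuff" rule:
ops.NT = 32*1024 + ops.ntbuff;               % rather than ops.NT = ops.NT/2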

marius10p avatar Sep 20 '19 13:09 marius10p

We have the same problem when sorting large files.

We first tried ops.NT = 64*1024 + ops.ntbuff; on a 700 GB file, and it failed with an out-of-memory error on a GTX 1080 with 8 GB of VRAM.

We then decreased it to ops.NT = 64*900 + ops.ntbuff;, which produced many CUDA illegal address warnings and no sorting results.

We then concatenated fewer files into a smaller 400 GB bin file, and that worked.

My point is that sometimes we need to sort a really large file, so would it be possible to cache some of the temporary CUDA results in main RAM (which can easily be upgraded to 64+ GB), so that we don't need to buy an expensive ($2000+) NVIDIA card with 16+ GB of VRAM? The Kilosort2 wiki lists similar problems on the hardware guide page.

It would also be better if Kilosort2 could sort continuously, one file after another, keeping the templates, cluster IDs, etc., as I mentioned in #37, much like OSort.

Just bringing this up; I would love to hear some thoughts from @marius10p about removing these limits.

babaq avatar Oct 16 '19 05:10 babaq

Where do you get the out of memory error, at what step of the pipeline? The only step that doesn't scale nicely with recording time is the drift estimation, which happens right after preprocessing.

marius10p avatar Oct 16 '19 12:10 marius10p

The out-of-memory error showed up at the end of the pre-clustering step; here I am quoting the messages from a successful sort just for illustration:

time 4760.14, pre clustered 9001 / 9170 batches    <-- out of memory error here!
time 0.23, compared 1 / 9170 batches

babaq avatar Oct 16 '19 17:10 babaq

I get this error during the learnAndSolve8b section of the master_kilosort script. I reduced the batch size to 32*1024 + ops.ntbuff and it crashed at 18301/21779 batches.

rtraghavan avatar Nov 25 '19 01:11 rtraghavan

Update: I reduced the batch size again, to 32*256 + ops.ntbuff. It still crashes, though now during the pre-clustering step, as @babaq notes above, this time with a CUDA memory error.

Would a larger memory capacity help? I can try a computer with double what I currently have (128 GB of RAM).

rtraghavan avatar Nov 25 '19 15:11 rtraghavan

These are distinct problems, so let's treat them separately.

  1. If you have too many batches, out-of-memory problems are due to the creation of an Nbatches-by-Nbatches matrix on the GPU. This grows quadratically with the number of batches, which means increasing GPU RAM is likely to help only for slightly larger Nbatches (see the back-of-the-envelope sketch after this list). I can probably streamline this, but it requires a significant re-write. I will try to do this soon and come back with an update.

  2. @rtraghavan , can you copy paste the exact error and the line at which it happens in learnAndSolve8b? You should probably go back to the original batch size, and report that error.

  3. @babaq Kilosort2 was never meant to sort such long recordings, nor is it a good idea to sort them all together. Kilosort2 can track neurons over periods of at most a few hours. If you need longer, you need to get creative with splitting your files and then merging back neurons that look the same, or that fire at the same times in a segment of overlapping time. Same if you have a very large number of channels.
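As a rough back-of-the-envelope illustration of point 1 (this assumes the matrix is stored in single precision and ignores everything else on the GPU; it is only meant to show the scaling):

Nbatches = 21779;                             % e.g. the batch count reported above
gb = Nbatches^2 * 4 / 1e9;                    % single precision: 4 bytes per entry, ~1.9 GB
fprintf('%d batches -> %.1f GB; %d batches -> %.1f GB\n', ...
    Nbatches, gb, 2*Nbatches, 4*gb);          % halving ops.NT doubles Nbatches and quadruples the matrix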

marius10p avatar Nov 25 '19 15:11 marius10p

We are currently running Kilosort on a ~400 GB concatenated file. The first out-of-memory error came at the end of pre-clustering; I could see the system RAM ramping up until it hit the wall, so I increased the Windows page file to at least 200 GB, since I already have 64 GB of physical RAM.

After passing pre-clustering, a GPU out-of-memory error occurred during automatic splitting and merging. My GTX 1080 with 8 GB could not handle it, so we had to use a machine with a Quadro P5000 with 16 GB, and then it worked through.

@marius10p we have some rough ideas along the lines of splitting large files to ~400 GB and re-merging each cluster, but that definitely needs more work.

babaq avatar Nov 25 '19 21:11 babaq

Hmm, that's not where I thought the GPU out-of-memory would occur. How many channels do you have, and how long is the recording?

See also the advice I gave to @brendonw1 here: https://github.com/MouseLand/Kilosort2/issues/135#issuecomment-558303801

marius10p avatar Nov 26 '19 14:11 marius10p

We are using a Phase 3A probe on ~6 hours of data.

babaq avatar Nov 27 '19 19:11 babaq

Hi @marius10p, when you said "Kilosort2 was never meant to sort such long recordings", is this the same for the original KiloSort? Thanks

shihaisun avatar Dec 11 '19 04:12 shihaisun

Well, neither of them was developed specifically for sorting long recordings. I think people have had more success with big files on Kilosort1, partly because it is less complex so there are fewer memory bottlenecks.

I also meant to say that the drift tracking in Kilosort2 would only work well up to some length of the recording (which can be different for different datasets). Beyond that duration, tracking can fail catastrophically with no possibility of recovery, and it would make more sense to try to break the recording into pieces, sort separately, and combine the results, like the advice I gave @brendonw1 in issue #135 .

marius10p avatar Dec 22 '19 14:12 marius10p

Hi Marius,

I am posting here rather than on issue 176 of the Cortexlab branch because this thread is currently active.

I essentially also encounter a memory error, but for a 2h30-long recording from a 3B probe (209 GB). I would really like to avoid having to split and re-concatenate my data files, which I will end up doing if we do not find a workaround here.

  1. System OS: Windows 10; GPU: NVIDIA GTX 1080 (TotalMemory: 1.1811e+10 bytes); RAM: 96 GB; MATLAB: R2019b; CUDA: 10.1

  2. Steps to reproduce the error: run Kilosort 2 (MATLAB version) on a 2h30-long recording from a 3B probe (209 GB).

  3. Error

3.1) If I reset the GPU with gpuDevice(1) between rez = preprocessDataSub(ops); and rez = clusterSingleBatches(rez); (as well as before rez = preprocessDataSub(ops);), a placement sketched at the end of this comment:

Finished compressing time-varying templates 
initialized spike counts
merged 547 into 548 
...
merged 554 into 538 
Found 0 splits, checked 1/560 clusters, nccg 0
...
Found 5969 splits, checked 6501/6529 clusters, nccg 2248 
Finished splitting. Found 6001 splits, checked 6561/6561 clusters, nccg 2258 
Error using mexWtW2
Maximum variable size allowed on the device is exceeded.

Error in getMeWtW (line 11)
        wtw0 =  mexWtW2(Params, W(:,:,i), W(:,:,j), utu0);

Error in splitAllClusters (line 182)
[WtW, iList] = getMeWtW(single(rez.W), single(rez.U), Nnearest);

Error in master_kilosort (line 48)
rez = splitAllClusters(rez, 1);

Error in metamaster_MB (line 39)
master_kilosort(datasets{1}{1}, datasets{1}{2}, datasets{1}{3});

Here, it seems quite surprising to me that as many as 5969 splits are found, starting from 560 units. It seems like this is a bug which, once fixed, would also solve the memory error, because wtw0 would not be so huge.
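A rough back-of-the-envelope supports that reading. Assuming the pairwise array built by getMeWtW is roughly Nfilt x Nfilt x (2*nt0-1) in single precision (nt0 = 61 by default; treat the exact shape as an assumption), 6561 clusters need well over a hundred times the memory of 560:

nt0 = 61;
gbFor = @(Nfilt) Nfilt^2 * (2*nt0 - 1) * 4 / 1e9;      % single precision, 4 bytes per entry
fprintf('  560 clusters -> %.2f GB\n', gbFor(560));    % ~0.15 GB, fine
fprintf(' 6561 clusters -> %.1f GB\n',  gbFor(6561));  % ~21 GB, far more than any consumer GPU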

3.2) If I do not reset the GPU between rez = preprocessDataSub(ops); and rez = clusterSingleBatches(rez); (but only BEFORE rez = preprocessDataSub(ops);):

...
Found 919 splits, checked 501/1479 clusters, nccg 272 
Error using gpuArray/filter
Out of memory on device. To view more detail about available memory on the GPU, use
'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in my_conv2 (line 47)
        S1 = filter(gaus, 1, cat(1, S1, zeros([tmax, dsnew2(2)])));

Error in splitAllClusters (line 56)
    clp = clp - my_conv2(clp, 250, 1);

Error in master_kilosort (line 43)
rez = splitAllClusters(rez, 1);

Error in metamaster_MB (line 39)
master_kilosort(datasets{1}{1}, datasets{1}{2}, datasets{1}{3});

Here the number of splits is much more reasonable, only 919 (which is still large but understandable for a 2h30-long recording), so wtw0 must be small enough: it does not trigger the error from 3.1, but still leaves room for a GPU out-of-memory error.

Thank you for your help Marius, Maxime
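For reference, a minimal sketch of the reset placement described in 3.1 (this is not the stock master_kilosort; gpuDevice(1) clears every gpuArray, so it assumes nothing still needed on the GPU is held in rez across the resets):

gpuDevice(1);                       % reset before preprocessing (as in 3.1)
rez = preprocessDataSub(ops);
gpuDevice(1);                       % the extra reset between steps used in 3.1
rez = clusterSingleBatches(rez);
rez = learnAndSolve8b(rez);
% ... remaining master_kilosort steps (merging, splitting, export) unchanged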

m-beau avatar Feb 21 '20 11:02 m-beau

@m-beau FYI there is a WIP Python port of KS2 here: https://github.com/rossant/pykilosort. One of its goals is to solve the memory issues. Feel free to try it; there will be rough edges, but I'm here to help if needed. The more testers there are, the better.

rossant avatar Feb 21 '20 11:02 rossant

Hi Cyrille,

Thanks for helping! I already tried it and it crashes even earlier: I still get a RAM memory error during the initial clustering step (which I got rid of in the MATLAB version by stuffing my tower with 96 GB of RAM). Would you be happy to let me open an issue on your Python port repo?

Thanks! Maxime

m-beau avatar Feb 21 '20 11:02 m-beau

That's unexpected! When was that? Quite a few memory issues were fixed 2 or 3 weeks ago. Please feel free to open issues there indeed.

rossant avatar Feb 21 '20 11:02 rossant

Hello

I ran into similar issues and can't find a way to solve them. I am wondering whether it is because my system is not powerful enough, especially my GPU, which might not be up to the task.

  1. System OS: Windows 10 Pro; GPU: NVIDIA Quadro P620; RAM: 32 GB; MATLAB: R2019b; CUDA: 10.1

  2. Recording: 2h30 from a 32-channel linear probe (2 columns of 16 channels, spaced 50 um apart vertically and horizontally); ~20 GB of data.

  3. Error with Kilosort3:

Error using mexWtW2
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in getMeWtW (line 18)
        wtw0 = mexWtW2(Params, W(:,:,i), W(:,:,j), utu0); % temporal convolutions get multiplied with the spatial products

Error in trackAndSort (line 100)
[WtW, iList] = getMeWtW(single(W), single(U), Nnearest);

Error in main_kilosort3 (line 44)
[rez, st3, tF] = trackAndSort(rez);

Thanks for your help Guilhem

guilhemibos avatar May 12 '22 10:05 guilhemibos