CUDA memory allocation error in mexWtW2

Open marius10p opened this issue 5 years ago • 3 comments

Sometimes this happens:

Finding merges: 100%|################################################################| 508/508 [00:06<00:00, 84.13it/s]

Traceback (most recent call last):
  File "d:\github\pykilosort\pykilosort\gui\sorter.py", line 108, in run
    self.context = run_spikesort(self.context)
  File "d:\github\pykilosort\pykilosort\main.py", line 434, in run_spikesort
    out = splitAllClusters(ctx, False)
  File "d:\github\pykilosort\pykilosort\postprocess.py", line 755, in splitAllClusters
    WtW, iList = getMeWtW(W.astype(cp.float32), U.astype(cp.float32), Nnearest)
  File "d:\github\pykilosort\pykilosort\learn.py", line 523, in getMeWtW
    wtw0 = mexWtW2(Params, W[:, :, i], W[:, :, j], utu0)
  File "d:\github\pykilosort\pykilosort\learn.py", line 485, in mexWtW2
    d_Params = cp.asarray(Params, dtype=np.float64, order='F')
  File "C:\Users\Marius\anaconda3\envs\pyks2\lib\site-packages\cupy\creation\from_data.py", line 66, in asarray
    return core.array(a, dtype, False, order)
  File "cupy\core\core.pyx", line 1692, in cupy.core.core.array
  File "cupy\core\core.pyx", line 1744, in cupy.core.core.array
  File "cupy\core\core.pyx", line 1741, in cupy.core.core.array
  File "cupy\cuda\pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy\cuda\pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy\cuda\pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy\cuda\pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy\cuda\runtime.pyx", line 239, in cupy.cuda.runtime.hostAlloc
  File "cupy\cuda\runtime.pyx", line 145, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

marius10p, Oct 02 '20 15:10

Is this happening on a rerun of a given dataset, or a new run in a fresh directory without any remaining cache files?

rossant, Oct 02 '20 15:10

Fresh run from inside the GUI. @shashwatsridhar has also gotten this before on a different dataset, though not on this one.

marius10p, Oct 02 '20 15:10

Given that the failing call is d_Params = cp.asarray(Params, dtype=np.float64, order='F'), I strongly suspect that something in Params isn't a float (or is NaN).

We should probably check the types and values of all the arrays before they end up on the GPU. A wrapper around all of the CUDA kernel calls could handle this nicely.

alexmorley, Nov 02 '20 17:11
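
The input check suggested in the last comment could look roughly like the sketch below. It is only an illustration, not pykilosort code: the decorator name check_gpu_inputs and the stand-in function dummy_kernel are hypothetical, and it assumes the arguments are still host-side data (NumPy arrays, lists or tuples) at the point of the call, so it uses NumPy only and runs without a GPU. Note also that CUDA can report errors asynchronously, so an illegal memory access raised inside cp.asarray may originate from an earlier kernel launch; a check like this would rule out bad inputs but might not locate the actual fault.

```python
import functools

import numpy as np


def check_gpu_inputs(func):
    """Reject non-numeric or non-finite array arguments before calling func.

    Hypothetical helper: intended to wrap functions that copy their
    arguments to the GPU and launch a CUDA kernel.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        named = [(f"positional argument {i}", a) for i, a in enumerate(args)]
        named += [(f"keyword argument {k!r}", v) for k, v in kwargs.items()]
        for name, value in named:
            # Only inspect array-like host data; scalars and other objects pass through.
            if not isinstance(value, (np.ndarray, list, tuple)):
                continue
            arr = np.asarray(value)
            # dtype.kind: b=bool, i=signed int, u=unsigned int, f=float.
            if arr.dtype.kind not in "biuf":
                raise TypeError(f"{name} has non-numeric dtype {arr.dtype}")
            if arr.dtype.kind == "f" and not np.all(np.isfinite(arr)):
                raise ValueError(f"{name} contains NaN or infinite values")
        return func(*args, **kwargs)

    return wrapper


@check_gpu_inputs
def dummy_kernel(params, w):
    """Stand-in for a kernel wrapper; the real one would call into CUDA."""
    return float(np.sum(params) + np.sum(w))
```

Calling dummy_kernel([1, 2, float("nan")], np.ones(2)) raises a ValueError that names the offending argument, and dummy_kernel([1, None], np.ones(2)) raises a TypeError, in both cases before anything would be sent to the device. Checking every call adds a pass over each array, so in practice it may be preferable to enable such a wrapper only in a debug mode.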