pykilosort
CUDA memory allocation error in mexWtW2
Sometimes this happens:
Finding merges: 100%|################################################################| 508/508 [00:06<00:00, 84.13it/s]
Traceback (most recent call last):
File "d:\github\pykilosort\pykilosort\gui\sorter.py", line 108, in run
self.context = run_spikesort(self.context)
File "d:\github\pykilosort\pykilosort\main.py", line 434, in run_spikesort
out = splitAllClusters(ctx, False)
File "d:\github\pykilosort\pykilosort\postprocess.py", line 755, in splitAllClusters
WtW, iList = getMeWtW(W.astype(cp.float32), U.astype(cp.float32), Nnearest)
File "d:\github\pykilosort\pykilosort\learn.py", line 523, in getMeWtW
wtw0 = mexWtW2(Params, W[:, :, i], W[:, :, j], utu0)
File "d:\github\pykilosort\pykilosort\learn.py", line 485, in mexWtW2
d_Params = cp.asarray(Params, dtype=np.float64, order='F')
File "C:\Users\Marius\anaconda3\envs\pyks2\lib\site-packages\cupy\creation\from_data.py", line 66, in asarray
return core.array(a, dtype, False, order)
File "cupy\core\core.pyx", line 1692, in cupy.core.core.array
File "cupy\core\core.pyx", line 1744, in cupy.core.core.array
File "cupy\core\core.pyx", line 1741, in cupy.core.core.array
File "cupy\cuda\pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
File "cupy\cuda\pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy\cuda\pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy\cuda\pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy\cuda\pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
File "cupy\cuda\pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
File "cupy\cuda\pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
File "cupy\cuda\runtime.pyx", line 239, in cupy.cuda.runtime.hostAlloc
File "cupy\cuda\runtime.pyx", line 145, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Is this happening on a rerun of a given dataset, or a new run in a fresh directory without any remaining cache files?
Fresh run from inside the GUI. @shashwatsridhar has also gotten this before on a different dataset, though not on this one.
Given that the failing line is d_Params = cp.asarray(Params, dtype=np.float64, order='F'), I strongly suspect that something in Params isn't a float (or is NaN).
We probably should check the type / values of all the arrays before they end up on the GPU. Could work nicely with some wrapper around all of the CUDA kernel calls.
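A rough sketch of what such a wrapper could look like, assuming the inputs are still host-side (NumPy arrays, lists, or tuples such as Params) when they are checked. The names validate_host_array and check_inputs are hypothetical and not part of pykilosort; this only illustrates the idea of rejecting non-numeric or non-finite data before it reaches cp.asarray or a kernel launch.

```python
import functools

import numpy as np


def validate_host_array(name, value):
    """Raise a descriptive error if `value` is not numeric or not finite.

    Hypothetical helper: checks host-side data before it is copied to the GPU.
    """
    arr = np.asarray(value)
    # Lists containing None or strings end up with a non-numeric dtype.
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"{name}: expected numeric data, got dtype {arr.dtype}")
    # NaN or inf values are caught here rather than inside a CUDA kernel.
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"{name}: contains NaN or infinite values")
    return arr


def check_inputs(func):
    """Hypothetical decorator: validate array-like arguments before calling `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        named = [(f"arg {i}", a) for i, a in enumerate(args)]
        named += [(f"kwarg {k!r}", v) for k, v in kwargs.items()]
        for label, value in named:
            if isinstance(value, (np.ndarray, list, tuple)):
                validate_host_array(f"{func.__name__} {label}", value)
        return func(*args, **kwargs)

    return wrapper
```

Decorating a function like mexWtW2 with check_inputs would then turn a bad Params entry into a TypeError or ValueError that names the offending argument, instead of a CUDA runtime error further down the stack. Arrays that already live on the GPU are skipped by this sketch and would need a separate check.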