DeviceMemoryPool causes crash at shutdown
Consider the following piece of code:
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpu
from pycuda.tools import DeviceMemoryPool
cuda_memory_pool = DeviceMemoryPool()
X = np.array([1, 2, 3.0])
X = gpu.to_gpu(X, allocator=cuda_memory_pool.allocate)
On my machine, this gives:
python2.7 test.py
terminate called after throwing an instance of 'pycuda::error'
what(): explicit_context_dependent failed: invalid context - no currently active context?
Aborted
This crash happens AFTER all cleanup operations, somewhere in the guts of PyCUDA. (At this stage, the _finish_up() routine from pycuda.autoinit has already run.) While the main script itself completes, the crash makes it impossible to profile scripts with the NVIDIA Visual Profiler, which is rather annoying.
Could you compile PyCUDA with API tracing and rerun this to figure out what happens?
How do I do that?
The problem vanishes if we add
cuda_memory_pool.stop_holding()
to the end of the file. Oddly, just calling cuda_memory_pool.free_held() is not enough. So I'm assuming the memory pool only really releases its allocation blocks once it is told it can stop holding data, which otherwise only happens when the Python shutdown routine calls the C++ destructor (long after the CUDA context has been released).
However, the blocks need to be released before the CUDA context is deleted, since they are explicit_context_dependent (or aren't they?).
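If that theory is right, the ordering problem can be modeled without pycuda at all. Here is a minimal pure-Python sketch (FakeContext and FakePool are invented names for illustration, not pycuda API) of why freeing blocks from a destructor that runs after context teardown fails:

```python
# Pure-Python sketch of the suspected teardown-ordering problem; no pycuda
# involved. FakeContext and FakePool stand in for the CUDA context and
# DeviceMemoryPool.

class FakeContext:
    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class FakePool:
    def __init__(self, ctx):
        self.ctx = ctx
        self.blocks = ["block0"]  # memory held back for reuse

    def free_blocks(self):
        # freeing device memory needs a live context
        if not self.ctx.alive:
            raise RuntimeError(
                "invalid context - no currently active context?")
        self.blocks.clear()

    def stop_holding(self):
        self.free_blocks()


# Safe order: release the blocks explicitly, then tear down the context.
ctx = FakeContext()
pool = FakePool(ctx)
pool.stop_holding()
ctx.destroy()  # fine, nothing left to free

# Crash order: the context goes away first (autoinit's atexit handler),
# and only afterwards does the pool's destructor try to free its blocks.
ctx2 = FakeContext()
pool2 = FakePool(ctx2)
ctx2.destroy()
crashed = False
try:
    pool2.free_blocks()  # what the C++ destructor attempts at shutdown
except RuntimeError:
    crashed = True
print(crashed)
```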
How do I do that?
Set CUDA_TRACE = True in siteconf.py.
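(siteconf.py lives in the root of the PyCUDA source tree and is generated by configure.py; the excerpt below is just a sketch of the change, and a rebuild/reinstall is needed for it to take effect.)

```python
# siteconf.py (excerpt) -- enable tracing of CUDA API calls,
# then rebuild and reinstall, e.g.:  python setup.py install
CUDA_TRACE = True
```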
Both the pool and the blocks should keep the context alive, but apparently the context can't be made current at the point in cleanup where that's needed. The trace would help figure out whether the context has already been freed by then.
This is the output with CUDA_TRACE=True
$ python3.4 test.py
cuInit
cuDeviceGetCount
cuDeviceGet
cuCtxCreate
cuCtxGetDevice
cuMemAlloc
cuCtxGetDevice
cuDeviceGetAttribute
cuDeviceGetAttribute
cuDeviceComputeCapability
cuDeviceGetAttribute
cuDeviceGetAttribute
cuDeviceComputeCapability
cuDeviceComputeCapability
cuDeviceGetAttribute
cuMemcpyHtoD
cuCtxPopCurrent
terminate called after throwing an instance of 'pycuda::error'
what(): explicit_context_dependent failed: invalid device context - no currently active context?
Aborted
For completeness' sake, test.py looks like this:
$ cat test.py
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpu
from pycuda.tools import DeviceMemoryPool
cuda_memory_pool = DeviceMemoryPool()
X = np.array([1, 2, 3.0])
X = gpu.to_gpu(X, allocator=cuda_memory_pool.allocate)
cuda_memory_pool.free_held()
#cuda_memory_pool.stop_holding()
#del(cuda_memory_pool)
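One thing I haven't tried yet: registering stop_holding with atexit after importing pycuda.autoinit. atexit runs handlers in LIFO order, so a handler registered later fires before autoinit's _finish_up tears the context down. A pycuda-free sketch of that ordering (the child script below is a stand-in, not the real autoinit):

```python
# Demonstrates atexit's LIFO ordering in a child interpreter (no pycuda).
# In the real script the equivalent would be:
#     import pycuda.autoinit               # registers _finish_up
#     atexit.register(pool.stop_holding)   # would run BEFORE _finish_up
import subprocess
import sys
import textwrap

child = textwrap.dedent("""
    import atexit
    # stand-in for pycuda.autoinit registering _finish_up at import time
    atexit.register(lambda: print("context torn down"))
    # stand-in for registering cuda_memory_pool.stop_holding afterwards
    atexit.register(lambda: print("pool stopped holding"))
""")

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
lines = result.stdout.splitlines()
print(lines)  # last-registered handler runs first
```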
Any news/ideas on this?
Can you get a (C++) backtrace of (a) whatever triggers that cuCtxPopCurrent and (b) the explicit_context_dependent invocation that actually raises the error? FWIW, that error occurs when a new explicit_context_dependent object is constructed, which is strange that late in the game. I wonder what object that is.