DeviceMemoryPool causes crash at shutdown
Consider the following piece of code:
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpu
from pycuda.tools import DeviceMemoryPool
cuda_memory_pool = DeviceMemoryPool()
X = np.array([1, 2, 3.0])
X = gpu.to_gpu(X, allocator=cuda_memory_pool.allocate)
On my machine, this gives:
python2.7 test.py
terminate called after throwing an instance of 'pycuda::error'
what(): explicit_context_dependent failed: invalid context - no currently active context?
Aborted
This crash happens AFTER all cleanup operations, somewhere in the guts of PyCUDA. (At this stage, the _finish_up() routine from pycuda.autoinit has already run.) While the main script itself completes, the crash makes it impossible to profile scripts with the NVIDIA Visual Profiler, which is rather annoying.
Could you compile PyCUDA with API tracing and rerun this to figure out what happens?
How do I do that?
The problem vanishes if we add
cuda_memory_pool.stop_holding()
to the end of the file. Oddly, just calling cuda_memory_pool.free_held() is not enough. So I'm assuming the memory pool only really releases its allocation blocks once it is told it can stop holding data, which otherwise only happens when the Python shutdown routine calls the C++ destructor (long after the CUDA context has been released).
However, the blocks need to be released before the CUDA context is deleted, since they are explicit_context_dependent (or aren't they?).
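If that theory is right, the ordering problem can be modeled without pycuda at all. Here is a minimal pure-Python sketch (FakeContext and FakePool are invented names for illustration, not pycuda API) of why freeing blocks from a destructor that runs after context teardown fails:

```python
# Pure-Python sketch of the suspected teardown-ordering problem; no pycuda
# involved. FakeContext and FakePool stand in for the CUDA context and
# DeviceMemoryPool.

class FakeContext:
    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class FakePool:
    def __init__(self, ctx):
        self.ctx = ctx
        self.blocks = ["block0"]  # memory held back for reuse

    def free_blocks(self):
        # freeing device memory needs a live context
        if not self.ctx.alive:
            raise RuntimeError(
                "invalid context - no currently active context?")
        self.blocks.clear()

    def stop_holding(self):
        self.free_blocks()


# Safe order: release the blocks explicitly, then tear down the context.
ctx = FakeContext()
pool = FakePool(ctx)
pool.stop_holding()
ctx.destroy()  # fine, nothing left to free

# Crash order: the context goes away first (autoinit's atexit handler),
# and only afterwards does the pool's destructor try to free its blocks.
ctx2 = FakeContext()
pool2 = FakePool(ctx2)
ctx2.destroy()
crashed = False
try:
    pool2.free_blocks()  # what the C++ destructor attempts at shutdown
except RuntimeError:
    crashed = True
print(crashed)
```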
How do I do that?
Set CUDA_TRACE = True in siteconf.py.
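(siteconf.py lives in the root of the PyCUDA source tree and is generated by configure.py; the excerpt below is just a sketch of the change, and a rebuild/reinstall is needed for it to take effect.)

```python
# siteconf.py (excerpt) -- enable tracing of CUDA API calls,
# then rebuild and reinstall, e.g.:  python setup.py install
CUDA_TRACE = True
```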
Both the pool and the blocks should keep the context alive, but apparently the context can't be made current at the point in cleanup where that's needed. The trace would help figure out whether the context has already been freed by then.
This is the output with CUDA_TRACE=True
$ python3.4 test.py
cuInit
cuDeviceGetCount
cuDeviceGet
cuCtxCreate
cuCtxGetDevice
cuMemAlloc
cuCtxGetDevice
cuDeviceGetAttribute
cuDeviceGetAttribute
cuDeviceComputeCapability
cuDeviceGetAttribute
cuDeviceGetAttribute
cuDeviceComputeCapability
cuDeviceComputeCapability
cuDeviceGetAttribute
cuMemcpyHtoD
cuCtxPopCurrent
terminate called after throwing an instance of 'pycuda::error'
what(): explicit_context_dependent failed: invalid device context - no currently active context?
Aborted
For completeness' sake, test.py looks like this:
$ cat test.py
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpu
from pycuda.tools import DeviceMemoryPool
cuda_memory_pool = DeviceMemoryPool()
X = np.array([1, 2, 3.0])
X = gpu.to_gpu(X, allocator=cuda_memory_pool.allocate)
cuda_memory_pool.free_held()
#cuda_memory_pool.stop_holding()
#del(cuda_memory_pool)
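One thing I haven't tried yet: registering stop_holding with atexit after importing pycuda.autoinit. atexit runs handlers in LIFO order, so a handler registered later fires before autoinit's _finish_up tears the context down. A pycuda-free sketch of that ordering (the child script below is a stand-in, not the real autoinit):

```python
# Demonstrates atexit's LIFO ordering in a child interpreter (no pycuda).
# In the real script the equivalent would be:
#     import pycuda.autoinit               # registers _finish_up
#     atexit.register(pool.stop_holding)   # would run BEFORE _finish_up
import subprocess
import sys
import textwrap

child = textwrap.dedent("""
    import atexit
    # stand-in for pycuda.autoinit registering _finish_up at import time
    atexit.register(lambda: print("context torn down"))
    # stand-in for registering cuda_memory_pool.stop_holding afterwards
    atexit.register(lambda: print("pool stopped holding"))
""")

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
lines = result.stdout.splitlines()
print(lines)  # last-registered handler runs first
```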
Any news/ideas on this?
Can you get a (C++) backtrace of (a) whatever triggers that cuCtxPopCurrent and (b) the explicit_context_dependent invocation that actually raises the error? FWIW, that error occurs when a new explicit_context_dependent object is constructed, which is strange that late in the game. I wonder what object that is.