GPU memory allocated by make_context cannot be released when an exception occurs
**Describe the bug**

I want to initialize as many CUDA contexts as possible in a multi-threaded environment, but when `cuda.Device(0).make_context()` throws an exception, the GPU memory already allocated by `make_context` cannot be released.
**To Reproduce**

```python
import time
import logging

import pycuda.driver as cuda
# from PyQt5.QtCore import QThread


class GpuContext(object):
    def __init__(self, name=None):
        self.logging = logging.getLogger('GpuContext - ' + name)
        self.cuda_context = None
        self.cuda_context = cuda.Device(0).make_context()

    def __del__(self):
        if self.cuda_context:
            try:
                self.cuda_context.pop()
            except:
                self.logging.exception('self.cuda_context.pop()')
            try:
                self.cuda_context.detach()
            except:
                self.logging.exception('self.cuda_context.detach() error')
            try:
                del self.cuda_context
                self.cuda_context = None
            except:
                self.logging.exception('del self.cuda_context error')


# class GpuThread(QThread):
#     def __init__(self, gpuContext, parent=None):
#         super().__init__(parent)
#         self.gpuContext = gpuContext
#
#     def run(self):
#         pass


if __name__ == '__main__':
    cuda.init()
    gupContexts = []
    for i in range(100):
        try:
            gpuContext = GpuContext(str(i))
            gupContexts.append(gpuContext)
        except:
            logging.exception('init error')
            break
    while len(gupContexts) > 0:
        gpuContext = gupContexts.pop()
        del gpuContext
    while True:
        print('main')
        time.sleep(1)
```
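As an aside, relying on `__del__` for cleanup, as the repro does, is fragile: exceptions raised in `__del__` cannot propagate, and destruction order during interpreter shutdown is not guaranteed. A more deterministic shape is an explicit `close()` plus the context-manager protocol. The sketch below is generic Python, not pycuda API; `factory` and `releaser` are hypothetical stand-ins for `make_context()` and `pop()`/`detach()`:

```python
class ManagedResource:
    """Generic sketch: deterministic cleanup via close()/with instead of __del__.

    `factory` and `releaser` are hypothetical stand-ins for
    cuda.Device(0).make_context() and context.pop()/detach().
    """

    def __init__(self, factory, releaser):
        self._releaser = releaser
        self._closed = False
        # If factory() raises, __init__ fails before anything needs cleanup,
        # so no half-initialized object is left behind.
        self._handle = factory()

    def close(self):
        # Idempotent: calling close() a second time is a no-op.
        if not self._closed:
            self._closed = True
            self._releaser(self._handle)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions from the with-body
```

With this shape, release runs exactly once even if the `with` body raises, and a failed `factory()` never produces an object whose destructor later has to guess what was allocated.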
**Environment (please complete the following information):**
- OS: Ubuntu 18
- CUDA version: 10.2 (V10.2.89)
- CUDA driver version: 460.91.03
- PyCUDA version: pycuda-2021.1
- Python version: 3.6.9
**What result are you observing? What result are you expecting?**

The error handling in that code path looks RAII-safe, so it should do the right thing:
https://github.com/inducer/pycuda/blob/9f3b898ec0846e2a4dff5077d4403ea03b1fccf9/src/cpp/cuda.hpp#L854-L863
```
ERROR:root:init error
Traceback (most recent call last):
  File "test_r.py", line 47, in <module>
    gpuContext = GpuContext(str(i))
  File "test_r.py", line 12, in __init__
    self.cuda_context = cuda.Device(0).make_context()
pycuda._driver.MemoryError: cuCtxCreate failed: out of memory
ERROR:GpuContext - 36:self.cuda_context.pop()
Traceback (most recent call last):
  File "test_r.py", line 17, in __del__
    self.cuda_context.pop()
pycuda._driver.LogicError: cuCtxPopCurrent failed: invalid device context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuCtxDetach failed: invalid device context
```
When an exception occurs, how can the GPU memory be released? As shown in the figure, 39 MB of GPU memory remains occupied.

It's not easy to explain. Scenario 1: under normal circumstances, on a laptop (with a different graphics card), GPU memory is fully freed once all CUDA contexts are released. On my desktop, all CUDA contexts are released normally, but some GPU memory remains occupied.

Scenario 2: deliberately trigger multiple exceptions, and the GPU memory usage keeps increasing.
```python
import time
import logging

import pycuda.driver as cuda
# from PyQt5.QtCore import QThread


class GpuContext(object):
    def __init__(self, name=None):
        self.logging = logging.getLogger('GpuContext - ' + name)
        self.cuda_context = None
        self.cuda_context = cuda.Device(0).make_context()

    def __del__(self):
        if self.cuda_context:
            try:
                self.cuda_context.pop()
            except:
                self.logging.exception('self.cuda_context.pop()')
            try:
                self.cuda_context.detach()
            except:
                self.logging.exception('self.cuda_context.detach() error')
            try:
                del self.cuda_context
                self.cuda_context = None
            except:
                self.logging.exception('del self.cuda_context error')


# class GpuThread(QThread):
#     def __init__(self, gpuContext, parent=None):
#         super().__init__(parent)
#         self.gpuContext = gpuContext
#
#     def run(self):
#         pass


if __name__ == '__main__':
    cuda.init()
    for j in range(10):
        gupContexts = []
        for i in range(100):
            try:
                gpuContext = GpuContext(str(i))
                gupContexts.append(gpuContext)
            except:
                logging.exception('init error')
                break
        while len(gupContexts) > 0:
            gpuContext = gupContexts.pop()
            del gpuContext
    while True:
        print('main')
        time.sleep(1)
```

After triggering 10 exceptions, GPU memory usage grows to about 390 MB (roughly 39 MB per leaked context).
There may be a problem in the `make_context` path: `prepare_context_switch` may switch contexts, but then the GPU memory allocation fails. As a result, the pop of the previous context does not succeed, which leads to the `cuCtxPopCurrent failed: invalid device context` exception during cleanup. Could it be this failed pop of the previous context that leaves the GPU memory unreleased?
I suspect that prepare_context_switch leaves the context stack in an inconsistent state in case of an error. It should be replaced with a RAII construct that restores the previous state if the switch did not succeed.
It'll be a while before I have time to look into this. PRs welcome in the meantime!
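The RAII idea can be illustrated in plain Python. This is a sketch of the pattern only, not pycuda code: the list `stack` stands in for the CUDA context stack, and `make_new` for a fallible context factory (something that can raise `MemoryError` mid-switch):

```python
class ScopedContextSwitch:
    """Sketch of an RAII-style context switch: push a new context onto a
    stack, and roll the stack back to its previous state if creation fails.

    `stack` (a plain list) and `make_new` (a fallible factory) are
    hypothetical stand-ins, not pycuda API.
    """

    def __init__(self, stack, make_new):
        self._stack = stack
        self._make_new = make_new
        self._pushed = False

    def __enter__(self):
        saved_depth = len(self._stack)
        try:
            self._stack.append(self._make_new())  # may raise mid-switch
            self._pushed = True
        except BaseException:
            # Restore the exact previous stack state before re-raising,
            # so later pops still see a consistent stack.
            del self._stack[saved_depth:]
            raise
        return self._stack[-1]

    def __exit__(self, exc_type, exc, tb):
        if self._pushed:
            self._stack.pop()
        return False
```

Whether the switch succeeds or fails, the stack ends in a well-defined state, which is exactly the invariant the `__del__`-based cleanup in the repro implicitly relies on.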