GPU memory allocated by make_context cannot be released when an exception occurs
**Describe the bug**

I want to initialize as many CUDA contexts as possible in a multi-threaded environment, but when `cuda.Device(0).make_context()` throws an exception, the GPU memory already allocated by `make_context` cannot be released.
**To Reproduce**

```python
import time
import logging

import pycuda.driver as cuda
# from PyQt5.QtCore import QThread


class GpuContext(object):
    def __init__(self, name=None):
        self.logging = logging.getLogger('GpuContext - ' + name)
        self.cuda_context = None
        self.cuda_context = cuda.Device(0).make_context()

    def __del__(self):
        if self.cuda_context:
            try:
                self.cuda_context.pop()
            except:
                self.logging.exception('self.cuda_context.pop()')
            try:
                self.cuda_context.detach()
            except:
                self.logging.exception('self.cuda_context.detach() error')
            try:
                del self.cuda_context
                self.cuda_context = None
            except:
                self.logging.exception('del self.cuda_context error')


# class GpuThread(QThread):
#     def __init__(self, gpuContext, parent=None):
#         super().__init__(parent)
#         self.gpuContext = gpuContext
#
#     def run(self):
#         pass


if __name__ == '__main__':
    cuda.init()
    gupContexts = []
    for i in range(100):
        try:
            gpuContext = GpuContext(str(i))
            gupContexts.append(gpuContext)
        except:
            logging.exception('init error')
            break
    while len(gupContexts) > 0:
        gpuContext = gupContexts.pop()
        del gpuContext
    while True:
        print('main')
        time.sleep(1)
```
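As an aside, relying on `__del__` for cleanup, as the repro does, is fragile: exceptions raised in `__del__` cannot propagate, and destruction order during interpreter shutdown is not guaranteed. A more deterministic shape is an explicit `close()` plus the context-manager protocol. The sketch below is generic Python, not pycuda API; `factory` and `releaser` are hypothetical stand-ins for `make_context()` and `pop()`/`detach()`:

```python
class ManagedResource:
    """Generic sketch: deterministic cleanup via close()/with instead of __del__.

    `factory` and `releaser` are hypothetical stand-ins for
    cuda.Device(0).make_context() and context.pop()/detach().
    """

    def __init__(self, factory, releaser):
        self._releaser = releaser
        self._closed = False
        # If factory() raises, __init__ fails before anything needs cleanup,
        # so no half-initialized object is left behind.
        self._handle = factory()

    def close(self):
        # Idempotent: calling close() a second time is a no-op.
        if not self._closed:
            self._closed = True
            self._releaser(self._handle)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions from the with-body
```

With this shape, release runs exactly once even if the `with` body raises, and a failed `factory()` never produces an object whose destructor later has to guess what was allocated.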
**Environment (please complete the following information):**
- OS: Ubuntu 18
- CUDA version: 10.2 (V10.2.89)
- CUDA driver version: 460.91.03
- PyCUDA version: pycuda-2021.1
- Python version: 3.6.9
**What result are you observing? What result are you expecting?**

The error handling in that code path looks RAII-safe, so it should do the right thing:
https://github.com/inducer/pycuda/blob/9f3b898ec0846e2a4dff5077d4403ea03b1fccf9/src/cpp/cuda.hpp#L854-L863
```
ERROR:root:init error
Traceback (most recent call last):
  File "test_r.py", line 47, in <module>
    gpuContext = GpuContext(str(i))
  File "test_r.py", line 12, in __init__
    self.cuda_context = cuda.Device(0).make_context()
pycuda._driver.MemoryError: cuCtxCreate failed: out of memory
ERROR:GpuContext - 36:self.cuda_context.pop()
Traceback (most recent call last):
  File "test_r.py", line 17, in __del__
    self.cuda_context.pop()
pycuda._driver.LogicError: cuCtxPopCurrent failed: invalid device context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuCtxDetach failed: invalid device context
```
When an exception occurs, how can the GPU memory be released? As shown in the figure, 39 MB of GPU memory remains occupied.

It's not easy to explain. Scenario 1: under normal circumstances, on a laptop (with a different graphics card), GPU memory is fully freed once all CUDA contexts are released. On my desktop, all CUDA contexts are released normally, but some GPU memory remains occupied.

Scenario 2: deliberately trigger multiple exceptions, and the GPU memory usage keeps increasing.
```python
import time
import logging

import pycuda.driver as cuda
# from PyQt5.QtCore import QThread


class GpuContext(object):
    def __init__(self, name=None):
        self.logging = logging.getLogger('GpuContext - ' + name)
        self.cuda_context = None
        self.cuda_context = cuda.Device(0).make_context()

    def __del__(self):
        if self.cuda_context:
            try:
                self.cuda_context.pop()
            except:
                self.logging.exception('self.cuda_context.pop()')
            try:
                self.cuda_context.detach()
            except:
                self.logging.exception('self.cuda_context.detach() error')
            try:
                del self.cuda_context
                self.cuda_context = None
            except:
                self.logging.exception('del self.cuda_context error')


# class GpuThread(QThread):
#     def __init__(self, gpuContext, parent=None):
#         super().__init__(parent)
#         self.gpuContext = gpuContext
#
#     def run(self):
#         pass


if __name__ == '__main__':
    cuda.init()
    for j in range(10):
        gupContexts = []
        for i in range(100):
            try:
                gpuContext = GpuContext(str(i))
                gupContexts.append(gpuContext)
            except:
                logging.exception('init error')
                break
        while len(gupContexts) > 0:
            gpuContext = gupContexts.pop()
            del gpuContext
    while True:
        print('main')
        time.sleep(1)
```

After triggering 10 exceptions, GPU memory usage grows to about 390 MB (roughly 39 MB per leaked context).
There may be a problem in the `make_context` path: `prepare_context_switch` may switch contexts, but then the GPU memory allocation fails. As a result, the pop of the previous context does not succeed, which leads to the `cuCtxPopCurrent failed: invalid device context` exception during cleanup. Could it be this failed pop of the previous context that leaves the GPU memory unreleased?
I suspect that prepare_context_switch leaves the context stack in an inconsistent state in case of an error. It should be replaced with a RAII construct that restores the previous state if the switch did not succeed.
It'll be a while before I have time to look into this. PRs welcome in the meantime!
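The RAII idea can be illustrated in plain Python. This is a sketch of the pattern only, not pycuda code: the list `stack` stands in for the CUDA context stack, and `make_new` for a fallible context factory (something that can raise `MemoryError` mid-switch):

```python
class ScopedContextSwitch:
    """Sketch of an RAII-style context switch: push a new context onto a
    stack, and roll the stack back to its previous state if creation fails.

    `stack` (a plain list) and `make_new` (a fallible factory) are
    hypothetical stand-ins, not pycuda API.
    """

    def __init__(self, stack, make_new):
        self._stack = stack
        self._make_new = make_new
        self._pushed = False

    def __enter__(self):
        saved_depth = len(self._stack)
        try:
            self._stack.append(self._make_new())  # may raise mid-switch
            self._pushed = True
        except BaseException:
            # Restore the exact previous stack state before re-raising,
            # so later pops still see a consistent stack.
            del self._stack[saved_depth:]
            raise
        return self._stack[-1]

    def __exit__(self, exc_type, exc, tb):
        if self._pushed:
            self._stack.pop()
        return False
```

Whether the switch succeeds or fails, the stack ends in a well-defined state, which is exactly the invariant the `__del__`-based cleanup in the repro implicitly relies on.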