mmcv
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Thanks for reporting the unexpected results and we appreciate it a lot.
Checklist
- I have searched related issues but cannot get the expected help. yes
- I have read the FAQ documentation but cannot get the expected help. yes
- The unexpected results still exist in the latest version. no
Describe the Issue
When I import mmcv and use Python multiprocessing, I get the error below.
I don't understand why merely importing mmcv, without using it, causes this error; the same code runs normally when I don't import mmcv.
I know that adding torch.multiprocessing.set_start_method("spawn")
makes it work, but I want to know what changes in the environment when I import mmcv.
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
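One way to narrow down what `import mmcv` changes is to check, in a fresh parent process, whether PyTorch's CUDA state is already initialized right after the import, and whether the global start method was altered. The helper below is a hypothetical diagnostic (`cuda_state_after_import` is not part of mmcv or PyTorch); it only relies on `torch.cuda.is_initialized()` and `multiprocessing.get_start_method(allow_none=True)`. If `cuda_initialized_after` is `True`, something executed during the import has initialized CUDA in the parent, which is presumably what makes the forked child refuse to initialize CUDA again. It does not tell you which mmcv module is responsible.

```python
import importlib
import multiprocessing as mp


def cuda_state_after_import(module_name):
    """Hypothetical diagnostic: import `module_name` and report PyTorch CUDA state.

    Returns None if torch or the module cannot be imported.
    """
    try:
        import torch
    except ImportError:
        return None
    # Importing torch alone should not initialize CUDA.
    before = torch.cuda.is_initialized()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return None
    return {
        "cuda_initialized_before": before,
        "cuda_initialized_after": torch.cuda.is_initialized(),
        # None means no start method has been set explicitly yet.
        "start_method": mp.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    print(cuda_state_after_import("mmcv"))
```

Run it in a fresh interpreter before anything else imports mmcv; otherwise the module is already cached and the before/after comparison says nothing.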
This is my code:

    import multiprocessing as mp
    import numpy as np
    import torch
    import sys
    import traceback
    from loguru import logger
    import os
    import mmcv  # without this import the code runs normally


    class TestProcess(mp.Process):
        def __init__(self, job_queue) -> None:
            super(TestProcess, self).__init__()
            self.job_queue = job_queue

        def run(self):
            while True:
                try:
                    task = self.job_queue.get(timeout=5)
                    img = torch.from_numpy(task).unsqueeze(0)
                    img = img.cuda()  # the RuntimeError is raised here in the forked child
                    logger.error(f"b6-{str(os.getpid())}-{str(os.getppid())}")
                except Exception:
                    # also reached when the queue stays empty for 5 s (queue.Empty)
                    traceback.print_exc()
                    break


    if __name__ == "__main__":
        # torch.multiprocessing.set_start_method("spawn")
        manager = mp.Manager()
        job_queue = manager.Queue(1000)
        logger.error(f"0-{str(os.getpid())}-{str(os.getppid())}")
        task_list = []
        for index in range(1):
            one = TestProcess(job_queue)
            task_list.append(one)
        for one_task in task_list:
            one_task.start()
        for index in range(10):
            one_data = np.random.random((100, 100, 3))
            job_queue.put(one_data)
        for one_task in task_list:
            one_task.join()
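As an alternative to calling `torch.multiprocessing.set_start_method("spawn")` globally, the standard library can hand out a spawn context for just the processes a program creates, via `multiprocessing.get_context("spawn")`. The sketch below restructures the reproduction along those lines; it is an illustration under assumptions, not a fix inside mmcv. `run_worker` and `main` are hypothetical names, a plain `ctx.Queue` replaces `Manager().Queue`, nested lists stand in for the NumPy arrays, and torch is imported inside the worker so the GPU copy is simply skipped when torch or CUDA is unavailable.

```python
import multiprocessing as mp
import os
import queue


def run_worker(job_queue, timeout=5.0):
    """Consume jobs until the queue stays empty for `timeout` seconds.

    Under the 'spawn' start method this runs in a fresh interpreter, so no CUDA
    state is inherited from the parent. Returns the number of jobs handled.
    """
    try:
        import torch  # assumption: installed in the real setup, optional here
    except ImportError:
        torch = None
    handled = 0
    while True:
        try:
            task = job_queue.get(timeout=timeout)
        except queue.Empty:
            break  # same exit condition as the original loop
        if torch is not None and torch.cuda.is_available():
            img = torch.as_tensor(task).unsqueeze(0).cuda()
            print(f"pid {os.getpid()}: tensor on {img.device}", flush=True)
        handled += 1
    return handled


def main():
    ctx = mp.get_context("spawn")  # affects only processes created from ctx
    job_queue = ctx.Queue(1000)
    workers = [ctx.Process(target=run_worker, args=(job_queue,)) for _ in range(1)]
    for worker in workers:
        worker.start()
    for _ in range(10):
        # stand-in for np.random.random((100, 100, 3))
        job_queue.put([[[0.5] * 3] * 4] * 4)
    for worker in workers:
        worker.join()


if __name__ == "__main__":
    main()
```

With spawn, the child re-imports the main module (including a top-level `import mmcv`), so whatever the import does now happens in a fresh process instead of being inherited through `fork`. Keeping the choice in a context object leaves the global start method untouched; the queue and the processes should come from the same `ctx`.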
Environment
{'sys.platform': 'linux', 'Python': '3.7.7 (default, Jan 22 2022, 21:27:43) [GCC 9.3.0]', 'CUDA available': True, 'GPU 0': 'NVIDIA GeForce GTX 1060', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.3.r11.3/compiler.29745058_0', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0', 'PyTorch': '1.9.0+cu102', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 10.2\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n - CuDNN 7.6.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, 
USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.0+cu102', 'OpenCV': '4.5.5', 'MMCV': '1.4.7', 'MMCV Compiler': 'GCC 7.5', 'MMCV CUDA Compiler': '11.3'}
Error traceback
2022-05-20 19:15:47.236 | ERROR | __main__:<module>:37 - 0-20563-7433
Traceback (most recent call last):
File "test_issure.py", line 24, in run
img = img.cuda()
File "/home/nicken/.pyenv/versions/pytorch/lib/python3.7/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
I tried the same code with the latest version (MMCV 1.5.1, PyTorch 1.11.0) and got the same error.
Environment
{'sys.platform': 'linux', 'Python': '3.7.7 (default, Jan 22 2022, 21:27:43) [GCC 9.3.0]', 'CUDA available': True, 'GPU 0': 'NVIDIA GeForce GTX 1060', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Cuda compilation tools, release 11.3, V11.3.58', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0', 'PyTorch': '1.11.0+cu113', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n - CuDNN 8.2\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno 
-fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'TorchVision': '0.12.0+cu113', 'OpenCV': '4.5.5', 'MMCV': '1.5.1', 'MMCV Compiler': 'GCC 9.4', 'MMCV CUDA Compiler': '11.3'}
Hi @nicken, thanks for your report. We will try to reproduce the error.
I have reproduced the error. This error is a bit strange.
Did you see anything that might be a problem?
I have a similar problem, have you solved it?