marie-ai
CUDA error: initialization error. Compile with `TORCH_USE_CUDA_DSA`
Describe the bug
While migrating to torch-2.2.0.dev20231126+cu118 I ran into this issue.
It could be due to the fact that this is a dev release, but I am adding it
for tracking purposes.
Describe how you solved it
Use the stable release, torch 2.1.1.
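The workaround amounts to pinning back to the stable wheels. A minimal sketch — the torchvision pairing (0.16.1) and the cu118 index URL are assumptions based on PyTorch's usual release pairing, not taken from this report:

```shell
# Drop the nightly build and pin the last stable release (CUDA 11.8 wheels)
pip uninstall -y torch torchvision
pip install torch==2.1.1 torchvision==0.16.1 \
    --index-url https://download.pytorch.org/whl/cu118

# Sanity check: CUDA should now initialize without error
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

Note that the `detectron2._C not built correctly` line in the environment dump below points at an ABI mismatch (undefined symbol against torch), so the detectron2 extension likely needs rebuilding after any torch version change as well.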
Environment
(marie) greg@xpredator:~/dev/marieai/marie-ai$ python -m detectron2.utils.collect_env
------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sys.platform linux
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
numpy 1.24.1
detectron2 0.6 @/home/greg/dev/3rdparty/detectron2/detectron2
detectron2._C not built correctly: /home/greg/dev/3rdparty/detectron2/detectron2/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNR5torch7Library4_defEOSt7variantIJN3c1012OperatorNameENS2_14FunctionSchemaEEEONS_11CppFunctionE
Compiler ($CXX) c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CUDA compiler Build cuda_11.8.r11.8/compiler.31833905_0
detectron2 arch flags 8.9
DETECTRON2_ENV_MODULE <not set>
PyTorch 2.2.0.dev20231126+cu118 @/home/greg/environment/marie/lib/python3.10/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 4090 (arch=8.9)
Driver version 545.23.06
CUDA_HOME /usr/local/cuda
Pillow 9.3.0
torchvision 0.17.0.dev20231126+cu118 @/home/greg/environment/marie/lib/python3.10/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5
iopath 0.1.9
cv2 4.8.1
------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.8
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.7
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.2.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
Screenshots
ERROR main : extract_t/rep-0@1330319
RuntimeError('CUDA error: initialization error\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') during 'WorkerRuntime' initialization
add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
  File "/home/greg/dev/marieai/marie-ai/marie/serve/executo…", line 143, in run
    runtime = AsyncNewLoopRuntime(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 93, in __init__
    self._loop.run_until_complete(self.async_setup())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 310, in async_setup
    self.server = self._get_server()
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 215, in _get_server
    return GRPCServer(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 34, in __init__
    super().__init__(**kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 70, in __init__
    ] = (req_handler or self._get_request_handler())
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 95, in _get_request_handler
    return self.req_handler_cls(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 140, in __init__
    self._load_executor(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/runtime…", line 379, in _load_executor
    self._executor: BaseExecutor = BaseExecutor.load_config(
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 792, in load_config
    obj = JAML.load(tag_yml, substitute=False, runtime_args=runtime_args)
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 174, in load
    r = yaml.load(stream, Loader=get_jina_loader_with_runtime(runtime_args))
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 81, in load
    return loader.get_single_data()
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 51, in get_single_data
    return self.construct_document(node)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 55, in construct_document
    data = self.construct_object(node)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 100, in construct_object
    data = constructor(self, node)
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/__init__…", line 582, in _from_yaml
    return get_parser(cls, version=data.get('version', None)).parse(
  File "/home/greg/dev/marieai/marie-ai/marie/jaml/parsers/…", line 46, in parse
    obj = cls(
  File "/home/greg/dev/marieai/marie-ai/marie/serve/executo…", line 58, in arg_wrapper
    f = func(self, *args, **kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/serve/helper.…", line 74, in arg_wrapper
    f = func(self, *args, **kwargs)
  File "/home/greg/dev/marieai/marie-ai/marie/executor/text…", line 98, in __init__
    self.pipeline = ExtractPipeline(pipeline_config=pipeline, cuda=use_cuda)
  File "/home/greg/dev/marieai/marie-ai/marie/pipe/extract_…", line 94, in __init__
    self.overlay_processor = OverlayProcessor(
  File "/home/greg/dev/marieai/marie-ai/marie/overlay/overl…", line 44, in __init__
    self.opt, self.model = self.__setup(cuda, checkpoint_dir)
  File "/home/greg/dev/marieai/marie-ai/marie/overlay/overl…", line 109, in __setup
    model = create_model(opt)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 75, in create_model
    instance = model(opt)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 45, in __init__
    self.netG = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG,
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 271, in define_G
    return init_net(net, init_type, init_gain, gpu_ids)
  File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pi…", line 151, in init_net
    net.to("cuda")
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 1152, in to
    return self._apply(convert)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 802, in _apply
    module._apply(fn)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 802, in _apply
    module._apply(fn)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 825, in _apply
    param_applied = fn(param)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/greg/dev/marieai/marie-ai/venv/lib/python3.10…", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
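For what it's worth, when chasing asynchronous CUDA errors like this one, forcing synchronous kernel launches before torch is imported can make the failing call site more precise. A minimal sketch using standard CUDA/PyTorch debug environment variables (these are general debugging aids, not specific to marie-ai, and did not resolve the underlying issue here):

```python
import os

# Must be set before `import torch` so the CUDA context picks them up.
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the stack
# trace points at the actual failing call instead of a later one.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# CUDA_MODULE_LOADING=EAGER (CUDA 11.7+) disables lazy module loading,
# surfacing initialization problems up front rather than at first use.
os.environ["CUDA_MODULE_LOADING"] = "EAGER"
```

`TORCH_USE_CUDA_DSA` itself is a build-time flag: it only takes effect when PyTorch is compiled from source with it enabled, which is why the suggestion in the error message does not help with prebuilt wheels.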