pytorch-pwc
cupy issue
Hi, first of all, thank you for sharing this project.
When I set up the environment and ran the project, I got the following error. I am using an RTX 4090 and Python 3.8.
(env1) root@autodl-container-d137439c7c-0e4544de:~/autodl-tmp/code/pytorch-pwc# python run.py --model default --one ./images/one.png --two ./images/two.png --out ./out.flo
Traceback (most recent call last):
File "run.py", line 320, in <module>
tenOutput = estimate(tenOne, tenTwo)
File "run.py", line 306, in estimate
tenFlow = torch.nn.functional.interpolate(input=netNetwork(tenPreprocessedOne, tenPreprocessedTwo), size=(intHeight, intWidth), mode='bilinear', align_corners=False)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "run.py", line 267, in forward
objEstimate = self.netSix(tenOne[-1], tenTwo[-1], None)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "run.py", line 195, in forward
tenVolume = torch.nn.functional.leaky_relu(input=correlation.FunctionCorrelation(tenOne=tenOne, tenTwo=tenTwo), negative_slope=0.1, inplace=False)
File "/root/autodl-tmp/code/pytorch-pwc/./correlation/correlation.py", line 384, in FunctionCorrelation
return _FunctionCorrelation.apply(tenOne, tenTwo)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/autodl-tmp/code/pytorch-pwc/./correlation/correlation.py", line 289, in forward
cupy_launch('kernel_Correlation_rearrange', cupy_kernel('kernel_Correlation_rearrange', {
File "cupy/_util.pyx", line 67, in cupy._util.memoize.decorator.ret
File "/root/autodl-tmp/code/pytorch-pwc/./correlation/correlation.py", line 273, in cupy_launch
return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 464, in compile_with_cache
return _compile_module_with_cache(*args, **kwargs)
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
return _compile_with_cache_cuda(
File "/root/miniconda3/envs/env1/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 561, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
I checked the PyTorch, nvcc, and CuPy versions, and they look compatible to me. Do you have any idea how to resolve this issue? Thanks in advance.
(env1) root@autodl-container-d137439c7c-0e4544de:~/autodl-tmp/code/pytorch-pwc# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(env1) root@autodl-container-d137439c7c-0e4544de:~/autodl-tmp/code/pytorch-pwc# python
Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
11.8
>>> import cupy
>>> cupy.show_config()
OS : Linux-6.2.0-26-generic-x86_64-with-glibc2.17
Python Version : 3.8.18
CuPy Version : 10.4.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.24.3
SciPy Version : None
Cython Build Version : 0.29.28
Cython Runtime Version : None
CUDA Root : /root/miniconda3/envs/env1
nvcc PATH : None
CUDA Build Version : 11020
CUDA Driver Version : 12020
CUDA Runtime Version : 11080
cuBLAS Version : (available)
cuFFT Version : 10900
cuRAND Version : 10300
cuSOLVER Version : (11, 4, 1)
cuSPARSE Version : (available)
NVRTC Version : (11, 8)
Thrust Version : 101000
CUB Build Version : 101000
Jitify Build Version : 70f5331
cuDNN Build Version : 8201
cuDNN Version : 8700
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : 100
Device 0 Name : NVIDIA GeForce RTX 4090
Device 0 Compute Capability : 89
Device 0 PCI Bus ID : 0000:64:00.0
A few thoughts:
1. The `nvcc PATH` is `None`, but it points to `/.../nvcc` for me.
2. The `CUDA Build Version` is lower than the `CUDA Runtime Version`, but it is newer for me.
3. The `cupy.cuda.compile_with_cache` may pick up a previously compiled version that may not match your system.
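To make the build/runtime comparison above easier to read, here is a small sketch that decodes the integers printed by `cupy.show_config()`. It assumes the usual CUDA encoding of `1000 * major + 10 * minor`; the helper name `decode_cuda_version` is made up for illustration, and the values are copied from the output posted above.

```python
def decode_cuda_version(code):
    """Decode a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return code // 1000, (code % 1000) // 10


# Values copied from the cupy.show_config() output in this thread.
reported = {"build": 11020, "runtime": 11080, "driver": 12020}
decoded = {name: decode_cuda_version(code) for name, code in reported.items()}

for name, (major, minor) in decoded.items():
    print(f"CUDA {name} version: {major}.{minor}")

# Tuples compare element by element, so (11, 2) < (11, 8) holds.
if decoded["build"] < decoded["runtime"]:
    print("CuPy was built against an older CUDA than the runtime it is using.")
```

This only reports the mismatch; it does not by itself prove that the mismatch causes the `CUDA_ERROR_INVALID_SOURCE` error.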
To resolve (3), you could try to delete the cache (should be in ${HOME}/.cupy/kernel_cache) and run it again.
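A minimal sketch of that cleanup, assuming the cache lives in the default location mentioned above unless the `CUPY_CACHE_DIR` environment variable points elsewhere. The function name `clear_cupy_kernel_cache` is hypothetical and not part of CuPy.

```python
import os
import shutil
from pathlib import Path


def clear_cupy_kernel_cache(cache_dir=None):
    """Delete the CuPy kernel cache directory; return True if something was removed."""
    if cache_dir is None:
        # Assumption: fall back to ~/.cupy/kernel_cache when CUPY_CACHE_DIR is unset.
        default_dir = Path.home() / ".cupy" / "kernel_cache"
        cache_dir = os.environ.get("CUPY_CACHE_DIR", str(default_dir))
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling `clear_cupy_kernel_cache()` before rerunning `run.py` should force the kernels to be compiled again rather than loaded from disk.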
Closing due to inactivity. Please don't hesitate to reopen should the issue persist. :+1: