
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Open MingshuaiZhao opened this issue 9 months ago • 1 comments

When testing the IGEV model, I get the following error:

  File "save_disp.py", line 90, in <module>
    demo(args)
  File "save_disp.py", line 50, in demo
    disp = model(image1, image2, iters=args.valid_iters, test_mode=True)
  File "/home/liuyihao/anaconda3/envs/igev/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liuyihao/mdec2025/Selective-Stereo/Selective-IGEV/core/igev_stereo.py", line 200, in forward
    geo_feat = geo_fn(disp, coords)
  File "/home/liuyihao/mdec2025/Selective-Stereo/Selective-IGEV/core/geometry.py", line 46, in __call__
    geo_volume = bilinear_sampler(geo_volume, disp_lvl)
  File "/home/liuyihao/mdec2025/Selective-Stereo/Selective-IGEV/core/utils/utils.py", line 72, in bilinear_sampler
    img = F.grid_sample(img, grid, align_corners=True)
  File "/home/liuyihao/anaconda3/envs/igev/lib/python3.8/site-packages/torch/nn/functional.py", line 4223, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)

The environment is as follows:

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
ca-certificates 2025.2.25 h06a4308_0
cachetools 5.5.2 pypi_0 pypi
certifi 2025.1.31 pypi_0 pypi
charset-normalizer 3.4.1 pypi_0 pypi
contourpy 1.1.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
fonttools 4.56.0 pypi_0 pypi
google-auth 2.38.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.70.0 pypi_0 pypi
idna 3.10 pypi_0 pypi
imageio 2.35.1 pypi_0 pypi
importlib-metadata 8.5.0 pypi_0 pypi
importlib-resources 6.4.5 pypi_0 pypi
kiwisolver 1.4.7 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
markdown 3.7 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.7.5 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.1 pypi_0 pypi
numpy 1.24.4 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-python 4.11.0.86 pypi_0 pypi
openssl 3.0.16 h5eee18b_0
opt-einsum 3.4.0 pypi_0 pypi
packaging 24.2 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py38h06a4308_0
protobuf 5.29.3 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.1 pypi_0 pypi
pyparsing 3.1.4 pypi_0 pypi
python 3.8.20 he870216_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.32.3 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.21.0 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
setuptools 75.1.0 py38h06a4308_0
six 1.17.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tifffile 2023.7.10 pypi_0 pypi
timm 0.5.4 pypi_0 pypi
tk 8.6.14 h39e8969_0
torch 1.12.1+cu113 pypi_0 pypi
torchaudio 0.12.1+cu113 pypi_0 pypi
torchvision 0.13.1+cu113 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
urllib3 2.2.3 pypi_0 pypi
werkzeug 3.0.6 pypi_0 pypi
wheel 0.44.0 py38h06a4308_0
xz 5.6.4 h5eee18b_1
zipp 3.20.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_1

MingshuaiZhao, Mar 11 '25 02:03

Please share which checkpoint (ckpt) you used and the arguments you passed when running save_disp.py.

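My current guess is that a non-contiguous tensor is causing this. You can call contiguous() on the input tensors right before the function that raises the error.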
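As a rough illustration of the contiguous() suggestion, here is a minimal sketch. The helper name `grid_sample_contiguous` is hypothetical and is not part of the Selective-Stereo code; in the repository the equivalent change would presumably go around the `F.grid_sample` call in `bilinear_sampler`. The demo runs on CPU only to show that the result is unchanged; it does not reproduce the cuDNN error itself.

```python
import torch
import torch.nn.functional as F


def grid_sample_contiguous(img, grid):
    """Hypothetical wrapper: force both inputs to be contiguous before sampling."""
    # .contiguous() returns the same tensor if it is already contiguous,
    # otherwise it makes a contiguous copy with identical values.
    return F.grid_sample(img.contiguous(), grid.contiguous(), align_corners=True)


# A non-contiguous 4D input of shape (N, C, H, W) = (1, 2, 4, 3), made via permute.
img = torch.randn(1, 4, 3, 2).permute(0, 3, 1, 2)
# Sampling grid of shape (N, H_out, W_out, 2) with coordinates in [-1, 1].
grid = torch.rand(1, 5, 6, 2) * 2 - 1

out = grid_sample_contiguous(img, grid)
print(img.is_contiguous(), tuple(out.shape))
```

Printing `tensor.is_contiguous()` for `img` and `grid` just before the failing call is a cheap way to confirm or rule out this explanation.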

Next, check whether you are using bfloat16; cuDNN may not support it well.
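If bfloat16 does turn out to be involved, one possible workaround is to run the sampling step in float32 and cast the result back. This is only a sketch under that assumption: `grid_sample_fp32` is a hypothetical helper, not something from the repository, and whether the dtype is actually the cause here is unconfirmed.

```python
import torch
import torch.nn.functional as F


def grid_sample_fp32(img, grid):
    """Hypothetical wrapper: sample in float32, then restore the input dtype."""
    orig_dtype = img.dtype
    out = F.grid_sample(img.float(), grid.float(), align_corners=True)
    return out.to(orig_dtype)


# CPU demo with a bfloat16 input of shape (N, C, H, W) = (1, 2, 4, 3).
img = torch.randn(1, 2, 4, 3).to(torch.bfloat16)
grid = torch.rand(1, 5, 6, 2) * 2 - 1

out = grid_sample_fp32(img, grid)
print(out.dtype, tuple(out.shape))
```

Printing `img.dtype` and `grid.dtype` right before the failing call shows whether a reduced-precision dtype is reaching `F.grid_sample` at all.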

Only after that, consider whether the cuDNN version is the problem.
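To look into the version question, it can help to print the versions PyTorch was built with and, as a diagnostic, to run the sampling call with cuDNN disabled so that PyTorch falls back to its own kernel. The sketch below assumes nothing about the repository; `report_versions` and `grid_sample_without_cudnn` are hypothetical names. If the error disappears with cuDNN disabled, that points at the cuDNN path rather than at the model code.

```python
import torch
import torch.nn.functional as F


def report_versions():
    """Collect the PyTorch, CUDA, and cuDNN versions seen by this process."""
    cudnn = torch.backends.cudnn
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "cudnn": cudnn.version() if cudnn.is_available() else None,
    }


def grid_sample_without_cudnn(img, grid):
    """Hypothetical diagnostic: run grid_sample with cuDNN temporarily disabled."""
    with torch.backends.cudnn.flags(enabled=False):
        return F.grid_sample(img, grid, align_corners=True)


info = report_versions()
print(info)

img = torch.randn(1, 2, 4, 3)
grid = torch.rand(1, 5, 6, 2) * 2 - 1
out = grid_sample_without_cudnn(img, grid)
print(tuple(out.shape))
```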

Windsrain, Mar 11 '25 06:03