
RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion. [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)

Open yamashin0922 opened this issue 2 years ago • 18 comments

Search before asking

  • [X] I have searched the issues and found no similar bug report.

Bug Component

Installation

Describe the Bug

The error below appears when I run inference on the GPU using the configuration file from the examples.

Here are the full logs.

root@xxx:/PaddleDetection# python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_mot.yml --video_file=test.mp4 --device=gpu
/PaddleDetection/deploy/pipeline/pipeline.py:24: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  from collections import Sequence, defaultdict
-----------  Running Arguments -----------
MOT:
  batch_size: 1
  enable: true
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
  tracker_config: deploy/pipeline/config/tracker_config.yml
crop_thresh: 0.5
visual: true
warmup_frame: 50

------------------------------------------
Multi-Object Tracking enabled
100%|██████████| 186349/186349 [02:09<00:00, 1441.91KB/s]
MOT  model dir:  /root/.cache/paddle/infer_weights/mot_ppyoloe_l_36e_pipeline
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: Permute
--------------------------------------------
video fps: 30, frame_count: 19854
Thread: 0; frame id: 0
Traceback (most recent call last):
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 1103, in <module>
    main()
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 1090, in main
    pipeline.run_multithreads()
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 170, in run_multithreads
    self.predictor.run(self.input)
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 488, in run
    self.predict_video(input, thread_idx=thread_idx)
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 668, in predict_video
    res = self.mot_predictor.predict_image(
  File "/PaddleDetection/deploy/pptracking/python/mot_sde_infer.py", line 478, in predict_image
    inputs = self.preprocess(batch_image_list)
  File "/PaddleDetection/deploy/pptracking/python/det_infer.py", line 140, in preprocess
    input_tensor.copy_from_cpu(inputs[input_names[i]])
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/paddle/fluid/inference/wrapper.py", line 38, in tensor_copy_from_cpu
    self.copy_from_cpu_bind(data)
RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
  [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)

Could you help me fix this?

It works when I remove "--device=gpu" from the command line.

Environment

OS

root@xxx:/PaddleDetection# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

GPU and drivers

root@xxx:/PaddleDetection# nvidia-smi
Wed Jan 18 06:20:36 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:00:10.0 Off |                    0 |
| N/A   29C    P0    43W / 300W |      4MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvcc

root@xxx:/PaddleDetection# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

python

root@xxx:/PaddleDetection# python -V
Python 3.9.16

packages for python

root@xxx:/PaddleDetection# pip list
Package            Version
------------------ -------------
aiofiles           22.1.0
aiohttp            3.8.3
aiosignal          1.3.1
altair             4.2.0
anyio              3.6.2
astor              0.8.1
async-timeout      4.0.2
attrdict           2.0.1
attrs              22.2.0
Babel              2.11.0
bce-python-sdk     0.8.74
Brotli             1.0.9
certifi            2022.12.7
charset-normalizer 2.1.1
click              8.1.3
contourpy          1.0.7
cycler             0.11.0
Cython             0.29.33
decorator          5.1.1
dill               0.3.6
entrypoints        0.4
fastapi            0.89.1
ffmpy              0.3.0
filterpy           1.4.5
Flask              2.2.2
flask-babel        3.0.0
fonttools          4.38.0
frozenlist         1.3.3
fsspec             2022.11.0
future             0.18.3
gevent             22.10.2
geventhttpclient   2.0.8
gradio             3.16.2
greenlet           2.0.1
grpcio             1.41.0
h11                0.14.0
httpcore           0.16.3
httpx              0.23.3
idna               3.4
importlib-metadata 6.0.0
itsdangerous       2.1.2
Jinja2             3.1.2
joblib             1.2.0
jsonschema         4.17.3
kiwisolver         1.4.4
lap                0.4.0
linkify-it-py      1.0.3
markdown-it-py     2.1.0
MarkupSafe         2.1.2
matplotlib         3.6.3
mdit-py-plugins    0.3.3
mdurl              0.1.2
motmetrics         1.4.0
mpmath             1.2.1
multidict          6.0.4
multiprocess       0.70.14
numpy              1.24.1
onnx               1.12.0
opencv-python      4.5.5.64
opt-einsum         3.3.0
orjson             3.8.5
packaging          23.0
paddle-bfloat      0.1.7
paddlepaddle-gpu   2.4.1.post117
pandas             1.5.2
Pillow             9.4.0
pip                22.3.1
protobuf           3.20.0
psutil             5.9.4
pyclipper          1.3.0.post4
pycocotools        2.0.6
pycryptodome       3.16.0
pydantic           1.10.4
pydub              0.25.1
pyparsing          3.0.9
pyrsistent         0.19.3
python-dateutil    2.8.2
python-multipart   0.0.5
python-rapidjson   1.9
pytz               2022.7.1
PyYAML             6.0
rarfile            4.0
requests           2.28.2
rfc3986            1.5.0

Bug description confirmation

  • [X] I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

  • [X] I'd like to help by submitting a PR!

yamashin0922 avatar Jan 18 '23 06:01 yamashin0922

According to the log message, this error is probably caused by the cuDNN path not being set correctly, or by cuDNN not being installed on your system.

wangxinxin08 avatar Jan 18 '23 08:01 wangxinxin08
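The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of Paddle: `can_load` is a hypothetical helper, and the library names tried in the loop are assumptions based on the file names mentioned in this thread. It asks the dynamic loader to open each library the same way a framework would, so a "NOT found" result points to a missing install or a missing library path entry.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the loader cannot find or open the file.
        return False
    return True


if __name__ == "__main__":
    # Assumed names; adjust them to the cuDNN/cuBLAS versions you installed.
    for name in ("libcudnn.so", "libcudnn.so.8", "libcublas.so"):
        status = "found" if can_load(name) else "NOT found"
        print(f"{name}: {status}")
```

If every name reports "NOT found" while the files exist on disk, the directory that contains them is likely not on the loader's search path.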

Thank you for your prompt reply. I'll try installing the proper cuDNN version and will report back.

yamashin0922 avatar Jan 19 '23 09:01 yamashin0922

Hello, did you solve the problem?

mahachaaben99 avatar Mar 17 '23 12:03 mahachaaben99

I am having the same issue.

mahachaaben99 avatar Mar 17 '23 12:03 mahachaaben99

First, check whether libcudnn.so and libcublas.so are present in the shared library directory. Run the following command in a terminal:

ls /usr/lib | grep -E "libcudnn|libcublas"

If neither file is listed, find where they are installed:

locate libcudnn.so
locate libcublas.so

In my case, libcudnn.so is located at /usr/local/cuda-12.1/targets/x86_64-linux/include/libcudnn.so.8.9.1 and libcublas.so is located at /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1.

Once you have located them, add them to the shared library directory with the steps below.

Enter the /usr/lib folder:

cd /usr/lib

Create symbolic links named libcudnn.so and libcublas.so:

sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/include/libcudnn.so.8.9.1 libcudnn.so
sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1 libcublas.so

Now check that they were added:

ls /usr/lib | grep -E "libcudnn|libcublas"

If both libcudnn.so and libcublas.so appear in the output, the issue should be resolved.

Gokulnath-V avatar Jun 07 '23 08:06 Gokulnath-V
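For systems without the locate command, the search step above can be approximated in Python. This is a rough sketch under stated assumptions: `find_libraries` is a hypothetical helper, and the search roots /usr/lib and /usr/local are guesses that may not match your installation.

```python
import os


def find_libraries(roots, prefixes=("libcudnn", "libcublas")):
    """Return sorted paths of files under roots that look like the wanted shared libraries."""
    matches = []
    for root in roots:
        # os.walk silently skips roots that do not exist or cannot be read.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename.startswith(prefixes) and ".so" in filename:
                    matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed search roots; adjust them to where CUDA and cuDNN were installed.
    for path in find_libraries(["/usr/lib", "/usr/local"]):
        print(path)
```

The printed paths are the candidates to pass to ln -s in the steps above.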

Thanks @Gokulnath-V. Your advice resolved my issues with paddle running on gpu.

bamboosteam avatar Jun 23 '23 01:06 bamboosteam

I've fixed the previous bug by following these instructions. However, a new error now says: "Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory Please make sure libcudnn_ops_infer.so.8 is in your library path!". Any suggestions?

sjtugzx avatar Jul 10 '23 08:07 sjtugzx

same error

ZSitong avatar Aug 09 '23 10:08 ZSitong

Same error. Do you have a solution for this? Thanks.

Zhouziyuya avatar Aug 11 '23 09:08 Zhouziyuya

Use CUDA 11 instead of CUDA 12.

We only release the CUDA 10.2 build of paddlepaddle-gpu on PyPI. If you want to install paddlepaddle-gpu for CUDA 10.2/11.2/11.6/11.7, the install commands are on our website.

https://pypi.org/project/paddlepaddle-gpu/

be42day avatar Aug 21 '23 20:08 be42day
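To check whether an installed wheel and the CUDA toolkit agree, one can compare the wheel's version suffix with the toolkit release. In the original report, pip list shows paddlepaddle-gpu 2.4.1.post117 and nvcc reports release 11.7. The sketch below assumes, without confirmation from this thread, that a ".postNNN" suffix encodes the CUDA version with the last digit as the minor version (post117 for 11.7, post102 for 10.2). Both function names are hypothetical.

```python
import re


def cuda_version_from_wheel(version: str):
    """Map a version such as '2.4.1.post117' to '11.7'; return None without a suffix.

    Assumption: the last digit of the '.postNNN' suffix is the CUDA minor version.
    """
    match = re.search(r"\.post(\d{3,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{int(digits[:-1])}.{digits[-1]}"


def toolkit_matches_wheel(wheel_version: str, toolkit_release: str) -> bool:
    """Compare the wheel's assumed CUDA version with a toolkit release like '11.7'."""
    expected = cuda_version_from_wheel(wheel_version)
    major_minor = ".".join(toolkit_release.split(".")[:2])
    return expected is not None and major_minor == expected


if __name__ == "__main__":
    print(cuda_version_from_wheel("2.4.1.post117"))        # 11.7
    print(toolkit_matches_wheel("2.4.1.post117", "12.0"))  # False
```

A match only says the wheel and the toolkit target the same CUDA release. cuDNN still has to be installed separately and be loadable.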

Updated link to install paddlepaddle-gpu for CUDA 10.2/11.2/11.6/11.7.

Installation documentation

s4lm-xi avatar Nov 27 '23 21:11 s4lm-xi

I have CUDA 12.3 installed. I downloaded the cuDNN package for that version and extracted it to /usr/local/cuda/include and /usr/local/cuda/lib64, then ran:

cd /usr/lib
sudo ln -s /usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn.so.8.9.1 libcudnn.so
sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1 libcublas.so

A new error has occurred:

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
C++ Traceback (most recent call last):

ye7love7 avatar Mar 15 '24 03:03 ye7love7

If you're missing libcudnn_ops_infer.so.8 or a similar file, you need to do the same thing to add it to your library path. There are a few files like this, so I found it easiest to link them all at once from inside /usr/lib:

sudo ln -s /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/* .

(The source path will vary depending on where you found the files, e.g. with locate libcudnn_ops_infer.so.8.)

JoshC8C7 avatar Apr 04 '24 11:04 JoshC8C7
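The bulk-linking tip above can also be written as a small Python helper that skips names already present in the target directory, which plain ln -s would reject with an error. This is a sketch, not a tested fix for the error in this thread: `link_shared_libraries` is a hypothetical name, and the glob pattern assumes all the cuDNN files start with "libcudnn".

```python
from pathlib import Path


def link_shared_libraries(source_dir, target_dir, pattern="libcudnn*.so*"):
    """Symlink files in source_dir matching pattern into target_dir.

    Names that already exist in target_dir are left untouched.
    Returns the list of link paths that were created.
    """
    created = []
    target = Path(target_dir)
    for src in sorted(Path(source_dir).glob(pattern)):
        link = target / src.name
        if link.exists() or link.is_symlink():
            continue  # keep whatever is already there
        link.symlink_to(src.resolve())
        created.append(str(link))
    return created
```

Writing into /usr/lib requires root privileges. As the error message itself suggests, an alternative is to make sure the directory containing the files is on the library path, for example through LD_LIBRARY_PATH.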

I tried the same, but the error still exists.

lokeish avatar Apr 16 '24 12:04 lokeish

Same here. I tried @JoshC8C7's tips, but it still doesn't work.

plmsmile avatar Apr 19 '24 06:04 plmsmile

My problem is resolved: I have now installed cuDNN properly. I am using paddlepaddle-gpu==2.6.0. My CUDA version is 12.2 and my NVIDIA driver version is 535.

lokeish avatar Apr 21 '24 12:04 lokeish