Nvidia Jetson Xavier - fails to load image Python extension and Couldn't load custom C++ ops when drawing bounding boxes

Open 4davo opened this issue 3 years ago • 1 comments

🐛 Describe the bug

Note: This was posted to the PyTorch repo as issue #80576

Errors occur when running either of two approved combinations: A) PyTorch 1.11.0 with torchvision 0.12.0, and B) PyTorch 1.12.0 with torchvision 0.13.0.
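As a rough illustration of what "approved combination" means here, the sketch below checks a torch/torchvision pair against the two release pairings mentioned in this issue (torch 1.11 with torchvision 0.12, torch 1.12 with torchvision 0.13). The names `KNOWN_PAIRS`, `major_minor`, and `versions_match` are hypothetical helpers, not part of either library, and the table is deliberately limited to these two pairings.

```python
import re

# Pairings taken from this issue only (torch series -> torchvision series).
# This is an illustrative subset, not the full compatibility matrix.
KNOWN_PAIRS = {
    (1, 11): (0, 12),
    (1, 12): (0, 13),
}


def major_minor(version):
    """Return (major, minor) from a string such as '1.12.0a0+2c916ef.nv22.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version, torchvision_version):
    """True if the pair is one of the known pairings; unknown series give False."""
    expected = KNOWN_PAIRS.get(major_minor(torch_version))
    return expected is not None and expected == major_minor(torchvision_version)


if __name__ == "__main__":
    # The two combinations reported in this issue.
    print(versions_match("1.11.0", "0.12.0"))
    print(versions_match("1.12.0a0+2c916ef.nv22.3", "0.13.0"))
```

Both reported combinations pass this check, so the version numbers alone do not appear to explain the error; how each package was built or installed may matter more.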

Case A environment and run results:


davo@ubuntu:~$ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.65-tegra-aarch64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.239
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.11.0
[pip3] torchvision==0.12.0
[conda] Could not collect

davo@ubuntu:~/yolov5$ python3 detect.py --source 0
'''/home/davo/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")'''
detect: weights=yolov5s.pt, source=0, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-258-g1156a32 Python-3.8.10 torch-1.11.0 CUDA:0 (Xavier, 31011MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
1/1: 0... Success (inf frames 640x480 at 30.00 FPS)

0: 480x640 Done. (4.190s)
0: 480x640 Done. (0.041s)
0: 480x640 Done. (0.041s)
0: 480x640 Done. (0.045s)

'''Traceback (most recent call last):
  File "detect.py", line 252, in <module>
    main(opt)
  File "detect.py", line 247, in main
    run(**vars(opt))
  File "/home/davo/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 127, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/home/davo/yolov5/utils/general.py", line 859, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/home/davo/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 39, in nms
    _assert_has_ops()
  File "/home/davo/.local/lib/python3.8/site-packages/torchvision/extension.py", line 33, in _assert_has_ops
    raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
terminate called without an active exception
Aborted (core dumped)'''



Case B environment and run results:


davo@ubuntu:~/yolov5$ python3 ../collect_env.py
Collecting environment information...
PyTorch version: 1.12.0a0+2c916ef.nv22.3
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.65-tegra-aarch64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.239
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.12.0a0+2c916ef.nv22.3
[pip3] torchvision==0.13.0
[conda] Could not collect

davo@ubuntu:~/yolov5$ python3 detect.py --source 0
'''/home/davo/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")'''
detect: weights=yolov5s.pt, source=0, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-258-g1156a32 Python-3.8.10 torch-1.12.0a0+2c916ef.nv22.3 CUDA:0 (Xavier, 31011MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
1/1: 0... Success (inf frames 640x480 at 30.00 FPS)

0: 480x640 Done. (4.156s)
0: 480x640 Done. (0.043s)
0: 480x640 Done. (0.050s)
0: 480x640 Done. (0.045s)

'''Traceback (most recent call last):
  File "detect.py", line 252, in <module>
    main(opt)
  File "detect.py", line 247, in main
    run(**vars(opt))
  File "/home/davo/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 127, in run
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
  File "/home/davo/yolov5/utils/general.py", line 859, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/home/davo/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40, in nms
    _assert_has_ops()
  File "/home/davo/.local/lib/python3.8/site-packages/torchvision/extension.py", line 33, in _assert_has_ops
    raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
terminate called without an active exception
Aborted (core dumped)'''

Versions

Included above; see the Case A and Case B environment listings.

cc @fmassa @vfdev-5 @pmeier

4davo avatar Jul 02 '22 17:07 4davo

Were torch and torchvision installed via pip or built from source? I believe building from source shouldn't cause these issues, but it takes quite a lot of time. I've written a small guide about this; I hope it helps.
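One way to tell whether a given install (pip or source build) actually shipped torchvision's compiled extensions is to look for the shared libraries inside the installed package. The sketch below is a minimal check under two assumptions: that the extensions appear as files matching `_C*.so` and `image*.so` in the package directory, as is typical for Linux builds, and that a missing file is what triggers the errors above. `find_compiled_extensions` and `torchvision_package_dir` are hypothetical helper names, not torchvision APIs.

```python
import importlib.util
from pathlib import Path


def find_compiled_extensions(package_dir):
    """Map each expected extension stem to the shared-library file names found.

    Assumption: the custom ops live in `_C*.so` and the image codecs in
    `image*.so`; an empty list means no matching file was found.
    """
    package_dir = Path(package_dir)
    return {
        stem: sorted(p.name for p in package_dir.glob(f"{stem}*.so"))
        for stem in ("_C", "image")
    }


def torchvision_package_dir():
    """Locate the installed torchvision package without importing it."""
    spec = importlib.util.find_spec("torchvision")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    package_dir = torchvision_package_dir()
    if package_dir is None:
        print("torchvision is not installed in this environment")
    else:
        for stem, files in find_compiled_extensions(package_dir).items():
            print(f"{stem}: {files if files else 'not found'}")
```

If `_C` is reported as not found, the install probably lacks the compiled ops, and rebuilding torchvision against the installed torch would be the next thing to try.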

vballoli avatar Aug 04 '22 06:08 vballoli

I ran into the same problem recently. Have you solved it? Could you help me?

HengZhu96 avatar Aug 14 '23 09:08 HengZhu96