
PyTorch can be imported from the host but unable to load in jetson-inference container

Open shi093 opened this issue 2 years ago • 9 comments

Hi, dusty,

My Jetson Xavier automatically updated its software (a routine automatic update). Since then, "import torch" gives an error inside the jetson-inference container, although it still works on the host. I checked your other replies related to this error message and tried all sorts of fixes, but it is still not working. The error message is:

    import torch
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 188, in <module>
        _load_global_deps()
      File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 141, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libcurand.so.10: cannot open shared object file: No such file or directory

When I check from the container, I got this:

    root@desktop:/jetson-inference/python/training/detection/ssd# ls /usr/local/cuda/lib64
    libcudadevrt.a  libcudart_static.a  stubs

But all the required files are present in /usr/local/cuda/lib64 on the host.

How do I fix this? Thank you very much for the help!

shi093 • Apr 18 '22 01:04
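For anyone reproducing this, the failing step in the traceback can be isolated without importing torch at all. The following is only a minimal sketch: `can_load` is a hypothetical helper name, and it simply asks the dynamic loader to open a library the same way `_load_global_deps` does via `ctypes.CDLL`. Running it both on the host and inside the container should show whether `libcurand.so.10` is visible in each environment.

```python
import ctypes


def can_load(soname):
    """Try to open a shared library by name.

    Returns (True, "") on success, or (False, error_text) if the
    dynamic loader cannot find or open it.
    """
    try:
        ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    # libcurand.so.10 is the library named in the OSError above.
    ok, err = can_load("libcurand.so.10")
    print("libcurand.so.10:", "OK" if ok else "FAILED: " + err)
```

If this fails in the container but succeeds on the host, the problem lies in what the container can see, not in the PyTorch install itself.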

Hi @shi093, what's the current version of your JetPack-L4T? You can check it with cat /etc/nv_tegra_release

I would recommend pulling the latest jetson-inference from master, as that should help you run the latest container.

dusty-nv • Apr 18 '22 13:04

dusty, Here is the info about my JetPack-L4T:

R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020

I did pull the latest jetson-inference by running git clone --recursive https://github.com/dusty-nv/jetson-inference

But I still have the same problem.

shi093 • Apr 18 '22 15:04
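The release line quoted above can be turned into a plain L4T version string, which is easier to compare when asking for help. This is a rough sketch, assuming the line always has the `R<major> (release), REVISION: <minor>` shape seen here; `parse_l4t_release` is a hypothetical helper name, not something shipped with JetPack or jetson-inference.

```python
import re


def parse_l4t_release(text):
    """Extract an L4T version such as '32.4.4' from /etc/nv_tegra_release text."""
    match = re.search(r"R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", text)
    if match is None:
        raise ValueError("unrecognized nv_tegra_release format")
    return "{}.{}".format(match.group(1), match.group(2))


# The line reported in this thread.
sample = (
    "R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, "
    "EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020"
)
print(parse_l4t_release(sample))  # prints 32.4.4
```

On a real device the input would come from reading `/etc/nv_tegra_release` instead of the hard-coded sample.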

Also, I have PyTorch 1.7 installed on my Jetson Xavier, but the jetson-inference Docker container has PyTorch 1.6. Is this a problem? Thanks again.

shi093 • Apr 18 '22 16:04

> Also, I have PyTorch 1.7 installed on my Jetson Xavier, but the jetson-inference Docker container has PyTorch 1.6. Is this a problem? Thanks again.

That shouldn't be an issue or make a difference.

dusty-nv • Apr 18 '22 17:04

> root@desktop:/jetson-inference/python/training/detection/ssd# ls /usr/local/cuda/lib64
> libcudadevrt.a libcudart_static.a stubs
>
> But all the required files are there on the host directory of /usr/local/cuda/lib64

OK, so you have files in /usr/local/cuda/lib64 which are not appearing inside the container?

It seems since your system update, not all the CUDA libraries are getting properly mounted anymore

dusty-nv • Apr 18 '22 17:04
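One way to make the "files on the host but not in the container" comparison concrete is to run the same directory check in both places. This is a sketch only: `missing_libraries` is a hypothetical helper, and the list of required names is an assumption for illustration (only `libcurand.so.10` is actually named in this thread).

```python
import os


def missing_libraries(lib_dir, required):
    """Return the entries of `required` that are not present in `lib_dir`.

    A directory that does not exist counts as missing everything.
    """
    try:
        present = set(os.listdir(lib_dir))
    except FileNotFoundError:
        return list(required)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Assumed list for illustration; extend it with whatever the loader reports.
    wanted = ["libcurand.so.10"]
    print(missing_libraries("/usr/local/cuda/lib64", wanted))
```

An empty list on the host and a non-empty list in the container would match the diagnosis that the CUDA libraries are no longer being mounted.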

"It seems since your system update, not all the CUDA libraries are getting properly mounted anymore"

Yes, I think you are right. But is there a way to fix this? I tried docker/run.sh --volume /usr/local/cuda, but it didn't change anything.

shi093 • Apr 18 '22 17:04
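A possible reason the `--volume` attempt had no visible effect: with Docker, a volume argument that gives only one path creates an anonymous volume at that path inside the container, while a bind mount of a host directory needs the `HOST_DIR:CONTAINER_DIR` form. The sketch below only builds that argument; `bind_mount_arg` is a hypothetical helper, and whether bind-mounting the host CUDA directory actually resolves this issue is not confirmed anywhere in this thread.

```python
def bind_mount_arg(host_dir, container_dir=None):
    """Build a ['--volume', 'HOST:CONTAINER'] argument pair for a bind mount.

    If container_dir is omitted, the host path is reused inside the container.
    """
    if not host_dir.startswith("/"):
        raise ValueError("host_dir must be an absolute path")
    if container_dir is None:
        container_dir = host_dir
    return ["--volume", "{}:{}".format(host_dir, container_dir)]


print(" ".join(bind_mount_arg("/usr/local/cuda")))
# prints --volume /usr/local/cuda:/usr/local/cuda
```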

Can you try uninstalling the nvidia-container runtime and re-installing?

You should be able to find the relevant apt packages with apt-cache search nvidia-container

dusty-nv • Apr 18 '22 17:04

I ran "apt-cache search nvidia-container" and here is what it returned:

    libnvidia-container-dev - NVIDIA container runtime library (development files)
    libnvidia-container-tools - NVIDIA container runtime library (command-line tools)
    libnvidia-container0 - NVIDIA container runtime library
    libnvidia-container1-dbg - NVIDIA container runtime library (debugging symbols)
    libnvidia-container1 - NVIDIA container runtime library
    nvidia-container-runtime - NVIDIA container runtime
    nvidia-container-toolkit - NVIDIA container runtime hook
    nvidia-container-csv-cuda - Jetpack CUDA CSV file
    nvidia-container-csv-cudnn - Jetpack CUDNN CSV file
    nvidia-container-csv-tensorrt - Jetpack TensorRT CSV file
    nvidia-container-csv-visionworks - Jetpack VisionWorks CSV file
    nvidia-container - NVIDIA Container Meta Package

How do I uninstall the runtime and reinstall again? Thanks again.

shi093 • Apr 18 '22 20:04

Use sudo apt-get remove to remove them (and specify the packages), then sudo apt-get install to install them again.

If this still doesn't work, I'm afraid that I would recommend re-flashing your device or SD card to get it back into a known working state.

dusty-nv • Apr 18 '22 20:04
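The remove-then-install step can be spelled out as the two command lines it amounts to. This is a sketch that only prints the commands and does not run them; `reinstall_commands` is a hypothetical helper, and the package selection below is an assumption picked from the `apt-cache search` output earlier in the thread, not a list confirmed by the maintainer.

```python
import shlex


def reinstall_commands(packages):
    """Return the apt-get remove and install command lines for the given packages."""
    pkgs = list(packages)
    if not pkgs:
        raise ValueError("at least one package name is required")
    return [
        ["sudo", "apt-get", "remove"] + pkgs,
        ["sudo", "apt-get", "install"] + pkgs,
    ]


# Assumed selection for illustration; adjust to the packages you want to reinstall.
selected = ["nvidia-container-runtime", "nvidia-container-toolkit"]
for argv in reinstall_commands(selected):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands first makes it easy to review exactly what will be removed before running anything on the device.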