jetson-inference
PyTorch can be imported on the host but fails to load in the jetson-inference container
Hi, dusty,
My Jetson Xavier automatically updated its software (a routine automatic update). Since then, "import torch" fails with an error inside the jetson-inference container, although "import torch" still works on the host. I checked your other replies related to this error message and tried all sorts of fixes, but it is still not working. The error message is:
import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 188, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 141, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
When I check from inside the container, I get this:
root@desktop:/jetson-inference/python/training/detection/ssd# ls /usr/local/cuda/lib64
libcudadevrt.a  libcudart_static.a  stubs
But all the required files are present in /usr/local/cuda/lib64 on the host.
How do I fix this? Thank you very much for the help!
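A quick way to see which CUDA libraries the loader cannot find is to try opening them with ctypes, which is the same mechanism that fails in the traceback above. This is only a diagnostic sketch: apart from libcurand.so.10, the library names in CUDA_LIBS are assumptions for a CUDA 10.2 install and may need adjusting.

```python
import ctypes

# Only libcurand.so.10 comes from the traceback; the other names are
# assumed examples for CUDA 10.2 and may differ on your system.
CUDA_LIBS = ["libcurand.so.10", "libcudart.so.10.2", "libcublas.so.10"]


def missing_libs(names):
    """Return the library names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Raised when the shared object (or one of its dependencies)
            # cannot be found or loaded.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_libs(CUDA_LIBS):
        print("cannot load:", name)
```

Running it once on the host and once inside the container should show whether the difference is limited to libcurand or affects the CUDA libraries in general.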
Hi @shi093, what's the current version of your JetPack-L4T? You can check it with cat /etc/nv_tegra_release
I would recommend pulling the latest jetson-inference from master, as that should help you run the latest container.
dusty, Here is the info about my JetPack-L4T:
R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020
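For comparing the host release against a container tag, the line above can be reduced to a short L4T version string. The sketch below assumes the line keeps the "R<major> (release), REVISION: <minor>" layout shown here; parse_l4t_release is a hypothetical helper name, not part of jetson-inference.

```python
import re


def parse_l4t_release(line):
    """Extract an L4T version such as '32.4.4' from an nv_tegra_release line.

    Returns None if the line does not match the expected layout.
    """
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", line)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


if __name__ == "__main__":
    line = (
        "R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, "
        "EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020"
    )
    print(parse_l4t_release(line))  # prints 32.4.4
```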
I did pull the latest jetson-inference by running git clone --recursive https://github.com/dusty-nv/jetson-inference
But I still have the same problem.
Also, I have PyTorch 1.7 installed on my Jetson Xavier, but the jetson-inference Docker container has PyTorch 1.6. Is this a problem? Thanks again.
That shouldn't be an issue or make a difference.
OK, so you have files in /usr/local/cuda/lib64 which are not appearing inside the container?
It seems since your system update, not all the CUDA libraries are getting properly mounted anymore.
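One way to investigate the mounting is to look at the CSV mount lists that the NVIDIA container runtime uses on JetPack 4.x. The sketch below rests on two assumptions that should be verified on the device: that the lists live in /etc/nvidia-container-runtime/host-files-for-container.d, and that each line has the form "type, path". It reports entries whose path is missing on the host; missing_host_paths is a hypothetical helper name.

```python
import csv
import glob
import os

# Assumed location of the mount lists on JetPack 4.x; verify on your device.
CSV_DIR = "/etc/nvidia-container-runtime/host-files-for-container.d"


def missing_host_paths(csv_dir=CSV_DIR):
    """Return (csv_file, path) pairs that are listed but absent on the host."""
    missing = []
    for csv_file in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(csv_file, newline="") as handle:
            for row in csv.reader(handle, skipinitialspace=True):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                path = row[1].strip()
                # lexists also accepts symlinks whose target is missing
                if not os.path.lexists(path):
                    missing.append((os.path.basename(csv_file), path))
    return missing


if __name__ == "__main__":
    for csv_name, path in missing_host_paths():
        print("{}: {} is not present on the host".format(csv_name, path))
```

If the directory is empty or has no CUDA list at all, that would also be consistent with the libraries not appearing inside the container.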
"It seems since your system update, not all the CUDA libraries are getting properly mounted anymore" Yes, I think you are right. But is there a way to fix this? I tried docker/run.sh --volume /usr/local/cuda, but it didn't change anything.
Can you try uninstalling the nvidia-container runtime and re-installing?
You should be able to find the relevant apt packages with apt-cache search nvidia-container
I ran "apt-cache search nvidia-container" and here is what it returned:
libnvidia-container-dev - NVIDIA container runtime library (development files)
libnvidia-container-tools - NVIDIA container runtime library (command-line tools)
libnvidia-container0 - NVIDIA container runtime library
libnvidia-container1-dbg - NVIDIA container runtime library (debugging symbols)
libnvidia-container1 - NVIDIA container runtime library
nvidia-container-runtime - NVIDIA container runtime
nvidia-container-toolkit - NVIDIA container runtime hook
nvidia-container-csv-cuda - Jetpack CUDA CSV file
nvidia-container-csv-cudnn - Jetpack CUDNN CSV file
nvidia-container-csv-tensorrt - Jetpack TensorRT CSV file
nvidia-container-csv-visionworks - Jetpack VisionWorks CSV file
nvidia-container - NVIDIA Container Meta Package
How do I uninstall the runtime and reinstall again? Thanks again.
Use sudo apt-get remove to remove them (and specify the packages), then sudo apt-get install to install them again.
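To avoid typos when listing the packages, the two commands can be generated from a single list. This sketch only prints the commands and does not run them. The package names come from the apt-cache output earlier in the thread, but which subset actually needs reinstalling is an assumption; reinstall_commands is a hypothetical helper name.

```python
import shlex

# Names taken from the apt-cache output earlier in the thread. The choice of
# this particular subset is an assumption; adjust it to what is installed.
PACKAGES = [
    "nvidia-container-runtime",
    "nvidia-container-toolkit",
    "nvidia-container-csv-cuda",
    "nvidia-container-csv-cudnn",
    "nvidia-container-csv-tensorrt",
    "nvidia-container-csv-visionworks",
]


def reinstall_commands(packages):
    """Build the remove and install commands as argument lists."""
    remove = ["sudo", "apt-get", "remove"] + list(packages)
    install = ["sudo", "apt-get", "install"] + list(packages)
    return [remove, install]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in reinstall_commands(PACKAGES):
        print(" ".join(shlex.quote(arg) for arg in argv))
```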
If this still doesn't work, I'm afraid that I would recommend re-flashing your device or SD card to get it back into a known working state.