jetson-containers OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Open SkalskiP opened this issue 2 years ago • 6 comments

reproduction path

Run docker container

docker run -it --rm --net=host --runtime nvidia nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.8-py3

Run python3 session and import torch

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 195, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 148, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
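
The traceback shows the failure happens inside a `ctypes.CDLL` call, so the same loader check can be run without importing torch. The following is a minimal sketch, not part of the original report: `probe_library` is a hypothetical helper, and the library names other than `libcurand.so.10` are assumptions about what a CUDA 10.x based image would need.

```python
import ctypes


def probe_library(soname):
    """Try to dlopen `soname` and return (loaded, detail)."""
    try:
        ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
        return True, "loaded"
    except OSError as exc:
        # Same exception type that torch's _load_global_deps() surfaced above.
        return False, str(exc)


if __name__ == "__main__":
    # Assumed list of libraries to check; adjust the names for your image.
    for name in ("libcurand.so.10", "libcudart.so.10.2", "libcublas.so.10"):
        ok, detail = probe_library(name)
        print(f"{name}: {'OK' if ok else 'MISSING'} ({detail})")
```

Running this inside the container should show which of the libraries the dynamic loader can actually resolve.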

nvidia-jetpack specification

Package: nvidia-jetpack
Version: 5.0.1-b118
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 5.0.1-b118), nvidia-opencv (= 5.0.1-b118), nvidia-cudnn8 (= 5.0.1-b118), nvidia-tensorrt (= 5.0.1-b118), nvidia-container (= 5.0.1-b118), nvidia-vpi (= 5.0.1-b118), nvidia-nsight-sys (= 5.0.1-b118), nvidia-l4t-jetson-multimedia-api (>> 34.1-0), nvidia-l4t-jetson-multimedia-api (<< 34.2-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.0.1-b118_arm64.deb
Size: 29376
SHA256: d7ff0e4a95cc11c7a5d0b9e347923e8233ab544431d5db49d18c24944902e7a2
SHA1: fcab6ba9d6dca4a8b3e758d6fb1584baed34f7ed
MD5sum: f168d009bf5e3ee36ab14e646ad4b7dc
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Version: 5.0-b114
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 5.0-b114), nvidia-opencv (= 5.0-b114), nvidia-cudnn8 (= 5.0-b114), nvidia-tensorrt (= 5.0-b114), nvidia-container (= 5.0-b114), nvidia-vpi (= 5.0-b114), nvidia-nsight-sys (= 5.0-b114), nvidia-l4t-jetson-multimedia-api (>> 34.1-0), nvidia-l4t-jetson-multimedia-api (<< 34.2-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.0-b114_arm64.deb
Size: 29370
SHA256: 3b5c14e3ed53cd2517d1a318d056aad3d8b44ff660a489a9b62825d518cf7c5b
SHA1: 608d1f78791a2bdda8bf88443796dfe99f19b199
MD5sum: dbcb9ff116c50b66d5270acd95e05f9a
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

additional information

/usr/local/cuda/lib64/ inside the container does not contain any shared library (.so) files, only static archives and stubs.

root@ubuntu:/# ls /usr/local/cuda/lib64/
libcudadevrt.a  libcudart_static.a  stubs
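
The same check can be scripted. This is a rough sketch under the assumption that a usable CUDA directory contains files named `*.so` or `*.so.N`; `list_shared_objects` is a hypothetical helper, not something shipped in the image.

```python
import os


def list_shared_objects(directory):
    """Return sorted file names in `directory` that look like shared libraries."""
    if not os.path.isdir(directory):
        return []
    return sorted(
        name
        for name in os.listdir(directory)
        # Match both unversioned (libfoo.so) and versioned (libfoo.so.10) names.
        if name.endswith(".so") or ".so." in name
    )


if __name__ == "__main__":
    found = list_shared_objects("/usr/local/cuda/lib64")
    print(found if found else "no shared libraries found")
```

With the listing shown above, the result would be empty, because only `.a` archives and the `stubs` directory are present.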

Default runtime is set to nvidia

ubuntu@ubuntu:~$ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 53
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 
 runc version: 7cfd3bd
 init version: 
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.65-tegra
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 14.56GiB
 Name: ubuntu
 ID: TSUV:CCRX:H2ZP:OR7L:E4SU:KG5S:RTJS:63BA:6UJB:DPKB:7EMK:CBV6
 Docker Root Dir: /mnt/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Jun 19 '22 05:06 SkalskiP

Hi @SkalskiP, it seems you are using the new JetPack, so you should use the Docker images built for the new L4T too. Try this image: nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3

Jun 21 '22 14:06 CourchesneA

Hi @CourchesneA, the new one works. Doesn't it kind of defeat Docker's purpose, though? I would like to be able to run my container on different hosts, regardless of the OS they are running. Is that impossible with the new JetPack?

Jun 21 '22 15:06 SkalskiP

Well, from what I understand, we are not exactly there yet with nvidia-docker. Specifically, CUDA was usually mounted from the host into the container, but on Jetson, compatibility between different CUDA versions on the host and in the container was a problem. In the new L4T containers, CUDA is no longer mounted from the host; it is contained in the images (hence the images are bigger). While this will solve some compatibility issues and restrictions between host and container, I think it explains why JetPack 4.5 hosts are not compatible with JetPack 5 containers and vice versa.

Jun 21 '22 15:06 CourchesneA

Hi @CourchesneA, @SkalskiP, yes, JetPack 5.x has migrated to having CUDA/cuDNN/TensorRT/etc. installed inside the container, so the images are more portable. For example, you can run container images built for both JetPack 5.0 and 5.0.1 on JetPack 5.0.1 without needing to rebuild them.

As @CourchesneA said, JetPack 4.x container images are not compatible with JetPack 5.x and would need to be rebuilt.
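
A rough way to catch this mismatch before starting a container is to compare the L4T major release in the image tag (`r32...` vs `r34...`) with the one on the host. The sketch below is illustrative only: the helper names are hypothetical, the "same major release" rule is a simplification of what is described in this thread, and the format of `/etc/nv_tegra_release` (a first line like `# R34 (release), REVISION: 1.1, ...`) is an assumption.

```python
import re


def l4t_major_from_tag(image_tag):
    """Extract the L4T major release from a tag such as 'r32.6.1-pth1.8-py3'."""
    match = re.match(r"r(\d+)\.\d+", image_tag)
    if match is None:
        raise ValueError(f"unrecognised image tag: {image_tag!r}")
    return int(match.group(1))


def l4t_major_from_release_file(text):
    """Parse the L4T major release from the contents of /etc/nv_tegra_release."""
    # Assumed format: '# R34 (release), REVISION: 1.1, ...'
    match = re.search(r"R(\d+)\s+\(release\)", text)
    if match is None:
        raise ValueError("could not find an L4T release marker")
    return int(match.group(1))


def looks_compatible(image_tag, host_major):
    """Simplified rule: image and host must share the same L4T major release."""
    return l4t_major_from_tag(image_tag) == host_major


if __name__ == "__main__":
    with open("/etc/nv_tegra_release") as handle:
        host = l4t_major_from_release_file(handle.read())
    for tag in ("r32.6.1-pth1.8-py3", "r34.1.0-pth1.12-py3"):
        print(tag, "->", "ok" if looks_compatible(tag, host) else "mismatch")
```

The `__main__` part only works on a Jetson host; the parsing functions can be used anywhere.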

Jun 21 '22 15:06 dusty-nv

Hi, @dusty-nv! 👋 Hm... The main problem I have is that I actually build my own custom Docker image, and it threw the same error. I don't think rebuilding the image on the new host solves the issue.

Jun 21 '22 16:06 SkalskiP

In that case, are you sure the PyTorch wheel that is being used in the container is also compatible with your version of JetPack?

The wheels for JetPack 5.x are here:

https://elinux.org/Jetson_Zoo#PyTorch_.28Caffe2.29
https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-11-now-available/72048

Jun 21 '22 17:06 dusty-nv
