nvidia-docker

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=9165 /var/lib/docker/overlay2/2a1a1c3555109e20c5ba2e386cc3ce69cbb80c3850663c1909db8c46ed565c0c/merged]\\\\nnvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/2a1a1c3555109e20c5ba2e386cc3ce69cbb80c3850663c1909db8c46ed565c0c/merged/usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18: file exists\\\\n\\\"\"": unknown

Open deepxiaobai opened this issue 4 years ago • 14 comments

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=9165 /var/lib/docker/overlay2/2a1a1c3555109e20c5ba2e386cc3ce69cbb80c3850663c1909db8c46ed565c0c/merged]\\nnvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/2a1a1c3555109e20c5ba2e386cc3ce69cbb80c3850663c1909db8c46ed565c0c/merged/usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18: file exists\\n\""": unknown

deepxiaobai avatar Apr 02 '21 02:04 deepxiaobai

This error occurs when I run the "docker run --runtime=nvidia -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /home/$USER/triton_blog/:/workspace/triton_blog nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3" command.

Environment:

Device: Jetson Xavier NX
CUDA_VERSION: 10.2
DeepStream: 5.1
Docker_Version: 19.03.6

deepxiaobai avatar Apr 02 '21 03:04 deepxiaobai

same error

cvtutorials avatar May 18 '21 02:05 cvtutorials

@deepxiaobai @sjtumelc which versions of the NVIDIA container toolkit components are you using?
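For anyone answering this, a minimal sketch of how the versions could be collected in one go. `report_version` is a hypothetical helper name, not part of any NVIDIA tool; it only wraps each tool's own version flag and prints a placeholder when the tool is not installed.

```shell
#!/bin/sh
# Print the first line of a tool's version output, or a placeholder
# if the tool is not on PATH. report_version is a hypothetical helper.
report_version() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" "$@" 2>&1 | head -n 1
    else
        echo "$tool: not found"
    fi
}

# Example calls (uncomment on the Jetson host):
# report_version docker --version
# report_version nvidia-container-cli --version
# dpkg -l '*nvidia-container*'
```

The commented `dpkg -l` line assumes a Debian-based host such as the stock L4T image; it lists the installed container toolkit packages with their versions.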

elezar avatar May 18 '21 07:05 elezar

Same error. I built a new image on a Jetson AGX, based on nvcr.io/nvidia/l4t-base:r32.5.0. The new image works fine on the Jetson AGX, but I get the error when I try to run it on a Jetson NX.

codegastudio avatar Jun 14 '21 21:06 codegastudio

Same error while trying to run deepstream with fatbinaryloader in nvcr.io/nvidia/deepstream-l4t:5.1-21.02-base.

ChickenBites avatar Jun 29 '21 11:06 ChickenBites

Looking at the contents of the image: nvcr.io/nvidia/deepstream-l4t:5.1-21.02-base:

ls -alt /usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18
-rw-r--r-- 1 root root 0 Feb 25 00:18 /usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18

The image contains a zero-sized file with the same name as the file being mounted from the host, which is consistent with the "file exists" mount error above. This suggests a problem with how the container image was built.
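To check an image for other placeholder files of this kind, something like the following sketch could be run from a shell inside the image (started without the NVIDIA runtime, so that nothing is mounted from the host). `find_empty_libs` is a hypothetical helper name, and the `*.so*` pattern is an assumption about which files matter.

```shell
#!/bin/sh
# List zero-byte files that look like shared libraries under a directory.
# find_empty_libs is a hypothetical helper, not part of any NVIDIA tool.
find_empty_libs() {
    dir="$1"
    # -size 0 matches only empty regular files
    find "$dir" -type f -size 0 -name '*.so*' 2>/dev/null | sort
}

# Example call inside the image:
# find_empty_libs /usr/lib/aarch64-linux-gnu
```

Any path it prints is a candidate for the same "file exists" conflict if the host tries to mount a file with that name.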

elezar avatar Jun 29 '21 11:06 elezar

@elezar I've actually symlinked libnvidia-fatbinaryloader.so.440.18 to libnvidia-fatbinaryloader.so.32.4.4 when building the Dockerfile, but now the pipeline won't load. I've also tried the following images, with the same result: nvcr.io/nvidia/deepstream-l4t:5.1-21.02-samples and nvcr.io/nvidia/deepstream-l4t:5.1-21.02-iot.

ChickenBites avatar Jun 29 '21 13:06 ChickenBites

Could you show how the symlinks have been set up?

elezar avatar Jun 29 '21 13:06 elezar

@elezar The directive is:

WORKDIR /usr/lib/aarch64-linux-gnu

RUN rm -f libnvidia-fatbinaryloader.so.440.18 \
    && ln -s libnvidia-fatbinaryloader.so.32.4.4 libnvidia-fatbinaryloader.so.440.18
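One thing that may be worth checking, although nothing in this thread confirms it is the cause, is whether the link target actually exists inside the image. A minimal sketch, with `is_dangling_symlink` as a hypothetical helper name:

```shell
#!/bin/sh
# Succeed (exit 0) if the path is a symlink whose target does not exist.
# is_dangling_symlink is a hypothetical helper for illustration only.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Example call inside the image:
# is_dangling_symlink /usr/lib/aarch64-linux-gnu/libnvidia-fatbinaryloader.so.440.18 \
#     && echo "link target is missing"
```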

ChickenBites avatar Jun 29 '21 13:06 ChickenBites

Same error on the Nano.

hoonkai avatar Sep 08 '21 06:09 hoonkai

Same error here on AGX Xavier. The strange thing is that other docker images work normally using nvidia-container.

gustavojoseleite avatar Oct 23 '21 10:10 gustavojoseleite

same error

drinktee avatar Dec 23 '21 08:12 drinktee

Same error.

Solved by pulling the image nvcr.io/nvidia/l4t-tensorflow:r32.5.0-tf1.15-py3:

docker pull nvcr.io/nvidia/l4t-tensorflow:r32.5.0-tf1.15-py3
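If the fix works because the image release now matches the L4T release on the host (an assumption, the thread does not confirm it), the host release can be turned into a tag prefix with a sketch like this. `l4t_tag` is a hypothetical helper, and it assumes the first line of /etc/nv_tegra_release has the usual "# R32 (release), REVISION: 5.0, ..." layout.

```shell
#!/bin/sh
# Convert the first line of /etc/nv_tegra_release (read from stdin)
# into an image tag prefix such as r32.5.0.
# l4t_tag is a hypothetical helper; the input layout is an assumption.
l4t_tag() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/r\1.\2/p'
}

# Example call on the Jetson host:
# head -n 1 /etc/nv_tegra_release | l4t_tag
```

The printed value can then be compared with the rXX.Y.Z part of the image tag before pulling.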

vertcli avatar Feb 15 '22 10:02 vertcli