
Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11 (error in Docker image execution)

Open DSRajesh opened this issue 2 years ago • 4 comments

I have created a Docker image and get the following error when running it on an AWS machine.

Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11: cannot open shared object file: No such file or directory Please make sure libcudnn_ops_infer.so.8 is in your library path!

As an attempted fix, I added the following line to the Dockerfile:

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8

But I am still getting the same error. Any help would be appreciated.

DSRajesh avatar Jun 01 '22 05:06 DSRajesh

@DSRajesh could you give more details on image that you are using? Does libcudnn_ops_infer.so.8 exist in the image?
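One way to answer that from a shell inside the container is sketched below. `find_lib` is a hypothetical helper written for this thread, not part of any NVIDIA tooling. Since the error text names libcublas.so.11 as the file that cannot be opened, it may be worth checking for that file as well.

```shell
#!/bin/sh
# find_lib NAME DIR...
# Print the first DIR/NAME that exists and return 0; return 1 if none exists.
# (Hypothetical helper for illustration only.)
find_lib() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

For example, `find_lib libcublas.so.11 /usr/local/cuda-11.0/targets/x86_64-linux/lib /usr/lib/x86_64-linux-gnu`. The second directory is only a guess at where the library might live and should be adjusted to the image.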

elezar avatar Jun 01 '22 07:06 elezar

FROM nvidia/cuda:11.0-base
COPY . /buoyancy
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
RUN : \
    && apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        software-properties-common \
    && add-apt-repository -y ppa:deadsnakes \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3.8-venv \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && :
RUN python3.8 -m venv /venv
ENV PATH=/venv/bin:$PATH
RUN apt-get install -y python3-distutils python3-apt
RUN apt install -y software-properties-common
RUN apt update
RUN pip install torch==1.12.0.dev20220326+cu113 -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
RUN pip install torchvision==0.13.0.dev20220326+cu113 -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
RUN pip install torchmetrics==0.7.0 pytorch-lightning==1.4.2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8

DSRajesh avatar Jun 01 '22 09:06 DSRajesh

Also, I have verified that the "/libcudnn_ops_infer.so.8" path mentioned above exists, using the Ubuntu "locate" command.

DSRajesh avatar Jun 01 '22 09:06 DSRajesh

@DSRajesh shouldn't LD_LIBRARY_PATH point to the directory that contains the library rather than to the library file itself? That is: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/
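The dynamic linker treats each colon-separated entry of LD_LIBRARY_PATH as a directory to search, so an entry that names a file is effectively ignored. A rough sketch of a sanity check that could be run inside the container is below. `check_lib_path` is a hypothetical helper made up for this thread, and it only checks that entries are directories, not that they contain the needed libraries.

```shell
#!/bin/sh
# check_lib_path PATHLIST
# Print every colon-separated entry of PATHLIST that is not a directory.
# Returns 0 if all non-empty entries are directories, 1 otherwise.
# (Hypothetical helper for illustration only; variables are not local.)
check_lib_path() {
    bad=0
    old_ifs=$IFS
    IFS=:
    set -f                      # no glob expansion while splitting on ':'
    for entry in $1; do
        if [ -n "$entry" ] && [ ! -d "$entry" ]; then
            printf 'not a directory: %s\n' "$entry"
            bad=1
        fi
    done
    set +f
    IFS=$old_ifs
    return "$bad"
}
```

Calling `check_lib_path "$LD_LIBRARY_PATH"` with the value from the Dockerfile above would be expected to flag the entry ending in libcudnn_ops_infer.so.8, since that entry is a file.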

Also, I note that the Dockerfile uses nvidia/cuda:11.0-base as the base image. It is my understanding that the images ending in -base are deprecated. Could you repeat the tests using nvidia/cuda:11.3.1-base-ubuntu20.04 instead (note that the distribution is explicitly specified)?

elezar avatar Jun 01 '22 09:06 elezar