nvidia-docker
Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11 (error when running a Docker image)
I have created a Docker image and get the following error when running it on an AWS machine.
Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11: cannot open shared object file: No such file or directory Please make sure libcudnn_ops_infer.so.8 is in your library path!
As an attempted fix, I added the following line to the Dockerfile: ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
But I am still getting the same error. Any help would be appreciated.
@DSRajesh could you give more details on the image that you are using? Does libcudnn_ops_infer.so.8 exist in the image?
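One way to check this from inside the running container is to ask the dynamic loader directly. The following is only a minimal sketch: the helper name `try_load` is hypothetical, and the two library names are taken from the error message above. It uses Python's standard `ctypes` module and would need to be run with the same environment (including LD_LIBRARY_PATH) as the failing program.

```python
import ctypes


def try_load(names):
    """Try to load each shared library by name.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the normal dynamic loader search path
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Library names taken from the error message in this thread.
    wanted = ["libcudnn_ops_infer.so.8", "libcublas.so.11"]
    for name, err in try_load(wanted).items():
        print(f"{name}: {'OK' if err is None else err}")
```

If libcublas.so.11 is reported as failing while libcudnn_ops_infer.so.8 loads, that would suggest the cuDNN library is found but one of its own dependencies is not.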
FROM nvidia/cuda:11.0-base
COPY . /buoyancy
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
RUN : \
    && apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        software-properties-common \
    && add-apt-repository -y ppa:deadsnakes \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3.8-venv \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && :
RUN python3.8 -m venv /venv
ENV PATH=/venv/bin:$PATH
RUN apt-get install -y python3-distutils python3-apt
RUN apt install -y software-properties-common
RUN apt update
RUN pip install torch==1.12.0.dev20220326+cu113 -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
RUN pip install torchvision==0.13.0.dev20220326+cu113 -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html
RUN pip install torchmetrics==0.7.0 pytorch-lightning==1.4.2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
I have also verified that the libcudnn_ops_infer.so.8 path mentioned above exists, using the Ubuntu "locate" command.
@DSRajesh should the LD_LIBRARY_PATH not point to the directory that contains the library, rather than to the library file itself: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/targets/x86_64-linux/lib/
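To illustrate the point, here is a small sketch that inspects an LD_LIBRARY_PATH-style value and flags entries that are files instead of directories. The function name `classify_entries` is hypothetical and the script is only a diagnostic aid; it relies on nothing beyond the Python standard library.

```python
import os


def classify_entries(value):
    """Split a colon-separated search path and label each entry.

    Returns a list of (entry, status) tuples, where status is
    "directory", "file ..." or "missing".
    """
    report = []
    for entry in value.split(":"):
        if not entry:
            continue  # skip empty entries
        if os.path.isdir(entry):
            status = "directory"
        elif os.path.isfile(entry):
            # A file cannot serve as a search directory.
            status = "file (use its parent directory instead)"
        else:
            status = "missing"
        report.append((entry, status))
    return report


if __name__ == "__main__":
    for entry, status in classify_entries(os.environ.get("LD_LIBRARY_PATH", "")):
        print(f"{entry}: {status}")
```

Run inside the container built from the Dockerfile above, this should label the entry ending in libcudnn_ops_infer.so.8 as a file if it exists at that path, or as missing otherwise.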
Also, I note that the Dockerfile uses nvidia/cuda:11.0-base as the base image. It is my understanding that the image tags ending in -base without a distribution suffix are deprecated. Could you repeat the tests using nvidia/cuda:11.3.1-base-ubuntu20.04 instead (note that the distribution is explicitly specified)?