
Unable to run the voltaml/volta_diffusion:v0.1 docker image

Open wywywywy opened this issue 2 years ago • 5 comments

-> % sudo docker run -it --gpus all voltaml/volta_diffusion:v0.1 bash
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/e049fdb3bc56fecdeefb3b950034cbc757eeb166b152330d00ef6e8a2972af06/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

This is probably because, when --gpus=all is specified, Docker (via nvidia-container-cli) tries to mount the host's NVIDIA and CUDA libraries into the container. Some of those paths already exist in the image (e.g. /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1), and they are actually links rather than regular files, so the mount fails with "file exists".
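One way to check this guess is to look at the library paths inside the image itself. The sketch below is a hypothetical helper, not something from the voltaML repo: it reports which of the driver libraries named in this thread already exist under a given root directory, and whether each is a symlink or a regular file. The function name and the ROOT_DIR variable are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of voltaML): report which NVIDIA driver
# libraries already exist under a root directory, and whether each one is
# a symlink or a regular file.
list_baked_in_libs() {
    root="$1"   # an empty string means the real filesystem root
    lib_dir="$root/usr/lib/x86_64-linux-gnu"
    for lib in libnvidia-ml.so.1 libcuda.so.1 libnvidia-encode.so.1 \
               libnvidia-opticalflow.so.1 libnvcuvid.so.1; do
        path="$lib_dir/$lib"
        if [ -L "$path" ]; then
            # -L is checked first so that dangling symlinks are still reported
            echo "symlink: $path"
        elif [ -e "$path" ]; then
            echo "file: $path"
        fi
    done
}

# ROOT_DIR is an assumed, optional override; by default the real root is checked.
list_baked_in_libs "${ROOT_DIR:-}"
```

To use it, start the container without --gpus (for example sudo docker run --rm -it voltaml/volta_diffusion:v0.1 bash) and run the script inside. Any line it prints is a path that the GPU mount step could collide with.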

Could you please open-source the Dockerfile as well?

wywywywy, Nov 23 '22 15:11

Same issue here. I found a related issue in the NVIDIA repos: https://github.com/NVIDIA/nvidia-container-toolkit/issues/289

I made a Dockerfile containing this

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

and built it with docker build -t voltaml/volta_diffusion -f Dockerfile .

It seems to work.
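For anyone who wants to reproduce this workaround, here is a minimal sketch that writes such a wrapper Dockerfile. It assumes the published voltaml/volta_diffusion:v0.1 image is used as the base (the FROM line is not shown in the comment above) and that the output file is called Dockerfile.fix; both names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: generate a wrapper Dockerfile that deletes the driver libraries
# baked into the image, so the GPU mount step can create them itself.
BASE_IMAGE="voltaml/volta_diffusion:v0.1"   # assumption: published image as base
LIB_DIR="/usr/lib/x86_64-linux-gnu"
LIBS="libnvidia-ml.so.1 libcuda.so.1 libnvidia-encode.so.1 libnvidia-opticalflow.so.1 libnvcuvid.so.1"

# Build the space-separated list of absolute paths to remove.
paths=""
for lib in $LIBS; do
    paths="$paths $LIB_DIR/$lib"
done

# Dockerfile.fix is a hypothetical file name chosen for this sketch.
cat > Dockerfile.fix <<EOF
FROM $BASE_IMAGE
RUN rm -rf$paths
EOF

echo "Wrote Dockerfile.fix"
```

The generated file can then be built with docker build -t voltaml/volta_diffusion -f Dockerfile.fix . as in the command above.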

Pop115, Nov 25 '22 09:11

Added the Dockerfile. Please check and close the issue if it's working.

VoltaML, Nov 25 '22 11:11

I tried building with your Dockerfile using docker build -t voltaml/volta_diffusion -f Dockerfile . but got the following error:

 => ERROR [6/6] RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt 1.6s
------
 > [6/6] RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt:
#9 0.729 ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/code/requirements.txt'
------
executor failed running [/bin/sh -c pip install --no-cache-dir --upgrade -r /code/requirements.txt]: exit code: 1

Please add instructions to the README if the command is not correct.

Pop115, Nov 25 '22 11:11

Download this file: https://gist.github.com/JackCloudman/7143c7aeaafa54ed35b3f6cfe8a30c57 and then build and run it:

docker build -t voltaml/volta_diffusion:v0.1 -f Dockerfile .
docker run -it --gpus=all -p "8888:8888" voltaml/volta_diffusion:v0.1 jupyter lab --port=8888 --no-browser --ip 0.0.0.0 --allow-root

JackCloudman, Nov 25 '22 23:11

Updated the Docker image to v0.2. Please test.

harishprabhala, Nov 29 '22 07:11