After docker build, `docker-compose up` fails with `Found no NVIDIA driver on your system.`

Open jojkaart opened this issue 4 years ago • 3 comments

I installed nvidia-docker as instructed in the linked repo for ubuntu 20.04 and the test for that seems to indicate I have everything in order:

joel@suina:~/Source/StyleFlow$ docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
Sat Jan  9 18:39:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0  On |                  N/A |
| 30%   30C    P8     7W /  75W |    571MiB /  3910MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

However, when I try to start StyleFlow with docker-compose up, I get the following output and no GUI.

joel@suina:~/Source/StyleFlow$ docker-compose up
Starting styleflow_interface_1 ... done
Attaching to styleflow_interface_1

... (Edit3: removed a lot of QT debug from here, it turned out to be irrelevant) ...

interface_1  | ----------------- Options ---------------
interface_1  |                 batchSize: 1                             
interface_1  |           checkpoints_dir: ./checkpoints                 
interface_1  |                  dataroot: ./data/datasetX               
interface_1  |                   gpu_ids: 0                             
interface_1  |      max_result_snapshots: 30                            
interface_1  |                     model: xxxx                          
interface_1  |                      name: XXXX                          
interface_1  |               network_pkl: gdrive:networks/stylegan2-ffhq-config-f.pkl
interface_1  |             only_for_test: ...                           
interface_1  |                     phase: test                          
interface_1  | ----------------- End -------------------
interface_1  | Traceback (most recent call last):
interface_1  |   File "/usr/app/main.py", line 365, in <module>
interface_1  |     ex = ExWindow(opt)
interface_1  |   File "/usr/app/main.py", line 40, in __init__
interface_1  |     self.EX = Ex(opt)
interface_1  |   File "/usr/app/main.py", line 64, in __init__
interface_1  |     self.zero_padding = torch.zeros(1, 18, 1).cuda()
interface_1  |   File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 162, in _lazy_init
interface_1  |     _check_driver()
interface_1  |   File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver
interface_1  |     http://www.nvidia.com/Download/index.aspx""")
interface_1  | AssertionError: 
interface_1  | Found no NVIDIA driver on your system. Please check that you
interface_1  | have an NVIDIA GPU and installed a driver from
interface_1  | http://www.nvidia.com/Download/index.aspx
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqgif.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqicns.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqico.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqjpeg.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqsvg.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqtga.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqtiff.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqwbmp.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqwebp.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so" 
interface_1  | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/platforms/libqxcb.so" 
interface_1  | QLibraryPrivate::unload succeeded on "Xcursor" (faked)
styleflow_interface_1 exited with code 1

Any idea what might be going wrong?

Edit: After figuring out how to get a shell inside the docker-compose container, I managed to get the following information:

root@623a41c9882d:/usr/app# python3.7 -c 'import torch; print(torch.version.cuda)'
9.0.176
root@623a41c9882d:/usr/app# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Perhaps I need to figure out how to get the Docker image built with CUDA 9 instead of 10.
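
For anyone debugging the same thing, it may help to ask PyTorch directly what it sees inside the container. The sketch below is only a diagnostic: `cuda_report` is a hypothetical helper name, and it assumes it is run with the same interpreter as main.py (python3.7 in this image). `torch.version.cuda` is the CUDA version the PyTorch build targets, while `torch.cuda.is_available()` tells you whether a usable driver and GPU are visible to the container.

```python
import importlib.util


def cuda_report():
    """Collect what PyTorch reports about CUDA in the current environment."""
    # Avoid a hard failure when torch is not installed in this interpreter.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU builds).
        "built_for_cuda": torch.version.cuda,
        # False when no usable driver or GPU is visible to this process.
        "cuda_available": available,
        # Only query the device count when CUDA is usable.
        "device_count": torch.cuda.device_count() if available else 0,
    }


print(cuda_report())
```

If `cuda_available` is False here while `nvidia-smi` works on the host, the problem is more likely that the GPU is not exposed to the container than a CUDA version mismatch.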

Edit2: I installed updates for the NVIDIA driver files (I guess they came from the nvidia-docker repo), set nvidia as the default Docker runtime, and rebooted to get Docker working again. After translating the docker-compose.yml into the following docker command, I managed to get the program to show me a black screen.

docker run --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -e QT_X11_NO_MITSHM=1 -e QT_DEBUG_PLUGINS=1 styleflow_interface python3.7 /usr/app/main.py

With docker-compose up the result is still the same as above. I guess I'll see if upgrading docker-compose helps. Version 1.22.0 is probably rather old.

Edit3: After upgrading docker-compose, docker-compose up now brings up the same black window that I got with the above command. I saw these two lines in the log, though:

interface_1  | libGL error: No matching fbConfigs or visuals found
interface_1  | libGL error: failed to load driver: swrast

So I googled the errors and found that running `export LIBGL_ALWAYS_INDIRECT=1` before starting main.py gets rid of them, but the result is still a black window that does nothing.

Edit4: Since I've solved the original issue that I made this ticket about, I'll leave it here. Hopefully it's helpful for someone. I'll make a new ticket for the black window issue.

jojkaart avatar Jan 09 '21 18:01 jojkaart

Had the same error, changing the docker-compose.yml to this did the trick for me:

version: "3.3"

services:
  interface:
    build: .
    runtime: nvidia
    command: python3.7 /usr/app/main.py
    volumes: 
      - /tmp/.X11-unix:/tmp/.X11-unix
    environment:
      DISPLAY: $DISPLAY
      QT_X11_NO_MITSHM: 1
      QT_DEBUG_PLUGINS: 1
      NVIDIA_VISIBLE_DEVICES: all

(I don't have a lot of docker experience, don't kill me if this is wrong)
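
The `runtime: nvidia` line can only work if a runtime named `nvidia` is registered with the Docker daemon. Here is a minimal sketch for checking that. It assumes `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name; the function names are hypothetical and the parsing is kept separate so it can be tried without Docker.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json):
    """Return True if the JSON runtime map contains an entry named 'nvidia'."""
    runtimes = json.loads(runtimes_json)
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def query_docker_runtimes():
    """Ask the local Docker daemon for its registered runtimes as JSON text."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the docker CLI reports an error
    )
    return result.stdout
```

On a machine with Docker installed, `has_nvidia_runtime(query_docker_runtimes())` should tell you whether the compose file above has a chance of working.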

TheUncleBenny avatar Jan 09 '21 21:01 TheUncleBenny

Does the container work with CUDA 11 for you? It doesn't for me :/

rklasen avatar Jan 13 '21 11:01 rklasen

On Ubuntu, older versions of docker-compose don't support the runtime option. If, like me, you are using an old docker and docker-compose, use docker-compose only for the build:

sudo docker-compose build

Then run the container with docker:

sudo docker run --rm --gpus all \
    --env-file .env \
    --volume /tmp/.X11-unix:/tmp/.X11-unix \
    styleflow_interface:latest \
    python3.7 /usr/app/main.py

The .env file contains:

DISPLAY=:1
QT_X11_NO_MITSHM=1
QT_DEBUG_PLUGINS=1
NVIDIA_VISIBLE_DEVICES=0

The DISPLAY value ':1' may be different on your machine. Run xhost +local:docker, then check the value with echo $DISPLAY.
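
Since DISPLAY differs between machines, the .env file could also be generated instead of edited by hand. This is just a sketch: `env_file_lines` and `write_env_file` are hypothetical helper names, and the fallback to `:0` when DISPLAY is unset is my assumption, not something from the StyleFlow repo.

```python
import os


def env_file_lines(display=None, gpu="0"):
    """Build the .env lines, taking DISPLAY from the current session by default."""
    if display is None:
        # Assumed fallback; check `echo $DISPLAY` if the GUI does not appear.
        display = os.environ.get("DISPLAY", ":0")
    return [
        f"DISPLAY={display}",
        "QT_X11_NO_MITSHM=1",
        "QT_DEBUG_PLUGINS=1",
        f"NVIDIA_VISIBLE_DEVICES={gpu}",
    ]


def write_env_file(path=".env", display=None, gpu="0"):
    """Write the lines to the file later passed to `docker run --env-file`."""
    with open(path, "w") as fh:
        fh.write("\n".join(env_file_lines(display, gpu)) + "\n")
```

Calling `write_env_file()` from the StyleFlow directory before the docker run command above would produce the same four variables with the DISPLAY of the current session.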

ghosthamlet avatar Jan 15 '21 12:01 ghosthamlet