StyleFlow
After docker build, `docker-compose up` fails with `Found no NVIDIA driver on your system.`
I installed nvidia-docker as instructed in the linked repo for Ubuntu 20.04, and the test for it seems to indicate that everything is in order:
joel@suina:~/Source/StyleFlow$ docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
Sat Jan 9 18:39:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:01:00.0 On | N/A |
| 30% 30C P8 7W / 75W | 571MiB / 3910MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when I try to start StyleFlow with docker-compose up, I get the following output and no GUI.
joel@suina:~/Source/StyleFlow$ docker-compose up
Starting styleflow_interface_1 ... done
Attaching to styleflow_interface_1
... (Edit3: removed a lot of Qt debug output from here; it turned out to be irrelevant) ...
interface_1 | ----------------- Options ---------------
interface_1 | batchSize: 1
interface_1 | checkpoints_dir: ./checkpoints
interface_1 | dataroot: ./data/datasetX
interface_1 | gpu_ids: 0
interface_1 | max_result_snapshots: 30
interface_1 | model: xxxx
interface_1 | name: XXXX
interface_1 | network_pkl: gdrive:networks/stylegan2-ffhq-config-f.pkl
interface_1 | only_for_test: ...
interface_1 | phase: test
interface_1 | ----------------- End -------------------
interface_1 | Traceback (most recent call last):
interface_1 | File "/usr/app/main.py", line 365, in <module>
interface_1 | ex = ExWindow(opt)
interface_1 | File "/usr/app/main.py", line 40, in __init__
interface_1 | self.EX = Ex(opt)
interface_1 | File "/usr/app/main.py", line 64, in __init__
interface_1 | self.zero_padding = torch.zeros(1, 18, 1).cuda()
interface_1 | File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 162, in _lazy_init
interface_1 | _check_driver()
interface_1 | File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver
interface_1 | http://www.nvidia.com/Download/index.aspx""")
interface_1 | AssertionError:
interface_1 | Found no NVIDIA driver on your system. Please check that you
interface_1 | have an NVIDIA GPU and installed a driver from
interface_1 | http://www.nvidia.com/Download/index.aspx
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqgif.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqicns.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqico.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqjpeg.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqsvg.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqtga.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqtiff.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqwbmp.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/imageformats/libqwebp.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so"
interface_1 | QLibraryPrivate::unload succeeded on "/usr/local/lib/python3.7/dist-packages/PyQt5/Qt/plugins/platforms/libqxcb.so"
interface_1 | QLibraryPrivate::unload succeeded on "Xcursor" (faked)
styleflow_interface_1 exited with code 1
Any idea what might be going wrong?
Edit: After figuring out how to get a shell inside the docker-compose container, I managed to get the following information:
root@623a41c9882d:/usr/app# python3.7 -c 'import torch; print(torch.version.cuda)'
9.0.176
root@623a41c9882d:/usr/app# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Perhaps I need to figure out how to get the Docker image built with CUDA 9 instead of 10.
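One way to sanity-check that version question: `nvidia-smi` reports the newest CUDA version the host driver supports (11.0 here), and NVIDIA drivers are backward compatible with older CUDA runtimes, so a PyTorch build for CUDA 9.0 should in principle be accepted by this 450.80.02 driver. The sketch below compares the two version strings; the helper names are hypothetical, not part of StyleFlow or PyTorch. It does not check whether the container can see the GPU at all, which is what `Found no NVIDIA driver` inside a container more often points to.

```python
import re
from typing import Optional, Tuple


def parse_smi_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string like '9.0.176' into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def driver_supports(smi_output: str, torch_cuda: Optional[str]) -> bool:
    """True if the driver's max CUDA version is >= the one PyTorch was built with.

    torch_cuda is what `torch.version.cuda` returns; it is None for CPU-only builds.
    """
    driver_cuda = parse_smi_cuda_version(smi_output)
    if driver_cuda is None or torch_cuda is None:
        return False
    # Newer drivers run older CUDA runtimes, so a plain tuple comparison suffices.
    return driver_cuda >= parse_version(torch_cuda)
```

For example, pass the host's `nvidia-smi` text together with the `9.0.176` printed inside the container by `torch.version.cuda`.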
Edit2: After installing updated NVIDIA driver packages (I guess they came from the nvidia-docker repo), setting nvidia as the default Docker runtime, rebooting to get Docker working again, and translating the docker-compose.yml into the following docker command, I managed to get the program to show me a black screen.
docker run --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -e QT_X11_NO_MITSHM=1 -e QT_DEBUG_PLUGINS=1 styleflow_interface python3.7 /usr/app/main.py
With `docker-compose up` the result is still the same as above. I'll see if upgrading docker-compose helps; version 1.22.0 is probably rather old.
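The Edit2 command is a hand translation of the compose service. A hypothetical Python helper (`build_docker_run_args` is an illustrative name, not part of StyleFlow or Docker) can make that translation repeatable by assembling the same argument list. Note that `-e NAME` with no value tells docker to copy `NAME` from the calling environment, which is how the command forwards `DISPLAY`.

```python
from typing import Dict, List, Optional, Sequence


def build_docker_run_args(
    image: str,
    command: Sequence[str],
    volumes: Sequence[str] = (),
    passthrough_env: Sequence[str] = (),
    env: Optional[Dict[str, str]] = None,
    gpus: Optional[str] = None,
) -> List[str]:
    """Assemble a `docker run --rm ...` argument list (hypothetical helper)."""
    args = ["docker", "run", "--rm"]
    if gpus is not None:
        # --gpus requests GPUs through the NVIDIA container toolkit (Docker 19.03+).
        args += ["--gpus", gpus]
    for volume in volumes:
        args += ["-v", volume]
    for name in passthrough_env:
        # `-e NAME` without a value copies NAME from the calling environment.
        args += ["-e", name]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    args.append(image)
    args += list(command)
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Passing `gpus="all"` is likely only needed when nvidia is not configured as the default runtime.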
Edit3: After upgrading docker-compose, `docker-compose up` now brings up the same black window I got with the command above. I did see these two lines in the log, though:
interface_1 | libGL error: No matching fbConfigs or visuals found
interface_1 | libGL error: failed to load driver: swrast
So I searched for those errors and found that running `export LIBGL_ALWAYS_INDIRECT=1` before starting main.py gets rid of them, but the result is still a black window that does nothing.
Edit4: Since I've solved the original issue this ticket was about, I'll leave it here. Hopefully it's helpful for someone. I'll open a new ticket for the black window issue.
Had the same error; changing the docker-compose.yml to this did the trick for me:
version: "3.3"
services:
  interface:
    build: .
    runtime: nvidia
    command: python3.7 /usr/app/main.py
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
    environment:
      DISPLAY: $DISPLAY
      QT_X11_NO_MITSHM: 1
      QT_DEBUG_PLUGINS: 1
      NVIDIA_VISIBLE_DEVICES: all
(I don't have a lot of docker experience, don't kill me if this is wrong)
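The `DISPLAY: $DISPLAY` line works because docker-compose substitutes variables from the shell it is launched from before the container starts. The following is a rough Python approximation of that substitution using `string.Template` (an illustration, not Compose's own code). It mainly shows that an unset `DISPLAY` silently becomes an empty string, which would likely leave Qt with no display to connect to.

```python
from collections import defaultdict
from string import Template
from typing import Mapping


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Approximate docker-compose's $VAR / ${VAR} substitution.

    Unset variables become empty strings, and "$$" escapes a literal "$".
    Compose's ${VAR:-default} forms are NOT handled here; Template raises
    ValueError on them.
    """
    # defaultdict(str, ...) returns "" for any variable missing from env.
    return Template(value).substitute(defaultdict(str, env))
```

For instance, `interpolate("$DISPLAY", os.environ)` shows what the container would receive from the current shell.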
Does the container work with CUDA 11 for you? It doesn't for me :/
On Ubuntu, older versions of docker-compose don't support the `runtime` option. If, like me, you are using old docker and docker-compose versions, just use docker-compose to build the image:
sudo docker-compose build
then run it with docker:
sudo docker run --rm --gpus all \
--env-file .env \
--volume /tmp/.X11-unix:/tmp/.X11-unix \
styleflow_interface:latest \
python3.7 /usr/app/main.py
The .env file contains:
DISPLAY=:1
QT_X11_NO_MITSHM=1
QT_DEBUG_PLUGINS=1
NVIDIA_VISIBLE_DEVICES=0
The DISPLAY value `:1` may be different on your machine: run `xhost +local:docker`, then get the value with `echo $DISPLAY`.
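Rather than hard-coding `DISPLAY=:1`, a small Python sketch can write the .env file from the current shell's `DISPLAY`. The function names are hypothetical; the file follows the plain `KEY=VALUE` per-line format that `docker run --env-file` reads, with the same four settings as above.

```python
import os


def render_env_file(display: str, visible_devices: str = "0") -> str:
    """Build .env content for `docker run --env-file` (one KEY=VALUE per line)."""
    settings = {
        "DISPLAY": display,
        "QT_X11_NO_MITSHM": "1",
        "QT_DEBUG_PLUGINS": "1",
        "NVIDIA_VISIBLE_DEVICES": visible_devices,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_env_file(path: str = ".env") -> None:
    """Write the .env file using DISPLAY from the calling shell."""
    display = os.environ.get("DISPLAY")
    if not display:
        raise RuntimeError("DISPLAY is not set; run this from your desktop session")
    with open(path, "w") as env_file:
        env_file.write(render_env_file(display))
```

Calling `write_env_file()` from the same terminal where you ran `xhost +local:docker` should pick up the right display value.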