GFPGAN
chore: Run GFPGAN in docker container
Why we need it:
- Maintainers can develop in a container
- Potential users can re-train GFPGAN in a containerized environment
- Others can try it in an isolated environment by pulling the image and running a container only
Issue: https://github.com/TencentARC/GFPGAN/issues/102
Remarks for your reviewer:
I used my personal Docker Hub account to store the Docker image. After approval and before merging, the GFPGAN project can create a free Docker Hub account and use that instead.
hey @mmenbawy, I can't verify this MR... The build in your project is failing and I can't run the examples...
- I rewrote the Dockerfile to correctly use python3.8
- Ran into this issue and I can't run it on macOS
GFPGAN_1 | File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1436, in _write_ninja_file_and_build_library
GFPGAN_1 | _write_ninja_file_to_build_library(
GFPGAN_1 | File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
GFPGAN_1 | cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
GFPGAN_1 | File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
GFPGAN_1 | arch_list[-1] += '+PTX'
GFPGAN_1 | IndexError: list index out of range
gfpgan_GFPGAN_1 exited with code 1
Running Docker-compose Build And Original Image
- Same error when running on a regular machine using your image at mostafaelmenbawy/gfpgan:latest
$ docker run -ti -v $PWD/inputs:/app/inputs -v $PWD/results:/app/results -v $PWD/experiments:/app/exps mostafaelmenbawy/gfpgan:latest python3 inference_gfpgan.py --model_path /app/exps/GFPGANv1.pth --test_path /app/inputs/whole_imgs --save_root /apps/results --arch original --channel 1
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
File "inference_gfpgan.py", line 7, in <module>
from basicsr.utils import imwrite
File "/usr/local/lib/python3.6/dist-packages/basicsr/__init__.py", line 3, in <module>
from .archs import *
File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <module>
_arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <listcomp>
_arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/rrdbnet_arch.py", line 6, in <module>
from .arch_util import default_init_weights, make_layer, pixel_unshuffle
File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/arch_util.py", line 13, in <module>
from basicsr.ops.dcn import ModulatedDeformConvPack, modulated_deform_conv
File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/__init__.py", line 1, in <module>
from .deform_conv import (DeformConv, DeformConvPack, ModulatedDeformConv, ModulatedDeformConvPack, deform_conv,
File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/deform_conv.py", line 19, in <module>
os.path.join(module_path, 'src', 'deform_conv_cuda_kernel.cu'),
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1136, in load
keep_intermediates=keep_intermediates)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
is_standalone=is_standalone)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1445, in _write_ninja_file_and_build_library
is_standalone=is_standalone)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
I fixed the pipeline and the error.
The error occurred because the Docker image was meant to run on GPUs only, which is why I set the BASICSR_JIT=True env variable at build time. I have now removed it from the image build so that the image can run on CPU or on GPU, with the flag added back at run time as described in the README.md.
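For anyone wondering why the traceback ends at `arch_list[-1] += '+PTX'`: here is a minimal sketch of how that line can fail. It assumes the arch list is built from the GPUs visible inside the container, which fits the "No CUDA runtime is found" message above. `arch_flags_from_devices` is a hypothetical stand-in written for illustration, not the actual PyTorch function.

```python
def arch_flags_from_devices(capabilities):
    """Simplified, hypothetical stand-in for the step shown in the traceback.

    capabilities: list of (major, minor) compute capabilities, one per
    visible GPU. An empty list models a container with no CUDA runtime.
    """
    # One arch string per visible device, e.g. (7, 5) -> "7.5"
    arch_list = sorted(f"{major}.{minor}" for major, minor in capabilities)
    # Same statement as in the traceback: it reads the last element first,
    # so an empty list raises IndexError before anything is appended.
    arch_list[-1] += "+PTX"
    return arch_list


# With GPUs visible, the highest arch gets the +PTX suffix.
print(arch_flags_from_devices([(7, 5), (6, 1)]))

# With no GPU visible, the list is empty and indexing fails.
try:
    arch_flags_from_devices([])
except IndexError as exc:
    print(f"IndexError: {exc}")
```

Under that assumption, requesting the JIT build of the CUDA ops on a machine without a GPU is enough to hit this code path, which matches both failures reported above.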
It doesn't build anymore :( Does anyone have a solution by any chance?
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
This should be merged and maintained
It doesn't build anymore :( Does anyone have a solution by any chance?
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
Solved with a tip from @mmenbawy:
Try adding the following command after the FROM line in the Dockerfile:
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
This no longer builds. Could someone please update the Dockerfile? Any help would be much appreciated. Thank you