chore: Run GFPGAN in docker container

Open mmenbawy opened this issue 3 years ago • 8 comments

Why we need it:

  • Maintainers can develop in a container
  • Potential users can re-train GFPGAN in a containerized environment
  • Others can try it in an isolated environment by pulling the image and running a container only

Issue: https://github.com/TencentARC/GFPGAN/issues/102

Remarks for your reviewer:

I used my personal Docker Hub account to store the Docker image. After approval and before merging, the GFPGAN project can create a free Docker Hub account and use that instead.

mmenbawy avatar Nov 30 '21 19:11 mmenbawy

hey @mmenbawy, I can't verify this MR... The build in your project is failing and I can't run the examples...

  • I re-wrote the Dockerfile to correctly use python3.8
  • Ran into this issue, and I can't run it on macOS:
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1436, in _write_ninja_file_and_build_library
GFPGAN_1  |     _write_ninja_file_to_build_library(
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
GFPGAN_1  |     cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
GFPGAN_1  |     arch_list[-1] += '+PTX'
GFPGAN_1  | IndexError: list index out of range
gfpgan_GFPGAN_1 exited with code 1

marcellodesales avatar Dec 28 '21 09:12 marcellodesales

Running docker-compose build and the original image

  • Same error when running on a regular machine using your image at mostafaelmenbawy/gfpgan:latest:
$ docker run -ti -v $PWD/inputs:/app/inputs -v $PWD/results:/app/results -v $PWD/experiments:/app/exps mostafaelmenbawy/gfpgan:latest python3 inference_gfpgan.py --model_path /app/exps/GFPGANv1.pth --test_path /app/inputs/whole_imgs --save_root /apps/results --arch original --channel 1
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "inference_gfpgan.py", line 7, in <module>
    from basicsr.utils import imwrite
  File "/usr/local/lib/python3.6/dist-packages/basicsr/__init__.py", line 3, in <module>
    from .archs import *
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <module>
    _arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <listcomp>
    _arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/rrdbnet_arch.py", line 6, in <module>
    from .arch_util import default_init_weights, make_layer, pixel_unshuffle
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/arch_util.py", line 13, in <module>
    from basicsr.ops.dcn import ModulatedDeformConvPack, modulated_deform_conv
  File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/__init__.py", line 1, in <module>
    from .deform_conv import (DeformConv, DeformConvPack, ModulatedDeformConv, ModulatedDeformConvPack, deform_conv,
  File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/deform_conv.py", line 19, in <module>
    os.path.join(module_path, 'src', 'deform_conv_cuda_kernel.cu'),
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1445, in _write_ninja_file_and_build_library
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
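
A note on the final frame: `arch_list[-1] += '+PTX'` can only raise `IndexError` when `arch_list` is empty, which fits the "No CUDA runtime is found" line at the top of the output. The sketch below is a simplified stand-in, not PyTorch's actual code. `arch_flags_from_devices` and its `capabilities` argument are hypothetical names, and it assumes the list is built from the compute capabilities of the visible GPUs.

```python
def arch_flags_from_devices(capabilities):
    """Build arch flags from (major, minor) compute capabilities.

    Simplified stand-in for the step in the traceback, not PyTorch's code.
    """
    arch_list = []
    for major, minor in capabilities:
        arch = f"{major}.{minor}"
        if arch not in arch_list:
            arch_list.append(arch)
    # Same statement as the failing frame: it assumes at least one entry.
    arch_list[-1] += "+PTX"
    return arch_list


if __name__ == "__main__":
    print(arch_flags_from_devices([(7, 5)]))  # one visible GPU
    try:
        arch_flags_from_devices([])  # no visible GPU
    except IndexError as exc:
        print("IndexError:", exc)
```

With no visible device the list stays empty, so the last-element update fails in the same way as in the log.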

marcellodesales avatar Dec 28 '21 10:12 marcellodesales

I fixed the pipeline and the error.

The error occurred because the Docker image was meant to run on GPUs only, which is why I set the BASICSR_JIT=True env variable at build time. I have now removed it from the image build so that the image can run on either CPU or GPU; the flag can be added back at run time, as described in the README.md.
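
As a rough illustration of moving the flag from build time to run time, here is a minimal sketch that assembles the `docker run` command for either case. `docker_run_argv` is a hypothetical helper, and it assumes the variable is passed with `-e` and GPU access is requested with `--gpus all`; the README.md may describe a different form.

```python
import shlex

IMAGE = "mostafaelmenbawy/gfpgan:latest"  # image name used earlier in this thread


def docker_run_argv(command, use_gpu=False):
    """Return a `docker run` argv list for CPU or GPU execution."""
    argv = ["docker", "run", "--rm"]
    if use_gpu:
        # Assumption: the JIT flag is only wanted when a GPU is available,
        # so it is set at run time together with GPU access.
        argv += ["--gpus", "all", "-e", "BASICSR_JIT=True"]
    argv.append(IMAGE)
    argv += shlex.split(command)
    return argv


if __name__ == "__main__":
    # Print both variants so they can be copied into a shell.
    print(shlex.join(docker_run_argv("python3 inference_gfpgan.py")))
    print(shlex.join(docker_run_argv("python3 inference_gfpgan.py", use_gpu=True)))
```

Volume mounts and the inference arguments from the earlier command are left out to keep the sketch short.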

mmenbawy avatar Dec 30 '21 04:12 mmenbawy

It doesn't build anymore :( Does anyone have a solution by any chance?

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is not signed.

andreafalzetti avatar May 07 '22 18:05 andreafalzetti

This should be merged and maintained

benjaminbrumbaugh avatar May 08 '22 01:05 benjaminbrumbaugh

It doesn't build anymore :( Does anyone have a solution by any chance?

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is not signed.

Solved with a tip from @mmenbawy:

Try adding the following command after the FROM line in the Dockerfile:

 RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
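
In this thread the key file name (3bf863cc.pub) matches the last eight hex digits of the ID reported after NO_PUBKEY (A4B469963BF863CC). The sketch below pulls the missing key IDs out of apt output and derives a candidate URL on that basis. Treat the naming pattern as an observation from this one case, not a documented rule; `missing_key_ids` and `candidate_key_url` are hypothetical helper names.

```python
import re

REPO_URL = "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64"


def missing_key_ids(apt_output):
    """Return the key IDs reported after NO_PUBKEY, in order of appearance."""
    return re.findall(r"NO_PUBKEY\s+([0-9A-Fa-f]{16})", apt_output)


def candidate_key_url(key_id, repo_url=REPO_URL):
    """Guess the key file URL from the last eight hex digits of the key ID."""
    # Assumption: the .pub file is named after the short (8-digit) key ID.
    return f"{repo_url}/{key_id[-8:].lower()}.pub"


if __name__ == "__main__":
    log = "W: GPG error: ... the public key is not available: NO_PUBKEY A4B469963BF863CC"
    for key_id in missing_key_ids(log):
        print(candidate_key_url(key_id))
```

Check that the derived URL actually exists before putting it into a Dockerfile.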

andreafalzetti avatar May 08 '22 22:05 andreafalzetti

This no longer builds. Could someone please update the Dockerfile? Any help would be much appreciated. Thank you

majedazzam avatar Dec 19 '23 18:12 majedazzam