stylegan2-pytorch Ask for Software environment


subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc ninja: build stopped: subcommand failed.

I have spent a day on this, but I still can't get training to run.

Jan 04 '20 13:01 JNUChenYiHong

I created a new virtualenv and installed torch 1.3.1 with CUDA 10.2, and now it can train. By the way, my gcc version is 4.8.5. Thank you so much!

Jan 04 '20 13:01 JNUChenYiHong

I have a similar problem. My environment is pytorch 1.1.0, CUDA 10.0.130, a V100 GPU, and gcc 4.8.4.

Jan 06 '20 04:01 wosecz

Updating pytorch from 1.1.0 to 1.3.1 solved this problem. However, another problem occurred.

Jan 06 '20 06:01 wosecz

Could you retry after removing /tmp/torch_extensions/fused?
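
For reference, the cleanup suggested here can be scripted. This is a minimal sketch, assuming the cache lives under /tmp/torch_extensions as quoted in this thread or under the directory named by the TORCH_EXTENSIONS_DIR environment variable; other PyTorch versions may use a different default location, and the helper name clear_extension_cache is made up for illustration.

```python
import os
import shutil
from pathlib import Path


def clear_extension_cache(name, root=None):
    """Delete the cached build directory for one JIT-compiled extension.

    `root` defaults to $TORCH_EXTENSIONS_DIR if set, otherwise to the
    /tmp/torch_extensions path quoted in this thread (an assumption:
    other PyTorch versions may default to a different directory).
    Returns True if a directory was removed, False if there was nothing to do.
    """
    if root is None:
        root = os.environ.get("TORCH_EXTENSIONS_DIR", "/tmp/torch_extensions")
    target = Path(root) / name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Calling clear_extension_cache("fused") before re-running train.py would force the 'fused' extension named in the error message to be rebuilt from scratch.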

Jan 06 '20 13:01 rosinality

@rosinality Yes you are right! After trying several times, the problem is solved. Thank you very much!

Jan 06 '20 23:01 wosecz

So the required environment is pytorch >= 1.3.1, cuda >= 10.0, tensorflow >= 1.14.
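
A quick way to compare an installation against these minimums is sketched below. The thresholds are the ones stated in this comment, not an official requirement list, and the helper names parse_version and unmet_requirements are hypothetical.

```python
import re


def parse_version(text):
    """Turn '1.3.1' or '1.3.1+cu100' into a tuple of ints such as (1, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Minimums as stated in the comment above (an assumption, not an official list).
MINIMUMS = {"pytorch": "1.3.1", "cuda": "10.0", "tensorflow": "1.14"}


def unmet_requirements(installed, minimums=MINIMUMS):
    """Return the names whose installed version is missing or below the minimum."""
    unmet = []
    for name, minimum in minimums.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            unmet.append(name)
    return unmet
```

In a real environment the installed versions could be taken from torch.__version__, torch.version.cuda and tensorflow.__version__; for the setup reported earlier (pytorch 1.1.0, CUDA 10.0.130) the check would flag pytorch.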

Jan 06 '20 23:01 wosecz

@rosinality Hello, I have the same problem: "subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1". Here is my environment: pytorch 1.3.1, torchvision 0.4.2, tensorflow 1.14, CUDA 10.0, Ubuntu 16.04, gcc 5.4.0. I have spent two days on this problem but still cannot solve it, and it really puzzles me. Should I update CUDA to 10.2, or try something else? Thanks for your reply!

Feb 02 '20 14:02 yueyang130

@yueyang130 Could you post the full error log? Anyway, I think the prerequisite of pytorch 1.3.1 is CUDA 10.1.

Feb 02 '20 14:02 rosinality

This is my full error. @rosinality

[image: error]

Feb 02 '20 14:02 yueyang130

@yueyang130 Isn't this clipped?

Feb 02 '20 15:02 rosinality

@rosinality, these are all of my error messages.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
    check=True)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 22, in <module>
    from model import Generator, Discriminator
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/model.py", line 12, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_act.py", line 14, in <module>
    os.path.join(module_path, 'fused_bias_act_kernel.cu'),
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/Device.h:3:0,
                 from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                 from /usr/local/lib/python3.6/dist-packages/torch/include/torch/extension.h:6,
                 from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp:1:
/usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/python_headers.h:9:20: fatal error: Python.h: No such file or directory
compilation terminated.
[2/3] /usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu:11:0:
/usr/local/lib/python3.6/dist-packages/torch/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.

Feb 02 '20 15:02 yueyang130

@yueyang130 I think the Python dev files are missing. If you are using Ubuntu, you can install the python3.x-dev package that matches your interpreter (for example python3.6-dev).
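
Whether Python.h is present can be checked from the interpreter itself. The sketch below assumes the headers live in the include directory that sysconfig reports for the running interpreter, which is typically /usr/include/python3.6m on the system Python seen in the log; the function names are illustrative.

```python
import sysconfig
from pathlib import Path


def python_header_path():
    """Return where Python.h is expected for the running interpreter."""
    return Path(sysconfig.get_paths()["include"]) / "Python.h"


def has_python_headers():
    """True if Python.h exists there, i.e. the dev package appears to be installed."""
    return python_header_path().is_file()
```

If has_python_headers() returns False, the "Python.h: No such file or directory" error from the c++ step is the expected result.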

Feb 02 '20 23:02 rosinality

@rosinality I have installed python3.6-dev and python-dev. However, I still get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
    check=True)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 22, in <module>
    from model import Generator, Discriminator
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/model.py", line 12, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_act.py", line 14, in <module>
    os.path.join(module_path, 'fused_bias_act_kernel.cu'),
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] /usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu:11:0:
/usr/local/lib/python3.6/dist-packages/torch/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.

Feb 03 '20 09:02 yueyang130

@yueyang130 You may need to add the CUDA header directory to CPLUS_INCLUDE_PATH, probably $CUDA_HOME/include.
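
Since the nvcc command in the log already passes -isystem /usr/local/cuda-10.0/include, it may be worth checking first whether the header is on disk at all. The sketch below does that and also builds an environment with the include directory added to CPLUS_INCLUDE_PATH. The helper names and the default header list are assumptions chosen for illustration.

```python
import os
from pathlib import Path


def missing_cuda_headers(cuda_home, headers=("cuda_runtime.h", "cusparse.h")):
    """List the header files that are absent from <cuda_home>/include."""
    include_dir = Path(cuda_home) / "include"
    return [h for h in headers if not (include_dir / h).is_file()]


def with_cuda_include(env, cuda_home):
    """Return a copy of `env` with <cuda_home>/include first in CPLUS_INCLUDE_PATH."""
    include_dir = str(Path(cuda_home) / "include")
    updated = dict(env)
    current = updated.get("CPLUS_INCLUDE_PATH", "")
    # Drop empty entries so we never produce a leading or doubled separator.
    parts = [p for p in current.split(os.pathsep) if p]
    if include_dir not in parts:
        parts.insert(0, include_dir)
    updated["CPLUS_INCLUDE_PATH"] = os.pathsep.join(parts)
    return updated
```

A non-empty result from missing_cuda_headers("/usr/local/cuda-10.0") points to an incomplete CUDA installation rather than a search-path problem; otherwise with_cuda_include(os.environ, cuda_home) gives an environment that can be passed to a subprocess running the training script.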

Feb 03 '20 10:02 rosinality

@rosinality I sincerely appreciate your help; it saved me a lot of time. I found that some header files were missing from my CUDA installation for some reason. I solved the problem by reinstalling CUDA.

Feb 04 '20 07:02 yueyang130

I have the same problem. My environment is: pytorch 1.3.1, CUDA 10.1, NVIDIA driver version 430.64, python 3.7.6.

I followed the instructions above and they helped a lot. After I installed the Python dev package (sudo apt-get install python3.7-dev), something weird happened.

The build says it can't find nvcc, but when I execute /usr/local/cuda-10.0/bin/nvcc -V, it shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

This is my error log:

Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
    check=True)
  File "/usr/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "generate.py", line 5, in <module>
    from model import Generator
  File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_act.py", line 14, in <module>
    os.path.join(module_path, 'fused_bias_act_kernel.cu'),
  File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
    with_cuda=with_cuda)
  File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/TH -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o 
:/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/TH -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: :/usr/local/cuda-10.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.

Feb 27 '20 08:02 danielkaifeng

@danielkaifeng Please check your CUDA installation, and make sure the PATH environment variable includes the directory where nvcc resides.
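
One detail in the log is that the command starts with ":/usr/local/cuda-10.0/bin/nvcc", with a leading colon. That would be consistent with a variable such as CUDA_HOME having been set with a stray colon (an assumption; the thread does not confirm the cause). A minimal sketch for spotting this, with hypothetical helper names:

```python
import os
import shutil


def clean_cuda_home(value):
    """Return the first non-empty entry of a colon-polluted CUDA_HOME value.

    ':/usr/local/cuda-10.0' (the shape seen in the log above) becomes
    '/usr/local/cuda-10.0'. An empty or colon-only value becomes ''.
    """
    entries = [part for part in value.split(":") if part]
    return entries[0] if entries else ""


def find_nvcc(cuda_home=None):
    """Return <cuda_home>/bin/nvcc if that file exists, else search PATH."""
    if cuda_home:
        candidate = os.path.join(clean_cuda_home(cuda_home), "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")
```

Comparing os.environ.get("CUDA_HOME", "") with its cleaned form shows whether the variable needs fixing, and find_nvcc() returning None means nvcc is not reachable through PATH either.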

Feb 27 '20 13:02 rosinality

I managed to get the environment up and running using Docker (nvidia-docker). The host machine has Nvidia driver 440.44. My Dockerfile is as follows:

FROM nvidia/cuda:10.1-devel-ubuntu18.04

RUN apt-get update && \
    apt-get install vim -y && \
    apt-get install python3 -y && \
    apt-get install python3-pip -y && \
    apt-get install git -y

RUN pip3 install --upgrade pip setuptools six

RUN pip3 install torch torchvision \
    pandas numpy pillow==6.2.1 opencv-python \
    scikit-learn matplotlib seaborn \
    jupyterlab tensorflow-gpu==1.15.0 tqdm requests

RUN apt-get install -y ninja-build

To build the Docker image, use the following command:

docker build -t rosin_sg <dir_with_only_dockerfile>

To run a container from the image, my command is as follows:

docker run -it -v <path_to_code>:/root/code -v <path_to_data>:/root/data --gpus all rosin_sg

Once inside the Docker container, run cd. This takes you to /root/, which is the home directory in the container. Use python3 rather than python to start the interpreter.

The base image is nvidia/cuda:10.1-devel-ubuntu18.04 because the devel images ship the CUDA toolkit (including nvcc), which the ninja build needs.
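
Inside the container, a small preflight check can confirm that the build tools are reachable before importing the model code. This is a sketch with a made-up helper name; the default tool list reflects the commands that appear in the build logs in this thread.

```python
import shutil


def missing_tools(tools=("ninja", "nvcc", "c++")):
    """Return the build tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means ninja, nvcc and the host C++ compiler are all on PATH; any name it returns is a likely reason for the extension build to fail.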

The one caveat is that I'm unable to convert the weights as tensorflow does not seem to recognise a GPU in this docker image.

I can run the official stylegan2 using its provided Dockerfile. However, I was unable to modify that Dockerfile so that this version of stylegan2 also runs. When one runs, the other fails to run.

Mar 12 '20 17:03 srirakshith-sai

There is some bad news. If you want to convert weights from any of the TensorFlow *ffhq.pkl files, the software environment requirements are very strict:

tensorflow 1.14 or 1.15 (to match the official TF stylegan2; sorry, you really do need that annoying dnnlib/tflib...)
CUDA 10.0 only (not CUDA 10.1! CUDA 10.1 is not supported by the official tensorflow 1.14 and 1.15 builds)

Mar 27 '20 16:03 neoragex2002

I'm not converting any weights, so I can't comment on how that works, but otherwise I found @srirakshith-sai's Dockerfile to work great (except that he forgot to pip install lmdb). Also, the Dockerfile @rosinality mentioned in a different issue didn't work for me. It seems like it would be great to add an official Dockerfile to the repo.

May 30 '20 14:05 greaber

@neoragex2002 I ran into problems when trying to convert weights. The software environment requirements are too strict: tensorflow 1.14 or 1.15, PyTorch 1.4, and CUDA 10.0. Have you solved it?

Aug 21 '20 02:08 songyoyo

I would really suggest using Docker for this, in combination with nvidia-docker2. Here is my working Dockerfile:

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

RUN apt update
RUN apt install -y python3
RUN apt install -y python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install tensorflow-gpu==1.14.0 scipy==1.3.3 requests==2.22.0 Pillow==6.2.1 h5py==2.9.0 imageio==2.9.0 imageio-ffmpeg==0.4.2 tqdm==4.49.0 torch==1.4.0 torchvision==0.5.0 pandas numpy pillow==6.2.1 opencv-python scikit-learn matplotlib seaborn jupyterlab ninja

With this I managed to run the weight conversion. I haven't tried anything else yet.

Jan 22 '21 10:01 paulgavrikov


Hi! I followed your suggestion, but when I ran the weight conversion I got:

Traceback (most recent call last):
  File "convert_weight.py", line 236, in <module>
    generator, discriminator, g_ema = pickle.load(f)
ModuleNotFoundError: No module named 'torch_utils'

I just want the CelebA weights in PyTorch. Could you please give me some more suggestions? QAQ

Nov 19 '21 14:11 gzhhhere

@gzhhhere Sorry, it has been a while and I no longer have that container. But I found this: https://githubmemory.com/repo/anvoynov/GANLatentDiscovery/issues/31

Perhaps you're loading the wrong file, or running from the wrong directory? Pickle files can be sensitive to location, because unpickling has to import the modules in which the pickled objects were defined.
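
The ModuleNotFoundError quoted above is the pattern one would expect if the pickle refers to classes from a module named torch_utils that is not importable where convert_weight.py runs. That is one plausible reading, not a confirmed diagnosis. The sketch below reproduces the mechanism with a throwaway in-memory module; demo_training_utils and Net are hypothetical names.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that exists only where the pickle was written.
MODULE_NAME = "demo_training_utils"


def register_demo_module():
    """Create an in-memory module with one class and make it importable."""
    module = types.ModuleType(MODULE_NAME)

    class Net:
        pass

    # Make the class look like a top-level class of that module, because
    # pickle records classes by "<module name>.<class name>".
    Net.__module__ = MODULE_NAME
    Net.__qualname__ = "Net"
    module.Net = Net
    sys.modules[MODULE_NAME] = module
    return module


def load_without_module(payload):
    """Try to unpickle after the defining module has become unavailable."""
    sys.modules.pop(MODULE_NAME, None)
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


module = register_demo_module()
payload = pickle.dumps(module.Net())  # stores a reference to the class, not its code
error = load_without_module(payload)  # the reference can no longer be resolved
print(error)

register_demo_module()                # make the module importable again
restored = pickle.loads(payload)      # the same bytes now load
print(type(restored).__name__)
```

The practical consequence is that the code that defined the pickled classes has to be importable under the same module name when loading, for example by running the script from a directory that contains it or by adding that directory to sys.path.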

Nov 19 '21 21:11 paulgavrikov
