stylegan2-pytorch
Ask for Software environment
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc ninja: build stopped: subcommand failed.
I spent a day on this, but I still can't train the model.
I created a new virtualenv and installed torch 1.3.1 with CUDA 10.2, and now it can train. By the way, my gcc version is 4.8.5. Thank you so much!
I have a similar problem. My environment is pytorch1.1.0, cuda 10.0.130, GPU V100, gcc 4.8.4
Updating pytorch from 1.1.0 to 1.3.1 solved this problem. However, another problem occurs.

Could you retry after removing /tmp/torch_extensions/fused?
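The suggestion above (removing the cached build of the `fused` extension so it is recompiled from scratch) can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the helper name `clear_cached_extension` is hypothetical, and the fallback path is an assumption taken from the path quoted in this thread. PyTorch reads the `TORCH_EXTENSIONS_DIR` environment variable when it is set, and the default location can differ between PyTorch versions.

```python
import os
import shutil


def clear_cached_extension(name, root=None):
    """Delete the cached JIT build directory of one extension, if present.

    Returns the removed path, or None when there was nothing to remove.
    `clear_cached_extension` is a hypothetical helper, not a PyTorch API.
    """
    if root is None:
        # Assumption: fall back to the path quoted in this thread when
        # TORCH_EXTENSIONS_DIR is not set; newer PyTorch versions may
        # use a different default location.
        root = os.environ.get("TORCH_EXTENSIONS_DIR", "/tmp/torch_extensions")
    build_dir = os.path.join(root, name)
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
        return build_dir
    return None
```

Calling `clear_cached_extension("fused")` before rerunning train.py should force the extension to be rebuilt on the next import.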
@rosinality Yes you are right! After trying several times, the problem is solved. Thank you very much!
So the required environment is pytorch >= 1.3.1, cuda >= 10.0, tensorflow >= 1.14.
@rosinality hello, I have the same problem: "subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1". Here is my environment: pytorch 1.3.1, torchvision 0.4.2, tensorflow 1.14, CUDA 10.0, Ubuntu 16.04, gcc 5.4.0. I have spent two days on this problem but still cannot solve it. It really puzzles me. Should I update CUDA to 10.2, or try something else? Thanks for your reply!
@yueyang130 Could you post the full error log? Anyway, I think pytorch 1.3.1 requires CUDA 10.1.
This is my full error. @rosinality

@yueyang130 Isn't this clipped?
@rosinality, here are all my error messages.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
check=True)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 22, in <module>
from model import Generator, Discriminator
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/model.py", line 12, in <module>
from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_act.py", line 14, in <module>
os.path.join(module_path, 'fused_bias_act_kernel.cu'),
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
with_cuda=with_cuda)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/Device.h:3:0,
from /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.6/dist-packages/torch/include/torch/extension.h:6,
from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act.cpp:1:
/usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/python_headers.h:9:20: fatal error: Python.h: No such file or directory
compilation terminated.
[2/3] /usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu:11:0:
/usr/local/lib/python3.6/dist-packages/torch/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.
@yueyang130 I think the Python dev files are missing. If you are using Ubuntu, you can install the python3.x-dev package that matches your Python version.
@rosinality I have installed python3.6-dev and python-dev. However, I still get the following error:
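Whether the dev files are actually present can be checked from Python before rebuilding. The following is a minimal sketch; the helper name `python_header_status` is hypothetical. It looks for `Python.h` in the include directory reported by the interpreter that runs it, so it should be run with the same interpreter used for training.

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, found) for this interpreter's Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    found = os.path.isfile(os.path.join(include_dir, "Python.h"))
    return include_dir, found


include_dir, found = python_header_status()
print("include dir:", include_dir)
print("Python.h found" if found else "Python.h missing: install the matching dev package")
```

If the header is reported missing, the `fatal error: Python.h: No such file or directory` line in the build log is the expected symptom.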
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
check=True)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 22, in <module>
from model import Generator, Discriminator
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/model.py", line 12, in <module>
from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_act.py", line 14, in <module>
os.path.join(module_path, 'fused_bias_act_kernel.cu'),
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
with_cuda=with_cuda)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] /usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /home/lyf/yy_ws/code/stylegan2-pytorch-master/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu:11:0:
/usr/local/lib/python3.6/dist-packages/torch/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.
@yueyang130 You may need to add the CUDA header directory to CPLUS_INCLUDE_PATH. It is probably $CUDA_HOME/include.
@rosinality I sincerely appreciate your help; it saved me a lot of time. I found that some header files in my CUDA installation were missing for some reason. I solved the problem by reinstalling CUDA.
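A missing-header problem like the one above can be confirmed before reinstalling CUDA. The sketch below is only illustrative: `missing_cuda_headers` is a hypothetical helper, and the default header list is an assumption that covers the header named in the log (`cusparse.h`) plus `cuda_runtime.h`.

```python
import os


def missing_cuda_headers(cuda_home, headers=("cuda_runtime.h", "cusparse.h")):
    """Return the headers that are absent from <cuda_home>/include."""
    include_dir = os.path.join(cuda_home, "include")
    return [h for h in headers
            if not os.path.isfile(os.path.join(include_dir, h))]
```

For example, `missing_cuda_headers(os.environ.get("CUDA_HOME", "/usr/local/cuda"))` returns an empty list when both headers are in place. The `/usr/local/cuda` fallback is also an assumption about where the toolkit is installed.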
I ran into the same problem. My machine environment is: pytorch 1.3.1, CUDA 10.1, NVIDIA driver version 430.64, python 3.7.6.
I followed the instructions above, and they helped me a lot.
After I installed the Python dev package, something weird happened:
sudo apt-get install python3.7-dev
The build says it can't find nvcc, but when I execute /usr/local/cuda-10.0/bin/nvcc -V, it shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
this is my error log:
Traceback (most recent call last):
File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
check=True)
File "/usr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "generate.py", line 5, in <module>
from model import Generator
File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/model.py", line 11, in <module>
from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_act.py", line 14, in <module>
os.path.join(module_path, 'fused_bias_act_kernel.cu'),
File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
with_cuda=with_cuda)
File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/daniel/.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/TH -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
:/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/TH -isystem /home/daniel/.local/lib/python3.7/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /nas/daniel/project/100_face_deaging/IPCGAN/stylegan2model/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: :/usr/local/cuda-10.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.
@danielkaifeng Please check your CUDA installation, and add the directory where nvcc resides to the PATH environment variable.
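Note that the log above invokes the compiler as `:/usr/local/cuda-10.0/bin/nvcc`, with a leading colon, and passes `-isystem :/usr/local/cuda-10.0/include`. One possible cause (an assumption, the thread does not confirm it) is a `CUDA_HOME` value that starts with a stray path separator. A minimal sketch for checking this, with the hypothetical helper `resolve_nvcc`:

```python
import os


def resolve_nvcc(cuda_home):
    """Return (cleaned_cuda_home, nvcc_path), with nvcc_path None if unusable.

    Strips whitespace and stray path separators (":" on Linux) from both
    ends of the value before looking for bin/nvcc underneath it.
    """
    cleaned = cuda_home.strip().strip(os.pathsep)
    nvcc = os.path.join(cleaned, "bin", "nvcc")
    usable = os.path.isfile(nvcc) and os.access(nvcc, os.X_OK)
    return cleaned, (nvcc if usable else None)
```

Calling `resolve_nvcc(os.environ.get("CUDA_HOME", ""))` and comparing the cleaned value with the original shows whether the variable needs to be fixed in the shell profile.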
I managed to get the environment up and running using Docker (nvidia-docker). The host machine has Nvidia driver 440.44. My Dockerfile is as follows:
FROM nvidia/cuda:10.1-devel-ubuntu18.04
RUN apt-get update && \
    apt-get install vim -y && \
    apt-get install python3 -y && \
    apt-get install python3-pip -y && \
    apt-get install git -y
RUN pip3 install --upgrade pip setuptools six
RUN pip3 install torch torchvision \
    pandas numpy pillow==6.2.1 opencv-python \
    scikit-learn matplotlib seaborn \
    jupyterlab tensorflow-gpu==1.15.0 tqdm requests
RUN apt-get install -y ninja-build
To build the image, use the following command:
docker build -t rosin_sg <dir_with_only_dockerfile>
To run the container, my config is as follows:
docker run -it -v <path_to_code>:/root/code -v <path_to_data>:/root/data --gpus all rosin_sg
Once inside the container, run cd. This takes you to /root/, which is the home directory in the container. Use python3, not just python, to start the interpreter.
The base image is nvidia/cuda:10.1-devel-ubuntu18.04 because the development images expose CUDA, which allows ninja to work.
The one caveat is that I'm unable to convert the weights, as tensorflow does not seem to recognise a GPU in this docker image.
I can run the original stylegan2 using its provided Dockerfile. However, I was unable to modify that Dockerfile so that this version of stylegan2 also runs. When one runs, the other fails to run.
There is some bad news. If you want to convert weights from any of the tensorflow *ffhq.pkl files, the software requirements are very strict:
- tensorflow 1.14 or 1.15 (to match the official tf stylegan2. Sorry, you need that really annoying dnnlib/tflib...)
- only cuda 10.0 (not cuda 10.1! cuda 10.1 is not supported by tensorflow 1.14 or 1.15)
I'm not converting any weights, so I can't comment on how that works, but otherwise I found @srirakshith-sai's Dockerfile to work great (except that he forgot to pip install lmdb). Also, the Dockerfile @rosinality mentioned in a different issue didn't work for me. It would be great to add an official Dockerfile to the repo.
@neoragex2002 I ran into problems when I tried to convert weights. The software requirements are too strict: tensorflow 1.14 or 1.15, pytorch 1.4, and cuda 10.0. Have you solved it?
I would really suggest using Docker for this, in combination with nvidia-docker2. Here is my working Dockerfile:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
RUN apt update
RUN apt install -y python3
RUN apt install -y python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install tensorflow-gpu==1.14.0 scipy==1.3.3 requests==2.22.0 Pillow==6.2.1 h5py==2.9.0 imageio==2.9.0 imageio-ffmpeg==0.4.2 tqdm==4.49.0 torch==1.4.0 torchvision==0.5.0 pandas numpy pillow==6.2.1 opencv-python scikit-learn matplotlib seaborn jupyterlab ninja
With this I managed to run the weight conversion. Haven't tried anything else, yet.
Hi!
I followed your suggestion, but when I run the weight conversion, I get:
Traceback (most recent call last):
File "convert_weight.py", line 236, in
@gzhhhere sorry, it's been a while and I no longer have that container. But I've found this: https://githubmemory.com/repo/anvoynov/GANLatentDiscovery/issues/31
Perhaps you're also loading the wrong file, or running from the wrong directory? Pickle files can be sensitive to locations.