stylegan2-pytorch

torch version

Open qingzi02010 opened this issue 4 years ago • 33 comments

Some errors occurred while compiling the code. Can you tell us which torch version you used, and the rest of the software environment, such as CUDA, cuDNN, gcc, ninja, and re2c? Thank you!

qingzi02010 avatar Dec 26 '19 01:12 qingzi02010

I have tested it on PyTorch 1.3 + CUDA 10, and it runs successfully.

onion-liu avatar Dec 26 '19 01:12 onion-liu

I used PyTorch 1.3.1 and CUDA 10.2. It seems that the PyTorch version is crucial. (See https://github.com/rosinality/stylegan2-pytorch/issues/1)

rosinality avatar Dec 26 '19 03:12 rosinality

@rosinality I installed PyTorch 1.3.1, torchvision 0.4.2, and CUDA 10.1, and got "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?

qingzi02010 avatar Dec 26 '19 06:12 qingzi02010

Could you retry after removing the /tmp/torch_extensions directory?

rosinality avatar Dec 26 '19 11:12 rosinality

Sorry, I don't know how to remove /tmp/torch_extensions, and I am not familiar with PyTorch C++ extensions. Could you explain more?

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010

I suspect it is trying to use cached binaries even after CUDA updates.

rosinality avatar Dec 26 '19 14:12 rosinality
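For readers unsure how to clear the cached binaries, the idea is simply to delete the extension build directory so that PyTorch recompiles the ops on the next import. Below is a minimal sketch under stated assumptions: the helper names are hypothetical, the `/tmp/torch_extensions` and `~/.cache/torch_extensions` locations are the ones that appear in this thread, and `TORCH_EXTENSIONS_DIR` is the environment variable PyTorch consults for an override. The script defaults to a dry run so nothing is deleted by accident.

```python
import os
import shutil
import tempfile
from pathlib import Path


def candidate_cache_dirs():
    """List directories where cached extension builds may live (assumed locations)."""
    candidates = []
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        candidates.append(Path(override))
    # Location seen in the error messages above: <tmpdir>/torch_extensions
    candidates.append(Path(tempfile.gettempdir()) / "torch_extensions")
    # Location seen later in this thread: ~/.cache/torch_extensions
    candidates.append(Path.home() / ".cache" / "torch_extensions")
    return candidates


def clear_extension_cache(directories, dry_run=True):
    """Delete the directories that exist; return those removed (or that would be)."""
    removed = []
    for directory in directories:
        directory = Path(directory)
        if directory.is_dir():
            if not dry_run:
                shutil.rmtree(directory)
            removed.append(directory)
    return removed


if __name__ == "__main__":
    # Dry run: only report what would be deleted.
    for path in clear_extension_cache(candidate_cache_dirs(), dry_run=True):
        print(f"would remove {path}")
```

Calling `clear_extension_cache(candidate_cache_dirs(), dry_run=False)` performs the actual deletion, which is equivalent to running `rm -rf` on each directory that exists.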

Now I have updated CUDA to 10.2 and added CUDA to my .bashrc file, but the same error occurred. Do you have any suggestions? Should I reboot the machine?

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010

I don't think you need to reboot after CUDA updates. Could you post full error logs?

rosinality avatar Dec 26 '19 14:12 rosinality

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    from model import Generator, Discriminator
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 841, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1052, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/opt/anaconda3/lib/python3.7/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/anaconda3/lib/python3.7/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010

What about your gcc version? Mine is 5.4, and I am hesitant to update to gcc 7.3.

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010

I'm using gcc 5.4

Did you try removing the cached binaries in /tmp/torch_extensions? Then could you show me the output of:

> ldd /tmp/torch_extensions/fused/fused.so

rosinality avatar Dec 26 '19 14:12 rosinality

ldd /tmp/torch_extensions/fused/fused.so
    linux-vdso.so.1 => (0x00007ffdeb198000)
    libcudart.so.10.0 => /usr/local/lib/libcudart.so.10.0 (0x00007f24bc54d000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f24bc1cb000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f24bbfb5000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f24bbbeb000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f24bca7c000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f24bb9e7000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f24bb7ca000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f24bb5c2000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f24bb2b9000)

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010
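The `ldd` listing above links against `libcudart.so.10.0`, although the poster reported installing CUDA 10.1 and then 10.2, which appears consistent with the suspicion that an old cached build was being reused. The sketch below shows one way to pull the CUDA runtime entries out of `ldd` output for that kind of comparison. `linked_cuda_runtimes` and `SAMPLE` are hypothetical names introduced here, and the sample lines are copied from the listing above.

```python
SAMPLE = """\
    linux-vdso.so.1 => (0x00007ffdeb198000)
    libcudart.so.10.0 => /usr/local/lib/libcudart.so.10.0 (0x00007f24bc54d000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f24bc1cb000)
"""


def linked_cuda_runtimes(ldd_output):
    """Return (library name, version suffix, resolved path) for each libcudart entry."""
    prefix = "libcudart.so"
    found = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        name = name.strip()
        if not name.startswith(prefix):
            continue
        # Drop the trailing load address, e.g. "(0x00007f24bc54d000)".
        path = target.split("(")[0].strip()
        # "libcudart.so.10.0" -> "10.0"; empty if there is no version suffix.
        version = name[len(prefix):].lstrip(".")
        found.append((name, version, path))
    return found


if __name__ == "__main__":
    for name, version, path in linked_cuda_runtimes(SAMPLE):
        print(f"{name}: runtime version {version!r}, resolved to {path}")
```

If the version reported here differs from the CUDA toolkit currently installed, the shared object was probably built before the update and should be rebuilt.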

It seems there are cases where PyTorch cannot resolve CUDA shared libraries (https://github.com/NVIDIAGameWorks/kaolin/issues/30), but I don't know how you can resolve it. If you use Anaconda, you could try creating a new virtual env, installing PyTorch 1.3 and cudatoolkit 10.1 in it, and trying again.

rosinality avatar Dec 26 '19 14:12 rosinality

You are right: after 'rm -rf /tmp/torch_extensions', the error disappeared. Thank you so much. So the case where PyTorch couldn't resolve CUDA shared libraries can probably be ignored.

qingzi02010 avatar Dec 26 '19 14:12 qingzi02010

I have the same problem, but I was unable to solve it by removing /tmp/torch_extensions. Did you do anything else to solve this problem? @qingzi02010 [image]

wosecz avatar Jan 06 '20 06:01 wosecz

No. I used the recommended version of torch, and once I ran 'rm -rf /tmp/torch_extensions', the "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv" disappeared.

qingzi02010 avatar Jan 06 '20 07:01 qingzi02010

I am using Python 3.7 from Anaconda. I don't know whether the problem is related to the Python version, but you can try it.

qingzi02010 avatar Jan 06 '20 07:01 qingzi02010

Yes, this method is correct. I tried it several times and it fixed this problem. (But I got another problem...) Thank you for your reply!

wosecz avatar Jan 06 '20 10:01 wosecz

https://www.cnblogs.com/rainsoul/p/12162779.html I do not know what the problem is, but you can refer to this post and try its method.

qingzi02010 avatar Jan 08 '20 03:01 qingzi02010

> I used PyTorch 1.3.1 and CUDA 10.2. It seems that the PyTorch version is crucial. (See #1)

Does anyone know which TensorFlow version to use? Neither TF 1.14 nor 1.15 (see the original stylegan2 repo) is compatible with CUDA 10.2.

kevinstan avatar Jan 09 '20 02:01 kevinstan

@kevinstan I use TF 1.15 on CUDA 10.2. It seems to run on it.

rosinality avatar Jan 09 '20 23:01 rosinality

Something weird happens to me when I try to train it from screen:

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/torch_extensions/fused/fused.so)

I tried removing /tmp/torch_extensions, but no luck!

sadransh avatar Jan 30 '21 21:01 sadransh

> @rosinality I installed PyTorch 1.3.1, torchvision 0.4.2, and CUDA 10.1, and got "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?

I face the same issue. Was this resolved, and if so, how? Could you please share the .yml file of your conda env?

Harsha-Musunuri avatar Mar 11 '21 19:03 Harsha-Musunuri

@Harsha-Musunuri were you able to resolve this issue? I face the same problem: TensorFlow 1.14 is not compatible with CUDA 10.2. Also, PyTorch 1.3 is not compatible with gcc > 5 and CUDA 10.2, but the convert_weight.py code requires gcc > 5 and CUDA 10.2. Do you have a conda env .yml file that is compatible with all the required library versions?

denabazazian avatar Apr 14 '21 12:04 denabazazian

> I used PyTorch 1.3.1 and CUDA 10.2. It seems that the PyTorch version is crucial. (See #1)

@rosinality PyTorch 1.3 is not compatible with CUDA 10.2. Did you install it locally and build PyTorch from source?

denabazazian avatar Apr 14 '21 12:04 denabazazian

@denabazazian I don't remember the environment well. You can use a recent version of PyTorch.

rosinality avatar Apr 14 '21 13:04 rosinality

> @rosinality I installed PyTorch 1.3.1, torchvision 0.4.2, and CUDA 10.1, and got "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?

> I face the same issue. Was this resolved, and if so, how? Could you please share the .yml file of your conda env?

@denabazazian try this https://drive.google.com/file/d/1EaYl5IP0gBqjagX9mZfXr88l13eUzKay/view?usp=sharing

Harsha-Musunuri avatar Apr 14 '21 15:04 Harsha-Musunuri

I tried the conda env file to no avail. I'm using CUDA 10.1 with PyTorch 1.7.1, and I failed to downgrade it to 1.3.1. I tried other PyTorch versions but ran into other problems which, once resolved, led back to this state:


CalledProcessError                        Traceback (most recent call last)
~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   1532         stdout_fileno = 1
-> 1533         subprocess.run(
   1534             command,

~/miniconda3/envs/dG/lib/python3.9/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    527         if check and retcode:
--> 528             raise CalledProcessError(retcode, process.args,
    529                                      output=stdout, stderr=stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
~/Documents/dG/alias-free-gan-pytorch/train.py in <module>
     29     get_world_size,
     30 )
---> 31 from stylegan2.op import conv2d_gradfix
     32 from stylegan2.non_leaking import augment, AdaptiveAugment
     33 from stylegan2.model import Discriminator

~/Documents/dG/alias-free-gan-pytorch/stylegan2/op/__init__.py in <module>
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
      2 from .upfirdn2d import upfirdn2d

~/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_act.py in <module>
      9
     10 module_path = os.path.dirname(__file__)
---> 11 fused = load(
     12     "fused",
     13     sources=[

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
    984                 verbose=True)
    985     '''
--> 986     return _jit_compile(
    987         name,
    988         [sources] if isinstance(sources, str) else sources,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
   1191                         clean_ctx=clean_ctx
   1192                     )
-> 1193                 _write_ninja_file_and_build_library(
   1194                     name=name,
   1195                     sources=sources,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
   1295     if verbose:
   1296         print('Building extension module {}...'.format(name))
-> 1297     _run_ninja_build(
   1298         build_directory,
   1299         verbose,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
   1553         if hasattr(error, 'output') and error.output:  # type: ignore
   1554             message += ": {}".format(error.output.decode())  # type: ignore
-> 1555         raise RuntimeError(message) from e
   1556
   1557

RuntimeError: Error building extension 'fused':
[1/2] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/TH -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/mr/miniconda3/envs/dG/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++14 -c /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/TH -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/mr/miniconda3/envs/dG/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++14 -c /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /usr/local/cuda-10.1/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda-10.1/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
In file included from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THC.h:4,
                 from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THCAtomics.cuh:5,
                 from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAApplyUtils.cuh:5,
                 from /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu:11:
/home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THCGeneral.h:11:10: fatal error: cublas_v2.h: No such file or directory
   11 | #include <cublas_v2.h>
      |          ^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

MHRosenberg avatar Aug 22 '21 07:08 MHRosenberg

@MHRosenberg It is not a PyTorch version problem but a problem with the CUDA build environment. Check whether you can build CUDA programs, or use https://github.com/rosinality/alias-free-gan-pytorch/blob/main/Dockerfile.

rosinality avatar Aug 22 '21 14:08 rosinality
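The build log above fails for two separate reasons: the host gcc is newer than the CUDA 10.1 headers accept ("gcc versions later than 8 are not supported"), and `cublas_v2.h` is not found on the include path. A quick pre-flight check along those lines can be sketched as follows. The function names are hypothetical, the CUDA path and the gcc limit of 8 are taken from that log rather than from any general compatibility table, and the header is only looked for under `<cuda_home>/include`, although some installations place it elsewhere.

```python
import re
import shutil
import subprocess
from pathlib import Path


def gcc_major(version_text):
    """Extract the major version from `gcc -dumpversion` output such as '9.3.0' or '9'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognised gcc version: {version_text!r}")
    return int(match.group(1))


def check_cuda_build_env(cuda_home, gcc_version_text, max_gcc_major):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    cuda_home = Path(cuda_home)
    if not (cuda_home / "bin" / "nvcc").exists():
        problems.append(f"nvcc not found under {cuda_home / 'bin'}")
    if not (cuda_home / "include" / "cublas_v2.h").exists():
        problems.append(f"cublas_v2.h not found under {cuda_home / 'include'}")
    major = gcc_major(gcc_version_text)
    if major > max_gcc_major:
        problems.append(f"gcc {major} is newer than the supported maximum {max_gcc_major}")
    return problems


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        version = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True).stdout
        # Values below come from the log in this thread; adjust for your machine.
        for problem in check_cuda_build_env("/usr/local/cuda-10.1", version, max_gcc_major=8):
            print("problem:", problem)
```

An empty result does not guarantee the extension will compile, but each reported problem corresponds to one of the errors seen in the log.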

Hi, I was working on the SAM code and I am getting an error on import:

ImportError: /root/.cache/torch_extensions/fused/fused.so: cannot open shared object file: No such file or directory

I get the error after running "from models.psp import pSp". I am running on Deepnote. Could you please help me with this error?

Ameya-Deo avatar Sep 30 '21 15:09 Ameya-Deo