stylegan2-ada-pytorch

torch==1.9.0+cu111 installation fails and results in training error

Open • ennemoser opened this issue 2 years ago • 3 comments

It seems that torch==1.9.0+cu111 and torchvision==0.10.0+cu111 can't be installed, so the notebook ends up with torch 2.0.1+cu118 instead. This gives an error when I try to train the model.

This is the error I get:

ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2, 2.0.1, 2.0.1+cpu, 2.0.1+cpu.cxx11.abi, 2.0.1+cu117, 2.0.1+cu117.with.pypi.cudnn, 2.0.1+cu118, 2.0.1+rocm5.3, 2.0.1+rocm5.4.2)
ERROR: No matching distribution found for torch==1.9.0+cu111

The error that I get when training is the following:

/content/drive/My Drive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.0.1+cu117. Falling back to torch.nn.functional.conv2d().
  warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

I'm desperately trying to get this Colab running. Can anyone help?
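The warning above is produced by a version gate in torch_utils/ops/conv2d_gradfix.py: the custom op is only used when torch.__version__ looks like one of the older supported releases, and any other version falls back to torch.nn.functional.conv2d(). The sketch below imitates that kind of prefix check in plain Python so the behavior is easy to see. The function name `uses_custom_conv2d` and the exact prefix tuple are assumptions for illustration, not the repo's code; check the file itself for the real list.

```python
import warnings

# Hypothetical re-creation of a prefix-based version gate; the real list lives in
# torch_utils/ops/conv2d_gradfix.py and may differ in detail.
SUPPORTED_PREFIXES = ("1.7.", "1.8.", "1.9.")

def uses_custom_conv2d(torch_version: str) -> bool:
    """Return True if the custom conv2d op would be used for this version string."""
    if torch_version.startswith(SUPPORTED_PREFIXES):
        return True
    # Any other version only triggers a warning and the standard conv2d path.
    warnings.warn(
        f"conv2d_gradfix not supported on PyTorch {torch_version}. "
        "Falling back to torch.nn.functional.conv2d()."
    )
    return False

print(uses_custom_conv2d("1.9.0+cu111"))  # True
print(uses_custom_conv2d("2.0.1+cu117"))  # False, and emits the warning
```

A Python warning does not stop the run by itself; it signals that the optimized path was skipped for this PyTorch version.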

ennemoser, Jun 05 '23 01:06

StyleGAN2-ADA works with torch 1.7, 1.8 and 1.9. The torch 1.9.0+cu111 wheels that the notebook installs are built for CUDA 11.1 and are only available for Python 3.6, 3.7, 3.8 and 3.9. After its recent updates, Google Colab now uses Python 3.10.11 and CUDA 11.8.0 by default.
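This also explains why the pip error lists nothing older than 1.11.0: no torch 1.9.0 wheel matches Colab's Python 3.10. A quick way to check this before running the downgrade cell is to compare the interpreter version with the supported range. This is a minimal sketch; the helper name `can_install_torch_1_9` is hypothetical, and the version set is taken from the comment above.

```python
import sys

# Python versions for which torch==1.9.0+cu111 wheels exist (per the comment above).
TORCH_1_9_PYTHONS = {(3, 6), (3, 7), (3, 8), (3, 9)}

def can_install_torch_1_9(version_info=None) -> bool:
    """Return True if this interpreter could install the torch 1.9.0 wheels."""
    if version_info is None:
        version_info = sys.version_info
    # Only major.minor matters for wheel compatibility tags like cp39.
    return tuple(version_info[:2]) in TORCH_1_9_PYTHONS

print(sys.version.split()[0], can_install_torch_1_9())
```

On Colab's default Python 3.10.11 this should report False, which matches the "No matching distribution found" error.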

So, to solve this problem:

  1. I installed CUDA 11.1.
  2. I installed Python 3.9.

It helped, but after this I got new errors that I couldn't solve:

warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
/content/drive/My Drive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/torch_utils/ops/upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation.
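The upfirdn2d plugin is compiled at runtime, so a failed build often comes down to missing build prerequisites. The sketch below checks the usual ones, assuming the plugin is built through PyTorch's JIT extension builder (torch.utils.cpp_extension): ninja, a CUDA compiler (nvcc) and the Python headers of the running interpreter. It does not read the actual traceback, so it can only rule out common causes; the helper name `check_build_tools` is hypothetical.

```python
import os
import shutil
import sysconfig

def check_build_tools() -> dict:
    """Report whether common prerequisites for compiling a CUDA extension are present."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Build backend that torch.utils.cpp_extension uses for JIT compilation.
        "ninja": shutil.which("ninja") is not None,
        # CUDA compiler for the .cu sources; cpp_extension can also locate it via
        # CUDA_HOME, which this sketch does not check.
        "nvcc": shutil.which("nvcc") is not None,
        # Headers of the running interpreter, needed to compile the Python bindings.
        "python_headers": os.path.exists(os.path.join(include_dir, "Python.h")),
    }

for tool, found in check_build_tools().items():
    print(f"{tool}: {'ok' if found else 'MISSING'}")
```

Because the notebook kernel itself stays on Python 3.10, save this as a file and run it with python3.9 to check the interpreter that actually builds the plugin. If python_headers is reported missing there, the Ubuntu package python3.9-dev typically provides them.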

How to install CUDA 11.1 (follow it step by step; when you need to reboot, just use Runtime > Restart runtime and continue running the remaining commands): https://www.youtube.com/watch?v=5eJTzhGe2QE

How to install Python 3.9:

!apt-get install python3.9
!curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
!python3.9 get-pip.py

Also, I changed every command in the Colab notebook that starts with python or pip. For example, this:

!pip install ninja
!python train.py [...]

became this:

!python3.9 -m pip install ninja
!python3.9 train.py [...]
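The command rewrite described above is mechanical: every `!pip ...` becomes `!python3.9 -m pip ...` and every `!python ...` becomes `!python3.9 ...`. A small sketch of that mapping, using a hypothetical helper `to_python39` that rewrites one notebook command string at a time:

```python
def to_python39(cmd: str) -> str:
    """Rewrite a Colab shell command so it runs under python3.9 instead of the default."""
    if cmd.startswith("!pip "):
        # Run pip as a module of the chosen interpreter so packages land in 3.9.
        return "!python3.9 -m pip " + cmd[len("!pip "):]
    if cmd.startswith("!python "):
        return "!python3.9 " + cmd[len("!python "):]
    return cmd  # leave every other command untouched

for line in ["!pip install ninja", "!python train.py [...]", "!nvidia-smi"]:
    print(to_python39(line))
```

Using `python3.9 -m pip` rather than a bare `pip` ties the install to that specific interpreter, which is the point of the change.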

monolesan, Jun 05 '23 13:06

Same issue here. Has anyone found a complete solution yet with the current Colab defaults (or around them)?

d-bohn, Jul 17 '23 06:07

I think I fixed this. I did it without changing the CUDA or Python version in Colab.

I've added the code changes to this PR: https://github.com/dvschultz/stylegan2-ada-pytorch/pull/48

The fix

I did two main things:

  • Copied some updates from StyleGAN3 into this repo
  • Changed the Colab notebook to use Colab's default PyTorch and JAX

StyleGAN3 files

I changed two files in the repo: torch_utils/ops/conv2d_gradfix.py and torch_utils/ops/grid_sample_gradfix.py.

I copied the files from the StyleGAN3 repo, which has been updated to handle newer PyTorch versions.
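As far as I can tell, the StyleGAN3 versions of these files pick the code path by comparing torch.__version__ against 1.11 instead of matching a fixed list of prefixes, so newer releases such as 2.0.1 are handled rather than rejected. The sketch below shows that style of check in plain Python. It is an illustration under that assumption, not the StyleGAN3 code itself (see the linked files for the exact check), and the function names are hypothetical.

```python
import re

def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings like '2.0.1+cu117' or '1.11.0a0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def use_pytorch_1_11_api(version: str) -> bool:
    """True for PyTorch 1.11 and newer, including pre-release builds of 1.11."""
    return parse_major_minor(version) >= (1, 11)

for v in ["1.9.0+cu111", "1.11.0a0", "2.0.1+cu118"]:
    print(v, use_pytorch_1_11_api(v))
```

Comparing integer tuples avoids a classic trap of comparing version strings directly, where "1.9" sorts after "1.11".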

Here are links to the two SG3 files:
https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/conv2d_gradfix.py
https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/grid_sample_gradfix.py

Colab notebook changes

I also changed the Colab notebook. I removed all the JAX and PyTorch uninstall stuff, so that this:

#Uninstall new JAX
!pip uninstall jax jaxlib -y
#GPU frontend
!pip install "jax[cuda11_cudnn805]==0.3.10" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
#CPU frontend
#!pip install jax[cpu]==0.3.10
#Downgrade Pytorch
!pip uninstall torch torchvision -y
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
!pip install timm==0.4.12 ftfy==6.1.1 ninja==1.10.2 opensimplex

becomes this:

!pip install timm==0.4.12 ftfy==6.1.1 ninja==1.10.2 opensimplex
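With the downgrade removed, the notebook relies on whatever PyTorch and JAX Colab currently ships. To record which versions a session actually has (useful when reporting results in this thread), one option is to read the installed package metadata. This sketch uses only the standard library, so it works even when a package is missing; the helper name `installed_versions` and the default package list are assumptions.

```python
from importlib import metadata

def installed_versions(packages=("torch", "torchvision", "jax", "jaxlib")) -> dict:
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Reading metadata instead of importing the packages avoids initializing CUDA just to print a version string.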

ada-ada-ada-art, Aug 09 '23 14:08