
Error when running in CPU mode

Open · danielthompson opened this issue 3 years ago · 10 comments

Bug

I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this on my CPU.

To reproduce

$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu

Gives

Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
  File "/home/daniel/repos/vqgan-clip/generate.py", line 633, in <module>
    embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 344, in encode_text
    x = self.transformer(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 199, in forward
    return self.resblocks(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 186, in forward
    x = x + self.attention(self.ln_1(x))
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 183, in attention
    return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1031, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 5082, in multi_head_attention_forward
    attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 4828, in _scaled_dot_product_attention
    attn = softmax(attn, dim=-1)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 1679, in softmax
    ret = input.softmax(dim)
RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'

Expected behavior

No error; generate an output image.

Additional notes

  • I followed the setup described in the readme (kudos - it's very thorough!)
  • Image generation using my GPU works fine, i.e. without the -cd cpu parameter

Environment

Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1 
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-88-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.9.0+cu111
[pip3] torch-optimizer==0.1.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.10.0+cu111
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] pytorch-lightning         1.4.9                    pypi_0    pypi
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torch-optimizer           0.1.0                    pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchmetrics              0.5.1                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

danielthompson · Oct 10 '21 11:10

Same here on Windows, fresh environment

JKeydara · Oct 10 '21 16:10

Same problem on Linux Mint. It looks like the problem lies in the pretrained model dtypes, in CLIP, transformers and/or PyTorch. The pretrained models were saved at float16 precision (see the f16 in the download script), i.e. the "Half" data type, and that breaks some PyTorch ops on CPU. I did get this code to run, but only by putting a lot of .to(dtype=torch.float32) calls in different places (such as CLIP/clip/model.py, torch/nn/functional.py, torch/nn/activation.py, torch/nn/linear.py, torch/nn/conv.py). It is a very ugly, fragile and terrible solution, please don't be like me... I will wait for a better way...
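For what it's worth, a less invasive sketch of the same idea (untested): instead of patching PyTorch internals, cast the models to float32 once, right after they are loaded in generate.py. The traceback above shows the CLIP model is called perceptor; the variable name used below for the VQGAN model is just a guess.

# Rough sketch for generate.py, right after the models are loaded.
# Casting everything to float32 avoids the fp16 "Half" kernels that are
# missing on CPU (e.g. softmax_lastdim_kernel_impl).
if str(device) == "cpu":
    perceptor = perceptor.float()        # CLIP encoder, named as in the traceback
    vqgan_model = vqgan_model.float()    # hypothetical name for the VQGAN model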

potassium-chloride · Oct 10 '21 23:10

i am having the same problem, it worked on some crappy intel cpu but not on my amd one

Munix0 · Oct 16 '21 03:10

any options for a workaround? my RAM is substantially larger than my GPU memory lol

elliot-holley · Oct 19 '21 18:10

i am having the same problem on manjaro, i7 3770

Alexandrsv · Nov 29 '21 22:11

Hi all, I have a simple "fix" for those who hit this error and want to use the CPU. In ~/VQGAN-CLIP/CLIP/clip/model.py, just comment out the body of the function that converts the weights to fp16, i.e. change the function at line 371 to

def convert_weights(model: nn.Module):
    """Convert applicable model parameters to fp16"""

    def _convert_weights_to_fp16(l):
        # Body disabled: keep the weights in fp32, since the fp16 "Half"
        # kernels (e.g. the softmax one in the error above) are not implemented on CPU.
        """
        if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
            l.weight.data = l.weight.data.half()
            if l.bias is not None:
                l.bias.data = l.bias.data.half()

        if isinstance(l, nn.MultiheadAttention):
            for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
                tensor = getattr(l, attr)
                if tensor is not None:
                    tensor.data = tensor.data.half()

        for name in ["text_projection", "proj"]:
            if hasattr(l, name):
                attr = getattr(l, name)
                if attr is not None:
                    attr.data = attr.data.half()
        """
        pass

    model.apply(_convert_weights_to_fp16)

then it works. (Or simply delete the conversion code inside it if you don't plan on using GPU mode again later.)
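For reference, after that change the repro command from the first post should run on the CPU as well, just much slower:

$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu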

dehaenw · Dec 05 '21 13:12

@dehaenw Thank you for your answer. I tried it out and the code runs, but the produced images are all just white noise. Did you experience similar behaviour, or was everything working fine?

FlavioLeccese92 · Dec 30 '21 07:12

@FlavioLeccese92 It works for me, though it is very slow (~10x slower than an old GPU for my task); otherwise I did not notice any unusual noise apart from the initialized image, before it converges to the target.

dehaenw · Dec 30 '21 23:12

@dehaenw By unusual you mean something like this? (attached image: "White and red samurai fight ukiyo-e 0000") Sorry to bother you and thank you again for your support.

FlavioLeccese92 · Jan 02 '22 00:01

@FlavioLeccese92 No, not like your example image, which looks like the colors are somehow clipped or restricted to RGB values of 0 or 255 per channel. My initialized images just look like this: (attached image). Maybe some improper type conversion at some point?
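One quick way to rule that out (a rough diagnostic sketch, not tested against this repo; perceptor is the CLIP model from the traceback, the VQGAN variable name is a guess) is to check whether anything is still half precision after loading:

import torch

def report_half_tensors(module, name):
    # Print any parameters or buffers that are still fp16; on CPU
    # everything should be fp32 once the conversion is disabled.
    for pname, p in module.named_parameters():
        if p.dtype == torch.float16:
            print(f"{name}.{pname} is still half precision")
    for bname, b in module.named_buffers():
        if b.dtype == torch.float16:
            print(f"{name}.{bname} is still half precision")

report_half_tensors(perceptor, "perceptor")      # CLIP
report_half_tensors(vqgan_model, "vqgan_model")  # hypothetical VQGAN variable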

dehaenw · Jan 02 '22 17:01