
Error when running in CPU mode

Open · danielthompson opened this issue 3 years ago · 10 comments

Bug

I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this on my CPU.

To reproduce

$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu

Gives

Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
  File "/home/daniel/repos/vqgan-clip/generate.py", line 633, in <module>
    embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 344, in encode_text
    x = self.transformer(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 199, in forward
    return self.resblocks(x)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 186, in forward
    x = x + self.attention(self.ln_1(x))
  File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 183, in attention
    return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1031, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 5082, in multi_head_attention_forward
    attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 4828, in _scaled_dot_product_attention
    attn = softmax(attn, dim=-1)
  File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 1679, in softmax
    ret = input.softmax(dim)
RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'

Expected behavior

No error; generate an output image.

Additional notes

  • I followed the setup described in the readme (kudos - it's very thorough!)
  • Image generation using my GPU works fine, i.e. without the -cd cpu parameter

Environment

Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1 
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-88-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.9.0+cu111
[pip3] torch-optimizer==0.1.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.10.0+cu111
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] pytorch-lightning         1.4.9                    pypi_0    pypi
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torch-optimizer           0.1.0                    pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchmetrics              0.5.1                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

danielthompson · Oct 10 '21 11:10

Same here on Windows, fresh environment

JKeydara · Oct 10 '21 16:10

Same problem on Linux Mint. It looks like the problem lies in the pretrained model dtypes, in CLIP, transformers and/or PyTorch. The pretrained models were saved at float16 precision (see the f16 in the download script), i.e. the "Half" data type, and that breaks some PyTorch ops on CPU. I did get this code to run, but only by putting a lot of .to(dtype=torch.float32) calls in different places (such as CLIP/clip/model.py, torch/nn/functional.py, torch/nn/activation.py, torch/nn/linear.py, torch/nn/conv.py). It is a very ugly, fragile and terrible solution, please don't be like me... I will wait for a better way...
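For what it's worth, a less invasive sketch of the same idea (untested): instead of patching PyTorch internals, cast the models to float32 once, right after they are loaded in generate.py. The traceback above shows the CLIP model is called perceptor; the variable name used below for the VQGAN model is just a guess.

# Rough sketch for generate.py, right after the models are loaded.
# Casting everything to float32 avoids the fp16 "Half" kernels that are
# missing on CPU (e.g. softmax_lastdim_kernel_impl).
if str(device) == "cpu":
    perceptor = perceptor.float()        # CLIP encoder, named as in the traceback
    vqgan_model = vqgan_model.float()    # hypothetical name for the VQGAN model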

potassium-chloride · Oct 10 '21 23:10

i am having the same problem, it worked on some crappy intel cpu but not on my amd one

Munix0 · Oct 16 '21 03:10

any options for a workaround? my RAM is substantially larger than my GPU memory lol

elliot-holley · Oct 19 '21 18:10

i am having the same problem on manjaro, i7 3770

Alexandrsv · Nov 29 '21 22:11

Hi all, I have a simple "fix" for those who hit this error and want to use the CPU. In ~/VQGAN-CLIP/CLIP/clip/model.py, just comment out the body of the function that converts the weights to fp16, i.e. change the function at line 371 to

def convert_weights(model: nn.Module):
    """Convert applicable model parameters to fp16"""

    def _convert_weights_to_fp16(l):
        # Body disabled: keep the weights in fp32, since the fp16 "Half"
        # kernels (e.g. the softmax one in the error above) are not implemented on CPU.
        """
        if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
            l.weight.data = l.weight.data.half()
            if l.bias is not None:
                l.bias.data = l.bias.data.half()

        if isinstance(l, nn.MultiheadAttention):
            for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
                tensor = getattr(l, attr)
                if tensor is not None:
                    tensor.data = tensor.data.half()

        for name in ["text_projection", "proj"]:
            if hasattr(l, name):
                attr = getattr(l, name)
                if attr is not None:
                    attr.data = attr.data.half()
        """
        pass

    model.apply(_convert_weights_to_fp16)

then it works. (Or simply delete the conversion code inside it if you don't plan on using GPU mode again later.)
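For reference, after that change the repro command from the first post should run on the CPU as well, just much slower:

$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu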

dehaenw · Dec 05 '21 13:12

@dehaenw Thank you for your answer. I tried it out and the code runs, but the produced images are all just white noise. Did you experience similar behaviour, or was everything working fine?

FlavioLeccese92 · Dec 30 '21 07:12

@FlavioLeccese92 It works for me, though it is very slow (~10x slower than an old GPU for my task); otherwise I did not notice any unusual noise apart from the initialized image, before it converges to the target.

dehaenw · Dec 30 '21 23:12

@dehaenw By unusual you mean something like this? (attached image: "White and red samurai fight ukiyo-e 0000") Sorry to bother you and thank you again for your support.

FlavioLeccese92 · Jan 02 '22 00:01

@FlavioLeccese92 No, not like your example image, which looks like the colors are somehow clipped or restricted to RGB values of 0 or 255 per channel. My initialized images just look like this: (attached image). Maybe some improper type conversion at some point?
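One quick way to rule that out (a rough diagnostic sketch, not tested against this repo; perceptor is the CLIP model from the traceback, the VQGAN variable name is a guess) is to check whether anything is still half precision after loading:

import torch

def report_half_tensors(module, name):
    # Print any parameters or buffers that are still fp16; on CPU
    # everything should be fp32 once the conversion is disabled.
    for pname, p in module.named_parameters():
        if p.dtype == torch.float16:
            print(f"{name}.{pname} is still half precision")
    for bname, b in module.named_buffers():
        if b.dtype == torch.float16:
            print(f"{name}.{bname} is still half precision")

report_half_tensors(perceptor, "perceptor")      # CLIP
report_half_tensors(vqgan_model, "vqgan_model")  # hypothetical VQGAN variable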

dehaenw · Jan 02 '22 17:01