[Bug]: Interrogate CLIP returning <Error>
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What happened?
When trying to run Interrogate CLIP since the 1.6 update, I keep getting `<Error>` as the output, and the log is spammed with the following:
```
*** Error interrogating
Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\modules\interrogate.py", line 194, in interrogate
    caption = self.generate_caption(pil_image)
  File "C:\AI\stable-diffusion-webui\modules\interrogate.py", line 181, in generate_caption
    caption = self.blip_model.generate(gpu_image, sample=False, num_beams=shared.opts.interrogate_clip_num_beams, min_length=shared.opts.interrogate_clip_min_length, max_length=shared.opts.interrogate_clip_max_length)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\blip.py", line 156, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\transformers\generation\utils.py", line 1611, in generate
    return self.beam_search(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\transformers\generation\utils.py", line 2909, in beam_search
    outputs = self(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 886, in forward
    outputs = self.bert(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 445, in forward
    layer_outputs = layer_module(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 277, in forward
    self_outputs = self.self(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0
```
---
### Steps to reproduce the problem
- Go to img2img
- Drop an image in the image field
- Press Interrogate CLIP

I assume this is a local issue; Interrogate DeepBooru works.
### What should have happened?
It should've worked :) As noted above, Interrogate DeepBooru works.
### Sysinfo
### What browsers do you use to access the UI?
Mozilla Firefox
### Console logs
https://pastebin.com/5SPMwBHL
It's mostly repetitions of the traceback shown under "What happened?" above.
---
### Additional information
_No response_
Same issue.
Have you fixed it?
Try lowering the values in the Interrogate settings; I had the same problem and changing those values fixed it for me.
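For anyone who prefers editing the file directly, here's a minimal sketch of the same workaround applied to the webui's `config.json`. The option key is taken from `shared.opts.interrogate_clip_num_beams` in the traceback above; the file location and the assumption that it maps 1:1 to the settings UI are mine:

```python
import json

# Assumption: config.json sits in the stable-diffusion-webui root and stores
# options under the same keys as shared.opts in the traceback above.
cfg_path = r"C:\AI\stable-diffusion-webui\config.json"

with open(cfg_path, encoding="utf-8") as f:
    cfg = json.load(f)

# num_beams=1 sidesteps the beam-search path where the size mismatch occurs.
cfg["interrogate_clip_num_beams"] = 1

with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=4)
```

Do this with the webui stopped, so your edit isn't overwritten when settings are saved.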
Adding some information:
- When using BLIP with num_beams=2, the error shown in the console is: RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0
- Similarly, with BLIP num_beams=3, the error is: RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
- And so on...
The underlying issue appears to be that for num_beams=N:
- The size of tensor 'a' is N, while the size of tensor 'b' is N^2.
- The error indicates that both tensors must have the same size at that dimension.
- As a result, it only works correctly when num_beams=1 (a toy reproduction of the mismatch is sketched below).
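To make the N vs N^2 pattern concrete, here's a toy reproduction of the failing matmul. Everything except the batch dimensions is made up for the demo; this isn't BLIP code, just the same broadcast rule. My guess, unverified, is that the image features get expanded for beam search twice, once by the BLIP repo's generate() and once by a newer transformers generate(), which would turn N encoder rows into N^2:

```python
import torch

num_beams = 8

# Decoder-side query: the text batch is expanded once for beam search,
# giving num_beams rows. Head/seq/dim sizes below are invented.
query_layer = torch.randn(num_beams, 12, 5, 64)

# Encoder-side key: if the image features were (hypothetically) expanded
# twice, the batch dimension would be num_beams ** 2 rows instead.
key_layer = torch.randn(num_beams ** 2, 12, 197, 64)

# Same call as med.py line 178; batch dims 8 vs 64 cannot broadcast, so this
# raises: RuntimeError: The size of tensor a (8) must match the size of
# tensor b (64) at non-singleton dimension 0
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
```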
The error originates from the following line:

```
  File "/home/aiman/AIMan/Repos/stable-diffusion-webui/repositories/BLIP/models/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
```
I assume that the tensors 'a' and 'b' in the error are 'query_layer' and 'key_layer.transpose(-1, -2)', respectively.
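One quick way to check that assumption is a throwaway debug print just above the failing line in med.py (hypothetical, not part of the repo):

```python
# Hypothetical debug aid, added immediately above med.py line 178.
# If the double-expansion guess is right, this prints N and N**2 (e.g. 8 and 64).
print("query batch:", query_layer.shape[0], "key batch:", key_layer.shape[0])
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
```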