BLIP
The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
When I use beam search to generate a caption for a picture on Colab, this error occurs. My transformers version is 4.25.1, and it works when I use nucleus sampling. How should I solve this?
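For context, the failing call is presumably the beam-search branch of the usual BLIP captioning setup, roughly like this; the checkpoint placeholder, dummy image tensor, and parameter values are illustrative (they follow the official demo), not the exact code that failed:

```python
import torch
from models.blip import blip_decoder  # from the salesforce/BLIP repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint path/URL and real image preprocessing are omitted; a dummy image
# tensor of the right shape is enough to hit the shape mismatch.
model = blip_decoder(pretrained="<BLIP captioning checkpoint>", image_size=384, vit="base")
model = model.eval().to(device)
image = torch.randn(1, 3, 384, 384, device=device)  # stand-in for a preprocessed image

with torch.no_grad():
    # Beam search (sample=False, num_beams=3): on newer transformers this raises
    # "The size of tensor a (3) must match the size of tensor b (9) ..."
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)
print(caption[0])
```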
I commented out the line that causes the error and uncommented the line below it as a temporary solution.
Yeah, that's what I did. But it's strange that both methods work in the official demo Colab notebook, yet I can't use beam search when I write the code myself.
I also got it working after downgrading transformers to 4.16.0, but it seems that I cannot import AutoProcessor with that version.
same issue
same issue, waiting for a fix
same issue
I suggest opening an issue with kohya_ss in his repo, as he is the one maintaining the behind-the-scenes code; I only wrap his script in a GUI: https://github.com/kohya-ss/sd-scripts
Dude, I also ran into this problem. When I change sample = False to sample = True in the generate call, it runs successfully. But I wonder why beam search doesn't work and only nucleus sampling can be used.
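The change described above presumably amounts to switching from the beam-search call to the nucleus-sampling one; a sketch, reusing the hypothetical model and image from the example under the original question:

```python
with torch.no_grad():
    # Nucleus sampling (sample=True): this path does not pre-expand the image
    # embeddings for beams, so it runs fine on newer transformers.
    caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5)
print(caption[0])
```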
I have the same problem, and using transformers==4.16 did not help.
I found that commenting out the line at models/blip.py line 131 fixes the problem (see the sketch below).
I don't know why; I hope someone can provide a detailed explanation of what's going on under the hood.
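For reference, the branch being commented out is BLIP's manual per-beam expansion of the image embeddings; a standalone sketch of that logic (not the file itself) looks like this:

```python
import torch

def expand_image_embeds(image_embeds: torch.Tensor, sample: bool, num_beams: int) -> torch.Tensor:
    """Mirror of the branch around models/blip.py line 131: for beam search
    (sample=False), BLIP repeats the image embeddings once per beam before
    passing them to the text decoder's generate()."""
    if not sample:
        image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)
    return image_embeds

# Commenting that branch out is equivalent to always returning image_embeds unchanged.
```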
When you comment out this line, the dimension of 9 becomes 3, so it can run. But this is not really beam search, since you just keep one result!
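A toy shape check makes the 3-vs-9 mismatch concrete; the dimensions below are dummies (one image, three beams), and only the batch dimension matters:

```python
import torch

image_embeds = torch.randn(1, 10, 4)  # (batch, num_patches, hidden) -- dummy sizes
num_beams = 3

# BLIP's generate() expands the image embeddings once for beam search: 1 -> 3
once = image_embeds.repeat_interleave(num_beams, dim=0)
print(once.shape)   # torch.Size([3, 10, 4])

# Newer transformers expands encoder inputs again inside generate(): 3 -> 9
twice = once.repeat_interleave(num_beams, dim=0)
print(twice.shape)  # torch.Size([9, 10, 4])

# The decoder-side inputs are only expanded to 3, so cross-attention ends up
# comparing a batch of 3 against a batch of 9, hence the reported error.
```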
In order to solve this problem, you need to set num_beams=1, not 3 (for instance, in blip_vqa.py line 92).
I solved this problem. If transformers is 4.16.0, everything is OK. But I was using transformers 4.36.2; in that case, around line 818 of transformers' generation utilities, you need to comment out the _expand_dict_for_generation step where encoder_hidden_states gets expanded again by the beam-search num. Finally solved!!!
You should submit a PR to kohya's sd-scripts repo to fix it for good.
Updating to 1.0.2 fixed it for me.
Commenting out these two lines may work:
https://github.com/salesforce/BLIP/blob/3a29b7410476bf5f2ba0955827390eb6ea1f4f9d/models/blip.py#L131-L132
EDIT: After posting this comment, I noticed yenlianglai had already written the same thing.
Recent transformers versions seem to do the repeat_interleave automatically in _expand_dict_for_generation. This fix, https://github.com/huggingface/transformers/pull/21624, seems to be what causes the issue.
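Given that, a less invasive workaround than editing the installed library would be to make BLIP's manual expansion conditional on the transformers version; a sketch (the 4.27.0 cutoff is an assumption about when that PR shipped, not a verified boundary):

```python
import torch
import transformers
from packaging import version

# Assumption: from roughly transformers 4.27.0 onward, generate() expands
# encoder-side model kwargs itself, so BLIP must not expand them beforehand.
TRANSFORMERS_EXPANDS_ENCODER_INPUTS = version.parse(transformers.__version__) >= version.parse("4.27.0")

def maybe_expand_image_embeds(image_embeds: torch.Tensor, sample: bool, num_beams: int) -> torch.Tensor:
    """Version-aware variant of the earlier sketch: only repeat the image
    embeddings per beam when transformers will not do it for us."""
    if not sample and not TRANSFORMERS_EXPANDS_ENCODER_INPUTS:
        image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)
    return image_embeds
```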
I used transformers==4.17 and did not face any further issues.