BLIP
The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
When I use beam search to generate a caption for a picture on Colab, this error occurs. My transformers version is 4.25.1, and it works when I use nucleus sampling. How should I solve this?
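For context, the failing call is presumably the beam-search branch of the usual BLIP captioning setup, roughly like this; the checkpoint placeholder, dummy image tensor, and parameter values are illustrative (they follow the official demo), not the exact code that failed:

```python
import torch
from models.blip import blip_decoder  # from the salesforce/BLIP repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint path/URL and real image preprocessing are omitted; a dummy image
# tensor of the right shape is enough to hit the shape mismatch.
model = blip_decoder(pretrained="<BLIP captioning checkpoint>", image_size=384, vit="base")
model = model.eval().to(device)
image = torch.randn(1, 3, 384, 384, device=device)  # stand-in for a preprocessed image

with torch.no_grad():
    # Beam search (sample=False, num_beams=3): on newer transformers this raises
    # "The size of tensor a (3) must match the size of tensor b (9) ..."
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)
print(caption[0])
```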
I commented out the line that causes the error and uncommented the line below it as a temporary solution.
Yeah, that's what I did. But it's strange that both methods work in the official demo Colab notebook, yet I can't use beam search when I write the code myself.
I also got it working after downgrading transformers to 4.16.0, but it seems that I cannot import AutoProcessor with that version.
same issue
same issue, waiting for a fix
same issue
I suggest opening an issue with kohya_ss in his repo, as he is the one maintaining the behind-the-scenes code; I only wrap his script in a GUI: https://github.com/kohya-ss/sd-scripts
Dude, I also ran into this problem. When I change sample = False to sample = True in the generate call, it runs successfully. But I wonder why beam search doesn't work and only nucleus sampling can be used.
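The change described above presumably amounts to switching from the beam-search call to the nucleus-sampling one; a sketch, reusing the hypothetical model and image from the example under the original question:

```python
with torch.no_grad():
    # Nucleus sampling (sample=True): this path does not pre-expand the image
    # embeddings for beams, so it runs fine on newer transformers.
    caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5)
print(caption[0])
```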
I have the same problem, and using transformers==4.16 did not help.
I found that commenting out the line at models/blip.py line 131 fixes the problem (see the sketch below).
I don't know why; I hope someone can provide a detailed explanation of what's going on under the hood.
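For reference, the branch being commented out is BLIP's manual per-beam expansion of the image embeddings; a standalone sketch of that logic (not the file itself) looks like this:

```python
import torch

def expand_image_embeds(image_embeds: torch.Tensor, sample: bool, num_beams: int) -> torch.Tensor:
    """Mirror of the branch around models/blip.py line 131: for beam search
    (sample=False), BLIP repeats the image embeddings once per beam before
    passing them to the text decoder's generate()."""
    if not sample:
        image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)
    return image_embeds

# Commenting that branch out is equivalent to always returning image_embeds unchanged.
```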
When you comment out this line, the dimension of 9 becomes 3, so it can run. But this is not really beam search, since you just keep one result!
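A toy shape check makes the 3-vs-9 mismatch concrete; the dimensions below are dummies (one image, three beams), and only the batch dimension matters:

```python
import torch

image_embeds = torch.randn(1, 10, 4)  # (batch, num_patches, hidden) -- dummy sizes
num_beams = 3

# BLIP's generate() expands the image embeddings once for beam search: 1 -> 3
once = image_embeds.repeat_interleave(num_beams, dim=0)
print(once.shape)   # torch.Size([3, 10, 4])

# Newer transformers expands encoder inputs again inside generate(): 3 -> 9
twice = once.repeat_interleave(num_beams, dim=0)
print(twice.shape)  # torch.Size([9, 10, 4])

# The decoder-side inputs are only expanded to 3, so cross-attention ends up
# comparing a batch of 3 against a batch of 9, hence the reported error.
```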
In order to solve this problem, you need to set num_beams=1, not 3 (for instance, in blip_vqa.py line 92).
I solved this problem. If transformers is 4.16.0, everything is OK. But I was using transformers 4.36.2; in that case, around line 818 of transformers' generation utilities, you need to comment out the _expand_dict_for_generation step where encoder_hidden_states gets expanded again by the beam-search num. Finally solved!!!
You should submit a PR to kohya's sd-scripts repo to fix it for good.
Updating to 1.0.2 fixed it for me.
Commenting out these two lines may work:
https://github.com/salesforce/BLIP/blob/3a29b7410476bf5f2ba0955827390eb6ea1f4f9d/models/blip.py#L131-L132
EDIT: After posting this comment, I noticed yenlianglai had already written the same thing.
Recent transformers versions seem to do the repeat_interleave automatically in _expand_dict_for_generation. This fix, https://github.com/huggingface/transformers/pull/21624, seems to be what causes the issue.
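Given that, a less invasive workaround than editing the installed library would be to make BLIP's manual expansion conditional on the transformers version; a sketch (the 4.27.0 cutoff is an assumption about when that PR shipped, not a verified boundary):

```python
import torch
import transformers
from packaging import version

# Assumption: from roughly transformers 4.27.0 onward, generate() expands
# encoder-side model kwargs itself, so BLIP must not expand them beforehand.
TRANSFORMERS_EXPANDS_ENCODER_INPUTS = version.parse(transformers.__version__) >= version.parse("4.27.0")

def maybe_expand_image_embeds(image_embeds: torch.Tensor, sample: bool, num_beams: int) -> torch.Tensor:
    """Version-aware variant of the earlier sketch: only repeat the image
    embeddings per beam when transformers will not do it for us."""
    if not sample and not TRANSFORMERS_EXPANDS_ENCODER_INPUTS:
        image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)
    return image_embeds
```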
I used transformers==4.17 and did not face any further issues.