LAVIS
Error in beam-search multinomial sampling
According to the transformers library from Hugging Face, beam-search multinomial sampling can be enabled by setting num_beams>1 and do_sample=True. However, this is not supported in LAVIS. If I set num_beams=4, num_return_sequences=4 and do_sample=True simultaneously, I get the following error:
File "MM/LAVIS/lavis/models/med.py", line 1405, in generate_from_encoder
outputs = self.generate(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "miniconda3/envs/lavis/lib/python3.8/site-packages/transformers/generation_utils.py", line 1404, in generate
return self.beam_sample(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/transformers/generation_utils.py", line 2520, in beam_sample
outputs = self(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 1211, in forward
outputs = self.bert(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 974, in forward
encoder_outputs = self.encoder(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 592, in forward
layer_outputs = layer_module(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 475, in forward
cross_attention_outputs = self.crossattention(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 346, in forward
self_outputs = self.self(
File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "MM/LAVIS/lavis/models/med.py", line 219, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (36) must match the size of tensor b (4) at non-singleton dimension 0
During generation, the sizes are normal when generating the first token: both query_layer and key_layer are torch.Size([64, 12, 5, 64]). However, when generating the second token, the size of key_layer becomes torch.Size([4, 12, 577, 64]). So I think there may be something wrong with the image captioning path. By the way, 5 is my prompt_length and 12 is the number of attention heads.
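The mismatch above is the classic symptom of the decoder states being expanded to batch*num_beams while the encoder (image) states keep the original batch size, so the cross-attention matmul fails on the batch dimension. A minimal sketch of that failure and of the usual fix (repeating the encoder states once per beam) is below; the concrete sizes are illustrative, not taken from the report, and NumPy stands in for torch:

```python
import numpy as np

batch, beams, heads, dim = 4, 4, 12, 64
q_len, k_len = 5, 577  # prompt length, number of image tokens (illustrative)

# Beam sampling expands the decoder-side query states to batch*beams:
query = np.zeros((batch * beams, heads, q_len, dim))

# Bug: the encoder (image) key states keep the original batch size,
# so the attention-score matmul cannot match batch dims (16 vs 4):
key_bad = np.zeros((batch, heads, k_len, dim))
try:
    np.matmul(query, key_bad.swapaxes(-1, -2))
except ValueError as err:
    print("shape mismatch:", err)  # same class of failure as the RuntimeError above

# Fix: repeat each image's encoder states once per beam before cross-attention.
key_ok = np.repeat(key_bad, beams, axis=0)  # (batch*beams, heads, k_len, dim)
scores = np.matmul(query, key_ok.swapaxes(-1, -2))
assert scores.shape == (batch * beams, heads, q_len, k_len)
```

In transformers this expansion is normally done by the generation utilities on encoder_hidden_states; if a model passes the un-expanded image features into generate, only beam paths that never re-read them happen to work.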
Could you figure out where the error is? Thanks in advance :)
Hi, @Richar-Du,
I think something fishy might be going on. I will look into this. Thanks for raising it.
I ran into the same problem because my version of transformers was incorrect. You may want to check whether your version of transformers is lower than 4.27.
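If it helps, a quick way to check the commenter's suggested threshold is to compare the installed version string against (4, 27). The helper below is a hypothetical sketch (the 4.27 cutoff comes from the comment above, not from any changelog I have verified):

```python
import importlib.metadata


def is_below(ver: str, threshold: tuple[int, int]) -> bool:
    """Compare a 'major.minor.patch' version string against (major, minor)."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) < threshold


# Check the installed transformers against the 4.27 threshold from the comment.
installed = importlib.metadata.version("transformers")
if is_below(installed, (4, 27)):
    print(f"transformers {installed} is below 4.27; consider upgrading")
```

Alternatively, `pip show transformers` on the command line prints the same version information.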