VG-GPLMs
`image_len` is not used?
https://github.com/HLTCHKUST/VG-GPLMs/blob/ecd40e8d884123666a339ef9d2968b178610b898/src/models/modeling_bart.py#L749
`image_len` is not used in the attention calculation?
`image_len=None` means the default value is None; you can pass an int list of length batch size to this function.
I mean that `image_len` is not used in the attention calculation (as a mask).
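To illustrate what using `image_len` as a mask could look like, here is a minimal sketch (not the repo's actual code; the shapes and names are assumptions based on the discussion): padded image positions are set to `-inf` before the softmax so they receive zero attention weight.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, following the thread:
# attn is [batch_size, text_len, image_len_max]
batch_size, text_len, image_len_max = 2, 4, 5
attn = torch.randn(batch_size, text_len, image_len_max)

# image_len: the true number of image features per sample
image_len = torch.tensor([5, 3])

# Boolean mask: True where an image position is padding
positions = torch.arange(image_len_max)               # [image_len_max]
pad_mask = positions[None, :] >= image_len[:, None]   # [batch, image_len_max]

# Broadcast over the text dimension and mask out padded positions
attn = attn.masked_fill(pad_mask[:, None, :], float("-inf"))
weights = F.softmax(attn, dim=2)

# Padded positions now get exactly zero attention weight
print(weights[1, :, 3:].abs().sum().item())  # 0.0
```

This is only a sketch of the masking idea the thread suggests, not a drop-in patch for `modeling_bart.py`.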
And is there an error in the attn softmax dim? https://github.com/HLTCHKUST/VG-GPLMs/blob/ecd40e8d884123666a339ef9d2968b178610b898/src/models/modeling_bart.py#L882
The attn shape is [batch_size (0), text_len (1), image_len (2)], so the softmax should be over the image_len dim (2). I think the softmax dim should be 2, not 1?
Is there something wrong with my thinking?
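A small sketch of the dimension question (the shape `[batch, text_len, image_len]` is taken from the thread, the tensors are made up): with `dim=2` each text token's weights sum to 1 over the image positions, while `dim=1` normalizes over text positions instead.

```python
import torch
import torch.nn.functional as F

# attn: [batch_size, text_len, image_len], as described in the thread
attn = torch.randn(2, 4, 5)

w2 = F.softmax(attn, dim=2)  # normalizes over image positions
w1 = F.softmax(attn, dim=1)  # normalizes over text positions instead

print(w2.sum(dim=2))  # all ones: each text token distributes over images
print(w1.sum(dim=1))  # all ones over the text dim: a different normalization
```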
I see. `image_len` is not used in the multimodal fusion function. You could apply it as a mask in the cross-attention; it would probably improve performance slightly.
https://github.com/HLTCHKUST/VG-GPLMs/blob/ecd40e8d884123666a339ef9d2968b178610b898/src/models/modeling_bart.py#L882
I think L882 should be (for the reason above):
attn = F.softmax(attn, dim=2)
am I wrong?