
code on text-video qa

Open cdqncn opened this issue 2 years ago • 5 comments

Dear authors,

I was wondering if you could release the code for text-video QA (e.g., the dataloader and how you process the videos).

Thanks!

cdqncn avatar Jun 14 '22 03:06 cdqncn

Hi, you can refer to the code here for dataloading of text-video qa: https://github.com/salesforce/ALPRO. Thanks!
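For reference, a minimal sketch of how frames are typically sampled in a video QA dataloader (this is not ALPRO's actual code; the use of decord and the uniform-sampling scheme are assumptions):

```python
import numpy as np
from decord import VideoReader  # assumption: decord is used for video decoding

def sample_frames(video_path, num_frames=8):
    """Uniformly sample `num_frames` RGB frames from a video clip."""
    vr = VideoReader(video_path)
    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int)
    frames = vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3), uint8
    return frames
```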

LiJunnan1992 avatar Jun 14 '22 04:06 LiJunnan1992

Thanks for your reply. That code handles QA as classification; I want to learn how BLIP does QA as generation.

cdqncn avatar Jun 14 '22 06:06 cdqncn

We use the VQA model to generate answers: https://github.com/salesforce/BLIP/blob/48211a1594f1321b00f14c9f7a5b4813144b2fb9/models/blip_vqa.py#L85

To handle videos, we simply concatenate frame features and pass them to the text decoder.
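A rough sketch of what that concatenation could look like (a minimal illustration, not the repository's actual code; `visual_encoder` and the tensor shapes are assumptions based on the description above):

```python
import torch

def encode_video(visual_encoder, video):
    """Encode T frames per clip and concatenate their patch features.

    video: (B, T, 3, H, W). The concatenated features (B, T*N_patches, D)
    are passed to the text decoder as cross-attention context, just like
    a single image's features would be.
    """
    B, T = video.shape[:2]
    frames = video.flatten(0, 1)                     # (B*T, 3, H, W)
    feats = visual_encoder(frames)                   # (B*T, N_patches, D) -- assumed output shape
    feats = feats.reshape(B, T * feats.size(1), -1)  # concatenate frames along the sequence dim
    atts = torch.ones(feats.shape[:-1], dtype=torch.long, device=feats.device)
    return feats, atts
```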

LiJunnan1992 avatar Jun 14 '22 09:06 LiJunnan1992

@cdqncn Hi, have you reproduced the authors' results on zero-shot video QA? I tried to, but failed.

BlueCat7 avatar Aug 03 '22 13:08 BlueCat7

We use the VQA model to generate answers:

https://github.com/salesforce/BLIP/blob/48211a1594f1321b00f14c9f7a5b4813144b2fb9/models/blip_vqa.py#L85

To handle videos, we simply concatenate frame features and pass them to the text decoder.

@LiJunnan1992 By concat, do you mean that after getting the image (frame) encodings, all the encodings are concatenated and the raw concatenated embeddings are then passed to the decoder? Thanks for such an awesome repo, by the way.

dipta007 avatar Dec 09 '23 04:12 dipta007